Hi,
I have a question to ask.
After following the installation steps, I started training with the command from the training tutorial:
python tools/train_moco.py --img_size 32 --moco-k 12800 --arch resnet18_cifar --save_folder ./results/cifar10/moco_res18_cls --resume ./results/cifar10/moco_res18_cls/checkpoint_last.pth.tar --data_type cifar10 --data ./datasets/cifar10 --all 0
Below is the error:
(u) C:\Users\Kelly>cd SPICE
(u) C:\Users\Kelly\SPICE>python tools/train_moco.py --img_size 32 --moco-k 12800 --arch resnet18_cifar --save_folder ./results/cifar10/moco_res18_cls --resume ./results/cifar10/moco_res18_cls/checkpoint_last.pth.tar --data_type cifar10 --data ./datasets/cifar10 --all 0
Use GPU: 0 for training
Traceback (most recent call last):
  File "tools/train_moco.py", line 453, in <module>
    main()
  File "tools/train_moco.py", line 145, in main
    mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))
  File "C:\Users\Kelly\.conda\envs\u\lib\site-packages\torch\multiprocessing\spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "C:\Users\Kelly\.conda\envs\u\lib\site-packages\torch\multiprocessing\spawn.py", line 198, in start_processes
    while not context.join():
  File "C:\Users\Kelly\.conda\envs\u\lib\site-packages\torch\multiprocessing\spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "C:\Users\Kelly\.conda\envs\u\lib\site-packages\torch\multiprocessing\spawn.py", line 69, in _wrap
    fn(i, *args)
  File "C:\Users\Kelly\SPICE\tools\train_moco.py", line 170, in main_worker
    dist.init_process_group(backend=args.dist_backend, init_method=args.dist_url,
  File "C:\Users\Kelly\.conda\envs\u\lib\site-packages\torch\distributed\distributed_c10d.py", line 602, in init_process_group
    default_pg = _new_process_group_helper(
  File "C:\Users\Kelly\.conda\envs\u\lib\site-packages\torch\distributed\distributed_c10d.py", line 727, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL " "built in")
RuntimeError: Distributed package doesn't have NCCL built in
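The error is raised inside dist.init_process_group, which is called with backend=args.dist_backend. PyTorch builds for Windows ship without NCCL, so only the "gloo" backend is usable there. Below is a minimal sketch for checking what the installed PyTorch build supports. It assumes torch.distributed is importable, and it uses a hypothetical helper, choose_dist_backend, that is not part of SPICE. How (or whether) train_moco.py lets you override args.dist_backend from the command line is not confirmed here.

```python
import sys


def choose_dist_backend(nccl_available, platform=sys.platform):
    """Return a torch.distributed backend name for this machine.

    Hypothetical helper (not part of SPICE). Assumption: Windows builds of
    PyTorch have no NCCL, so fall back to "gloo" on Windows or whenever
    NCCL is reported as unavailable.
    """
    if platform.startswith("win") or not nccl_available:
        return "gloo"
    return "nccl"


if __name__ == "__main__":
    try:
        import torch.distributed as dist
        # is_nccl_available() is only meaningful when the distributed
        # package itself is available, hence the short-circuit check.
        nccl_ok = dist.is_available() and dist.is_nccl_available()
    except ImportError:
        nccl_ok = False
    print("NCCL available:", nccl_ok)
    print("Suggested backend:", choose_dist_backend(nccl_ok))
```

On this machine (Windows), the sketch would suggest "gloo". That is the value that would need to reach init_process_group in place of "nccl".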
How can I solve this?
Thanks,
Kelly