xy-guo / LIGA-Stereo

Code for LIGA-Stereo Detector, ICCV'21
Apache License 2.0
90 stars 18 forks

scripts/dist_train.sh #12

Open JangChangWon opened 2 years ago

JangChangWon commented 2 years ago

Hello, thanks for your excellent work!

I have several problems with distributed training.

When I run "CUDA_VISIBLE_DEVICE=0 python3 tools/train.py --cfg_file ${cfg} --batch_size 1" or "CUDA_VISIBLE_DEVICE=0 ./scripts/dist_train.sh 1 exp cfg_path", training works. However, when I run "python3 tools/train.py --cfg_file ${cfg} --batch_size 1", "CUDA_VISIBLE_DEVICE=0,1,2,3 python3 tools/train.py --cfg_file ${cfg} --batch_size 1", or "CUDA_VISIBLE_DEVICE=0,1,2,3 ./scripts/dist_train.sh 4 exp cfg_path", training does not work. How should I modify the code to get distributed training running?

zjwzcx commented 2 years ago

I guess that you should set NGPUS=5 instead of 4. (CUDA_VISIBLE_DEVICE=0,1,2,3,4 ==> 5 GPUs)
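The point above is that the GPU count passed to the launch script should equal the number of device ids in the visible-device list. A minimal sketch of that check follows. The helpers `count_gpus` and `check_ngpus` are hypothetical and not part of this repository, and the sketch assumes the list is a plain comma-separated string of ids. Note that the variable CUDA itself reads is spelled `CUDA_VISIBLE_DEVICES` (plural).

```shell
# Hypothetical helper (not part of LIGA-Stereo): count the ids in a
# comma-separated device list such as "0,1,2,3".
count_gpus() {
  local list="$1"
  if [ -z "$list" ]; then
    # An empty list is reported as 0 ids here.
    echo 0
    return
  fi
  local IFS=','
  # Split the list on commas into this function's positional parameters.
  set -- $list
  echo "$#"
}

# Hypothetical helper: succeed only if the requested GPU count matches
# the number of ids in the device list.
check_ngpus() {
  local ngpus="$1" visible
  visible="$(count_gpus "$2")"
  if [ "$ngpus" -ne "$visible" ]; then
    echo "NGPUS=$ngpus but $visible device id(s) listed" >&2
    return 1
  fi
}

# Example use before launching (assumes the variable is already exported):
#   check_ngpus 4 "$CUDA_VISIBLE_DEVICES" && ./scripts/dist_train.sh 4 exp cfg_path
```

With a list of `0,1,2,3,4` the check passes for 5 and fails for 4. If the variable is unset, all GPUs are normally visible, so this check is only meaningful when an explicit list is given.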

JangChangWon commented 2 years ago

> I guess that you should set NGPUS=5 instead of 4. (CUDA_VISIBLE_DEVICE=0,1,2,3,4 ==> 5 GPUs)

I wrote it down wrong. Thank you for letting me know.