SherifGabr closed this issue 1 year ago
The issue is resolved now. It was probably caused by line 372 in samurai_model.py:
.batch(chunk_size * get_num_gpus())
If the number of GPUs is 0, the batch size becomes 0, which produces the error. It seems that $LD_LIBRARY_PATH was set incorrectly, which prevented the model from using any GPUs for training.
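To illustrate the failure mode described above, here is a minimal sketch of a guard that keeps the batch size from collapsing to 0 when no GPU is visible. The function name `effective_batch_size` is hypothetical and is not part of samurai_model.py. Treating 0 GPUs as a single CPU device is an assumption, not something the repository is known to do.

```python
def effective_batch_size(chunk_size: int, num_gpus: int) -> int:
    """Scale chunk_size by the GPU count, treating 0 GPUs as one device.

    Hypothetical helper for illustration; not taken from the SAMURAI code.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if num_gpus < 0:
        raise ValueError("num_gpus must be non-negative")
    # chunk_size * 0 would give a batch size of 0, which is the error
    # described in this thread. Fall back to a single (CPU) device instead.
    return chunk_size * max(1, num_gpus)
```

With a guard like this, `effective_batch_size(1024, 0)` yields 1024 rather than 0. Training would then fall back to the CPU instead of failing, so fixing the GPU setup is still the real solution.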
@SherifGabr it doesn't seem that the current version of the code is correct.
@monajalal Did you set LD_LIBRARY_PATH correctly? IIRC, after setting it, everything worked with no issues. If that doesn't help, I would check the CUDA drivers. Also, I trained on 1 GPU (NVIDIA V100). Could it be that you are training on multiple GPUs and it somehow fails?
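One rough way to check the LD_LIBRARY_PATH suggestion above is to see whether any directory on it actually contains a CUDA runtime library. The sketch below assumes the missing library is the CUDA runtime (`libcudart.so*`), which this thread does not confirm. The helper name `find_cuda_runtime_dirs` is made up for illustration.

```python
import glob
import os
from typing import List


def find_cuda_runtime_dirs(ld_library_path: str) -> List[str]:
    """Return the entries of ld_library_path that contain a libcudart library."""
    hits = []
    for entry in ld_library_path.split(os.pathsep):
        # Skip empty entries and paths that do not exist on this machine.
        if not entry or not os.path.isdir(entry):
            continue
        # Assumption: the CUDA runtime is installed as libcudart.so[.version].
        if glob.glob(os.path.join(entry, "libcudart.so*")):
            hits.append(entry)
    return hits


if __name__ == "__main__":
    dirs = find_cuda_runtime_dirs(os.environ.get("LD_LIBRARY_PATH", ""))
    print(dirs if dirs else "No libcudart found on LD_LIBRARY_PATH")
```

An empty result would be consistent with the GPU count of 0 described earlier. It does not prove that as the cause, since the library can also be found through other loader paths.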
Encountered this error after running the following command:
python train_samurai.py --config configs/samurai/samurai.txt --datadir fire_engine/ --basedir ../fire_engine_train/ --expname exp1 --gpu=0
Tried with other scenes (duck) but the same error persists.