Open SMohammadi89 opened 2 years ago
Hi, can you provide more details about your issue, such as logs, the CUDA version, and the number of GPUs on your server?
The CUDA version is 10.2 and I have 4 Tesla V100 GPUs. This is the complete error:
File "main.py", line 68, in
Hi,
Thanks for your great work. I would like to train the model on multiple GPUs, but I receive this error: "RuntimeError: CUDA error: invalid device ordinal CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect."
It occurs when I run this command: CUDA_VISIBLE_DEVICES=0,1 singularity exec --nv --writable-tmpfs -B /work/myname/ /work/myname/pointr.sif bash ./scripts/dist_train.sh 2 13232 --config ./cfgs/PCN_models/PoinTr.yaml --exp_name example
Note that I do not have any problems when using a single GPU.
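For anyone hitting the same message: "invalid device ordinal" generally means a process asked for a GPU index that is not among the devices visible to it. The cause in this thread is not confirmed, so the following is only a diagnostic sketch. It assumes each worker uses its local rank as the device index, and the helper names `visible_device_count` and `invalid_ranks` are hypothetical, not part of PoinTr or PyTorch. Running it inside the container would show whether `CUDA_VISIBLE_DEVICES` arrives as expected and whether the requested number of processes (2 in the command above) fits.

```python
import os


def visible_device_count(env=None):
    """Count the GPUs listed in CUDA_VISIBLE_DEVICES, or return None if it is unset."""
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None  # unset: no restriction is applied by this variable
    # An empty string means no devices are visible, so the count is 0.
    return len([d for d in value.split(",") if d.strip()])


def invalid_ranks(num_procs, env=None):
    """Return the local ranks (0..num_procs-1) that have no matching visible device.

    Assumption: each worker selects the device whose index equals its local rank.
    """
    count = visible_device_count(env)
    if count is None:
        return []  # cannot tell from the environment variable alone
    return [rank for rank in range(num_procs) if rank >= count]


if __name__ == "__main__":
    print("CUDA_VISIBLE_DEVICES:", os.environ.get("CUDA_VISIBLE_DEVICES"))
    print("ranks without a device:", invalid_ranks(2))
```

If the second line prints a non-empty list, the launcher is starting more processes than there are visible GPUs, which would be one possible explanation for the error.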