In train.sh, the following line restricts the CUDA_VISIBLE_DEVICES environment variable to GPUs with compute capability 6.x or higher:
export CUDA_VISIBLE_DEVICES=$(python3 -c "import torch; x=[str(x) for x in range(torch.cuda.device_count()) if torch.cuda.get_device_capability(x)[0]>=6]; print(','.join(x))" 2>/dev/null)
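For reference, the filtering logic inside that one-liner boils down to something like the sketch below. The `visible_devices` helper is my own illustration (not part of the repo); it takes a list of `(major, minor)` capability tuples like those returned by `torch.cuda.get_device_capability`:

```python
def visible_devices(capabilities, min_major=6):
    """Build a CUDA_VISIBLE_DEVICES string from (major, minor) compute
    capability tuples, keeping only devices with major >= min_major.
    Mirrors the filter in train.sh; min_major=6 is the repo's threshold."""
    return ",".join(
        str(i)
        for i, (major, _minor) in enumerate(capabilities)
        if major >= min_major
    )

# A Tesla K80 reports compute capability 3.7, so the default threshold
# of 6 filters it out entirely:
print(visible_devices([(3, 7)]))                # → "" (no devices pass)
print(visible_devices([(3, 7)], min_major=3))   # → "0"
```

With the default threshold, a machine whose only GPU is a K80 ends up with an empty CUDA_VISIBLE_DEVICES, which is what makes the training script report that no CUDA device is available.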
I don't know the reason for this limitation, but it caused an issue in my case: I use a Tesla K80, which has compute capability 3.x. When I ran the training script, the error said that no CUDA device was available.
After removing this limitation (and setting export CUDA_VISIBLE_DEVICES=1 manually), I was able to run the training procedure correctly.
Is this version limitation really needed, or can we remove it?