snap-stanford / SATURN

MIT License

train-saturn.py debugging #31

otoky closed this issue 9 months ago

otoky commented 9 months ago

Hi! I followed the tutorial up to the train-saturn section. I am using Google Colab on a GPU-enabled virtual machine and have pip-installed the required packages (!pip install torch==1.10.2+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html).

When I run it, I get this error:

  File "/gdrive/MyDrive/SATURN/files/train-saturn.py", line 1050, in <module>
    torch.cuda.set_device(args.device_num)
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 404, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

I'm not very well versed in this, but is there an issue with the code recognizing which GPU to use?

Yanay1 commented 9 months ago

I think "invalid device ordinal" means the script is trying to select a GPU index that does not exist on the machine. GPU indices start at 0, so a machine with a single GPU only accepts 0. Try changing device_num to 0.
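To illustrate the suggestion above, here is a minimal sketch of checking a requested GPU index against the number of visible devices before calling torch.cuda.set_device. The helper name pick_device_num and the requested value 3 are hypothetical and are not part of train-saturn.py; the PyTorch import is guarded so the sketch also runs where PyTorch is not installed.

```python
def pick_device_num(requested: int, device_count: int) -> int:
    """Return a valid GPU index, falling back to 0 when `requested` is out of range.

    Hypothetical helper for illustration; not part of the SATURN code base.
    """
    if device_count <= 0:
        raise RuntimeError("No CUDA devices are visible to this process")
    if 0 <= requested < device_count:
        return requested
    print(f"device_num={requested} is out of range for "
          f"{device_count} visible GPU(s); using 0 instead")
    return 0


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None  # allows the sketch to run without PyTorch

    if torch is not None and torch.cuda.is_available():
        visible = torch.cuda.device_count()
        # 3 is an illustrative value standing in for args.device_num
        device_num = pick_device_num(requested=3, device_count=visible)
        torch.cuda.set_device(device_num)
        print(f"Using GPU {device_num} of {visible}")
    else:
        print("No usable CUDA device found; check the runtime type and torch install")
```

On a Colab runtime with one GPU, torch.cuda.device_count() would be expected to report 1, in which case only index 0 is valid.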

Otherwise, there might be an issue with the torch installation.
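One way to check the installation is to print what PyTorch reports about itself. The sketch below assumes nothing about SATURN; the function name torch_install_report is hypothetical, and it returns early if PyTorch cannot be found at all.

```python
import importlib.util


def torch_install_report() -> dict:
    """Collect basic facts about the local PyTorch install (hypothetical helper)."""
    report = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not report["torch_installed"]:
        return report

    import torch

    report["torch_version"] = torch.__version__
    # torch.version.cuda is None for CPU-only builds
    report["built_with_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    report["visible_gpus"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in torch_install_report().items():
        print(f"{key}: {value}")
```

If cuda_available is False or visible_gpus is 0 on a machine that should have a GPU, the installed build or the runtime configuration is the more likely culprit than device_num.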

otoky commented 9 months ago

I think I got it to work, thanks!