Closed asafberreby closed 2 years ago
Hi, first check that PyTorch sees your devices correctly and that CUDA works. Run the following in the Python interpreter and look at what it prints:
import torch
torch.__version__ # Get PyTorch and CUDA version
torch.cuda.is_available() # Check that CUDA works
torch.cuda.device_count() # Check how many CUDA capable devices you have
# Print device human readable names
torch.cuda.get_device_name(0)
torch.cuda.get_device_name(1) # Only valid if device_count() returned at least 2
If the devices exist and CUDA works, then it's probably just an issue with the device ID you are passing.
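If you prefer a single script over typing the checks one by one, here is a minimal sketch. The helper names `describe_cuda_devices` and `is_valid_device_id` are made up for illustration and are not part of PyTorch or of this repository; only the `torch.cuda` calls already shown above are used. It lists every device PyTorch can see and checks whether a given ID is in range, without assuming a second GPU exists.

```python
import importlib.util


def describe_cuda_devices():
    """Return a list of (index, name) pairs for every CUDA device PyTorch sees.

    Returns an empty list if PyTorch is not installed or CUDA is unavailable.
    """
    if importlib.util.find_spec("torch") is None:
        return []
    import torch

    if not torch.cuda.is_available():
        return []
    return [
        (index, torch.cuda.get_device_name(index))
        for index in range(torch.cuda.device_count())
    ]


def is_valid_device_id(device_id, device_count):
    """Device IDs are zero-based, so valid IDs are 0 .. device_count - 1."""
    return 0 <= device_id < device_count


if __name__ == "__main__":
    devices = describe_cuda_devices()
    if not devices:
        print("No CUDA devices visible to PyTorch.")
    for index, name in devices:
        print(f"GPU {index}: {name}")

    requested_id = 0  # hypothetical: replace with the ID you pass to training
    print("Requested ID is valid:", is_valid_device_id(requested_id, len(devices)))
```

If the requested ID is reported as invalid, the ID given to the training command is most likely out of range for the machine.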
You can also set CUDA_VISIBLE_DEVICES before the command to make sure that PyTorch can only see the specified device:
# Only make GPU ID 0 visible to PyTorch
CUDA_VISIBLE_DEVICES=0 python tools/train.py
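The same restriction can be applied from inside Python, as long as the variable is set before CUDA is initialized (in practice, before `torch` is imported). This is only a sketch; `restrict_visible_gpus` is a hypothetical helper, not something provided by PyTorch or this repository.

```python
import os


def restrict_visible_gpus(gpu_ids):
    """Set CUDA_VISIBLE_DEVICES to the given physical GPU IDs.

    Must run before CUDA is initialized, otherwise it has no effect
    on the current process.
    """
    value = ",".join(str(gpu_id) for gpu_id in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Only make physical GPU 0 visible to this process.
restrict_visible_gpus([0])

# Import torch only after the variable is set, for example:
# import torch
```

Note that the visible devices are renumbered from zero inside the process, so with `CUDA_VISIBLE_DEVICES=1` the only visible GPU is addressed as device 0 in PyTorch.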
Before Asking
[x] I have read the README carefully.
[X] I want to train my custom dataset, and I have read the tutorials for training on custom data carefully and organized my dataset correctly. (FYI: We recommend using the xx_finetune.py config files.)
[X] I have pulled the latest code from the main branch and run it again, and the problem still exists.
Search before asking
Question
I am trying to run training on a custom dataset and keep getting this exception, even when I run it on a computer with a single GPU. Any thoughts?
BTW: when I train on the CPU, it works perfectly fine.
Additional
No response