Closed: mlozo closed this issue 7 months ago
You can check which CUDA versions each PyTorch release supports on PyTorch's official website, https://pytorch.org/. Your current CUDA version (12.2) does not seem to be compatible with the PyTorch build you've installed.
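One quick way to see what the installed PyTorch build was compiled against is to print `torch.version.cuda` next to `torch.cuda.is_available()`. The sketch below is only an illustration and is not part of the DIS repository; `summarize_build` is a hypothetical helper name, and the messages it returns are my own wording. It relies on the fact that `torch.version.cuda` is `None` when the build has no CUDA support.

```python
# Sketch: report which CUDA version (if any) the installed PyTorch build targets.
# `summarize_build` is a hypothetical helper, not part of PyTorch or DIS.
from typing import Optional


def summarize_build(torch_version: str, built_cuda: Optional[str], cuda_available: bool) -> str:
    """Turn the three values PyTorch reports into a one-line diagnosis."""
    if built_cuda is None:
        # torch.version.cuda is None for builds without CUDA support.
        return f"PyTorch {torch_version} has no CUDA support (CPU-only build?)."
    if not cuda_available:
        return (
            f"PyTorch {torch_version} was built for CUDA {built_cuda}, "
            "but no usable GPU was detected; check the driver and the environment."
        )
    return f"PyTorch {torch_version} was built for CUDA {built_cuda} and can see a GPU."


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
    else:
        print(summarize_build(torch.__version__, torch.version.cuda, torch.cuda.is_available()))
```

If the first message appears, the package manager most likely resolved to a CPU-only package, and reinstalling with the exact command from the PyTorch website for your CUDA version is the usual next step.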
On Sun, Nov 19, 2023 at 12:10 AM Mateusz @.***> wrote:
Hi DIS Project Team, I am seeking assistance with running the DIS model on a GPU. I am currently using an NVIDIA RTX A5000 Laptop GPU with 16GB RAM. Following the instructions, I have set up a conda environment named 'pytorch18'. However, when I attempt to train the model with my batch of images using python train_valid_inference_main.py (modified for my data), I encounter a compatibility issue with my GPU and the PyTorch version. The exact error message is:
--- build model ---
/home/ml/anaconda3/envs/pytorch18/lib/python3.7/site-packages/torch/cuda/__init__.py:104: UserWarning:
NVIDIA RTX A5000 Laptop GPU with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37.
If you want to use the NVIDIA RTX A5000 Laptop GPU GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
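For anyone reading along, the warning boils down to the GPU's compute capability (sm_86) not appearing in the list of architectures the PyTorch binary was compiled for. Here is a minimal sketch of that comparison. `capability_in_arch_list` is a hypothetical helper that does a simple exact-match test, which is a simplification and not PyTorch's internal check; the arch list is copied from the warning above.

```python
# Sketch: compare a GPU's compute capability with a PyTorch build's arch list.
# `capability_in_arch_list` is a hypothetical helper using a simple exact match;
# it is a simplification, not the check PyTorch performs internally.
from typing import Iterable, Tuple


def capability_in_arch_list(capability: Tuple[int, int], arch_list: Iterable[str]) -> bool:
    """Return True if e.g. (8, 6) appears as 'sm_86' in the arch list."""
    major, minor = capability
    return f"sm_{major}{minor}" in set(arch_list)


# Architectures listed in the warning from the old 'pytorch18' environment.
warned_arch_list = "sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37".split()

# The RTX A5000 Laptop GPU reports capability sm_86, i.e. (8, 6).
print(capability_in_arch_list((8, 6), warned_arch_list))  # False

if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
    else:
        if torch.cuda.is_available():
            # Both calls are part of the public torch.cuda API.
            print(torch.cuda.get_device_capability(0), torch.cuda.get_arch_list())
        else:
            print("No CUDA device visible to this PyTorch build.")
```

A build compiled with CUDA 11.1 or newer normally lists sm_86, which is why moving to a newer PyTorch build is the usual suggestion for this GPU.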
Following a suggestion from this PyTorch forum thread https://discuss.pytorch.org/t/nvidia-nvidia-rtx-a5000-with-cuda-capability-sm-86-is-not-compatible-with-the-current-pytorch-installation/150593, I updated my installation with
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
Now,
print(torch.__version__)
displays 1.13.1, and
print(torch.cuda.is_available())
returns False.
As a result, the model starts training on the CPU, leaving the GPU idle and unused.
I am relatively new to the field of machine learning and have been unable to find a solution to make the model train using the GPU. If I cannot resolve this, the training process will take an excessively long time.
For additional context, here are the outputs of
nvidia-smi && nvcc -V
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A5000 Laptop GPU    On  | 00000000:01:00.0 Off |                  N/A |
| N/A   47C    P8              17W / 115W |     10MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1816      G   /usr/lib/xorg/Xorg                            4MiB |
+---------------------------------------------------------------------------------------+

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Nov__3_17:16:49_PDT_2023
Cuda compilation tools, release 12.3, V12.3.103
Build cuda_12.3.r12.3/compiler.33492891_0
I would appreciate any guidance or suggestions you can provide to resolve this issue and successfully run the model on my GPU. Thank you for your time and assistance.
View this issue on GitHub: https://github.com/xuebinqin/DIS/issues/96
-- Xuebin Qin PhD Department of Computing Science University of Alberta, Edmonton, AB, Canada Homepage: https://xuebinqin.github.io/
Hey @xuebinqin,
Thank you for your guidance - I've successfully managed to get everything up and running as it should. Now, it's working like a fast train. I didn't use the provided conda environment from pytorch18.yml, as updating the libraries to the required versions was almost a nightmare. I constantly faced version conflicts, and the resolution process (finding the correct versions) took ages.
I created a clean conda environment and, looking at the library dependency list, installed each one starting with Python and PyTorch. Everything installed smoothly, and now the model is also processing via the GPU.
Thanks again for your help!