torch / torch7

http://torch.ch
Other

cuda runtime error (802) : system not yet initialized .../THCGeneral.cpp:50 #1219

Closed. pherrusa7 closed this issue 4 years ago.

pherrusa7 commented 4 years ago

Dear all,

PyTorch installs successfully, but querying the current device fails, so I can't use the GPUs:

conda create -n myenv
conda activate myenv
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
python
import torch
torch.cuda.current_device()

Then, the following error appears:

THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1579022034529/work/aten/src/THC/THCGeneral.cpp line=50 error=802 : system not yet initialized
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/iarai/home/pedro.herruzo/.conda/envs/qwe2/lib/python3.6/site-packages/torch/cuda/__init__.py", line 377, in current_device
    _lazy_init()
  File "/iarai/home/pedro.herruzo/.conda/envs/qwe2/lib/python3.6/site-packages/torch/cuda/__init__.py", line 197, in _lazy_init
    torch._C._cuda_init()
RuntimeError: cuda runtime error (802) : system not yet initialized at /opt/conda/conda-bld/pytorch_1579022034529/work/aten/src/THC/THCGeneral.cpp:50
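When comparing this failure across machines or logs, the CUDA error code can be pulled out of the `THCudaCheck` line programmatically. The following is a minimal sketch using only Python's `re` module, assuming the line has the exact shape shown above; the names `THC_PATTERN` and `parse_thcudacheck` are hypothetical. For reference, code 802 is `cudaErrorSystemNotReady` in the CUDA runtime API, whose message string is the "system not yet initialized" text seen here.

```python
import re

# Matches the THCudaCheck line printed before the RuntimeError above.
# Assumes the file path contains no spaces, as in this report.
THC_PATTERN = re.compile(
    r"THCudaCheck FAIL file=(?P<file>\S+) line=(?P<line>\d+) "
    r"error=(?P<code>\d+) : (?P<message>.+)"
)


def parse_thcudacheck(text):
    """Return the file, line, error code and message, or None if no match."""
    match = THC_PATTERN.search(text)
    if match is None:
        return None
    return {
        "file": match.group("file"),
        "line": int(match.group("line")),
        "code": int(match.group("code")),
        "message": match.group("message").strip(),
    }


sample = (
    "THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1579022034529/work/aten/src/THC/"
    "THCGeneral.cpp line=50 error=802 : system not yet initialized"
)
print(parse_thcudacheck(sample))
```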

The GPU on this machine is:

nvidia-smi
Mon Mar 30 23:17:07 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64       Driver Version: 440.64       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM3...  On   | 00000000:1E:00.0 Off |                    0 |
| N/A   37C    P0    31W / 350W |      0MiB / 32510MiB |      0%      Default |

However, it works perfectly with the following GPU on a different machine:

nvidia-smi
Mon Mar 30 23:22:17 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64       Driver Version: 440.64       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:1B:00.0 Off |                    0 |
| N/A   41C    P8    11W /  70W |      0MiB / 15109MiB |      0%      Default |

It also works on this one:

nvidia-smi
Mon Mar 30 23:24:31 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64       Driver Version: 440.64       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  On   | 00000000:1B:00.0 Off |                    0 |
| N/A   43C    P0    28W / 250W |     12MiB / 32510MiB |      0%      Default |
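All three `nvidia-smi` headers report the same driver (440.64) and the same CUDA version (10.2). The "CUDA Version" field is the newest CUDA version the driver supports, so the `cudatoolkit=10.1` package installed above should be compatible with it on every machine. A minimal sketch of that comparison, parsing the header lines copied from the outputs above; the helper names are hypothetical, and this only rules out a driver/toolkit version mismatch, not other host configuration differences.

```python
import re

# Parses the second line of nvidia-smi's banner, e.g.
# "| NVIDIA-SMI 440.64   Driver Version: 440.64   CUDA Version: 10.2 |"
HEADER = re.compile(
    r"NVIDIA-SMI (?P<smi>[\d.]+)\s+Driver Version: (?P<driver>[\d.]+)"
    r"\s+CUDA Version: (?P<cuda>[\d.]+)"
)


def parse_header(line):
    match = HEADER.search(line)
    if match is None:
        raise ValueError("not an nvidia-smi header line")
    return match.groupdict()


def version_tuple(version):
    return tuple(int(part) for part in version.split("."))


def toolkit_supported(driver_cuda, toolkit):
    # The driver's reported CUDA version is an upper bound: older toolkits run on it.
    return version_tuple(toolkit) <= version_tuple(driver_cuda)


headers = {
    "Tesla V100-SXM3 (fails)": "| NVIDIA-SMI 440.64       Driver Version: 440.64       CUDA Version: 10.2     |",
    "Tesla T4 (works)": "| NVIDIA-SMI 440.64       Driver Version: 440.64       CUDA Version: 10.2     |",
    "Tesla V100-PCIE (works)": "| NVIDIA-SMI 440.64       Driver Version: 440.64       CUDA Version: 10.2     |",
}
parsed = {name: parse_header(text) for name, text in headers.items()}
for name, info in parsed.items():
    print(name, info, "cudatoolkit 10.1 supported:", toolkit_supported(info["cuda"], "10.1"))
```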

I have looked for a solution with no luck; hopefully someone can help with this issue.
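The reproduction steps can also be run as a script that records a few extra facts before reaching the failing call, which makes reports from different machines easier to compare. This is a sketch, assuming a PyTorch build that exposes `torch.version.cuda`; the function name `cuda_diagnostics` is hypothetical. It catches `RuntimeError` (raised when CUDA initialization fails, as in the traceback above) and `AssertionError` (raised by CPU-only builds).

```python
import importlib.util


def cuda_diagnostics():
    """Collect basic CUDA status from PyTorch without letting init errors escape."""
    report = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not report["torch_installed"]:
        return report
    import torch

    report["torch_version"] = torch.__version__
    report["built_with_cuda"] = torch.version.cuda  # None for CPU-only builds
    report["is_available"] = torch.cuda.is_available()
    try:
        # current_device() triggers lazy CUDA initialization, where error 802 appeared.
        report["current_device"] = torch.cuda.current_device()
    except (RuntimeError, AssertionError) as exc:
        report["current_device_error"] = str(exc)
    return report


if __name__ == "__main__":
    for key, value in cuda_diagnostics().items():
        print(f"{key}: {value}")
```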

pherrusa7 commented 4 years ago

I meant to post this in the PyTorch repository; sorry for the inconvenience!

stu17682 commented 4 years ago

Hi Pedro

What does torch.cuda.is_available() return?

Cheers Stuart

Stuart Millar PhD Deep Learning Researcher CSIT / QUB, Belfast smillar09@qub.ac.uk



pherrusa7 commented 4 years ago


Dear @stu17682,

It returns a boolean indicating whether your system supports CUDA. See here for more details :)
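Because `torch.cuda.is_available()` returns a boolean, a script can branch on it before touching the GPU. A minimal sketch follows, assuming a CPU fallback is acceptable; the helper name `pick_device` is hypothetical. It also calls `torch.cuda.current_device()`, since the error in this thread was raised during CUDA initialization, which `is_available()` alone may not fully exercise.

```python
import importlib.util


def pick_device():
    """Return "cuda" only if CUDA reports available and initializes; else "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed at all
    import torch

    try:
        if not torch.cuda.is_available():
            return "cpu"
        # Force lazy CUDA initialization; this is where error 802 surfaced above.
        torch.cuda.current_device()
    except (RuntimeError, AssertionError):
        return "cpu"
    return "cuda"


device = pick_device()
print("using", device)
```

The returned string can be passed to `torch.device(...)` or to a tensor's or module's `.to(...)` method.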