xuhuisheng / rocm-gfx803


Pytorch GPU returns false #2

Closed: Johnreidsilver closed this issue 2 years ago

Johnreidsilver commented 2 years ago

Edit: solved. I was missing a couple of packages, and installing them fixed it: sudo apt-get install libopenblas-base libopenmpi-dev

Hi, thank you for releasing these patches to keep gfx803 working with ROCm.

I'm trying out PyTorch. Shouldn't this return True?

sudo PYTORCH_TEST_WITH_ROCM=1 python3 -c 'import torch;print("GPU:",torch.cuda.is_available())'

GPU: False
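For anyone hitting the same result, a slightly more verbose check can help tell "torch does not import at all" apart from "torch imports but sees no GPU". This is only a sketch: `describe_torch` is a hypothetical helper name, and it assumes that a ROCm build of PyTorch exposes a non-empty `torch.version.hip`, while other builds leave it as None or absent.

```python
import importlib


def describe_torch():
    """Report what the torch build visible to this interpreter looks like.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    report = {
        "importable": False,
        "error": None,
        "version": None,
        "hip": None,
        "gpu_available": None,
    }
    try:
        torch = importlib.import_module("torch")
    except (ImportError, OSError) as exc:
        # OSError covers a missing shared library at load time.
        report["error"] = f"{type(exc).__name__}: {exc}"
        return report

    report["importable"] = True
    report["version"] = torch.__version__
    # Assumption: ROCm builds set torch.version.hip; other builds do not.
    report["hip"] = getattr(torch.version, "hip", None)
    report["gpu_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in describe_torch().items():
        print(f"{key}: {value}")
```

Running it once with sudo and once without should show whether both invocations even load the same torch build.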

Both clinfo and rocm-smi look good. I need to call clinfo with sudo, though; otherwise it won't show the GPU.

This might be related: if I run python3 without sudo, I get this error while loading torch:

import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/pytorchrocm/lib/python3.8/site-packages/torch/__init__.py", line 196, in <module>
    _load_global_deps()
  File "/home/user/pytorchrocm/lib/python3.8/site-packages/torch/__init__.py", line 149, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/usr/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libmpi_cxx.so.40: cannot open shared object file: No such file or directory
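The traceback shows the failure happening inside a `ctypes.CDLL` call, so the same mechanism can be used to test a library by name before importing torch. A minimal sketch, assuming a Linux system where the dynamic loader resolves bare sonames; `can_load` is a hypothetical helper, and the only library name used is the one from the error above.

```python
import ctypes


def can_load(soname):
    """Try to dlopen a shared library by name.

    Returns (True, None) on success, or (False, message) if the loader
    cannot find or open it. Hypothetical helper for illustration.
    """
    try:
        ctypes.CDLL(soname)
        return True, None
    except OSError as exc:
        return False, str(exc)


if __name__ == "__main__":
    # The soname is taken from the traceback above.
    ok, err = can_load("libmpi_cxx.so.40")
    if ok:
        print("libmpi_cxx.so.40: loadable")
    else:
        print("libmpi_cxx.so.40: NOT loadable:", err)
```

If the library is reported as not loadable, the import of torch would be expected to fail the same way until the package that provides it is installed.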

I have a virtual environment for the ROCm version of PyTorch. If I run pip3 freeze, it shows the correct version of PyTorch, but if I run sudo pip3 freeze, it shows the non-ROCm version.
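The differing pip3 freeze output suggests that the sudo and non-sudo commands may be running different Python installations. This is a guess rather than something confirmed in this thread. A short sketch to check which interpreter is in use; `interpreter_info` is a hypothetical helper, and the venv test relies on `sys.prefix` differing from `sys.base_prefix` inside a virtual environment.

```python
import os
import sys


def interpreter_info():
    """Describe the interpreter running this script.

    Hypothetical helper for illustration.
    """
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        # True when running inside a venv-style virtual environment.
        "in_venv": sys.prefix != base_prefix,
        # Effective user id; 0 means root. Not available on every platform.
        "euid": os.geteuid() if hasattr(os, "geteuid") else None,
    }


if __name__ == "__main__":
    for key, value in interpreter_info().items():
        print(f"{key}: {value}")
```

If the sudo run reports a different executable or `in_venv: False`, calling the virtual environment's interpreter by its full path would be one way to keep both runs on the same installation.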