xuhuisheng / rocm-gfx803

OSError: libc10_cuda.so: cannot open shared object file: No such file or directory #23

Closed FranGamer1892 closed 1 year ago

FranGamer1892 commented 1 year ago

Hello, I am trying to run diff-svc on my gfx803 GPU. When I try to run inference, I get

    Traceback (most recent call last):
      File "inference.py", line 8, in <module>
        from infer import *
      File "/projects/diff-svc/diff-svc/infer.py", line 10, in <module>
        from infer_tools import slicer
      File "/projects/diff-svc/diff-svc/infer_tools/slicer.py", line 5, in <module>
        import torchaudio
      File "/usr/local/lib/python3.8/dist-packages/torchaudio/__init__.py", line 1, in <module>
        from torchaudio import _extension  # noqa: F401
      File "/usr/local/lib/python3.8/dist-packages/torchaudio/_extension.py", line 67, in <module>
        _init_extension()
      File "/usr/local/lib/python3.8/dist-packages/torchaudio/_extension.py", line 61, in _init_extension
        _load_lib("libtorchaudio")
      File "/usr/local/lib/python3.8/dist-packages/torchaudio/_extension.py", line 51, in _load_lib
        torch.ops.load_library(path)
      File "/usr/local/lib/python3.8/dist-packages/torch/_ops.py", line 220, in load_library
        ctypes.CDLL(path)
      File "/usr/lib/python3.8/ctypes/__init__.py", line 373, in __init__
        self._handle = _dlopen(self._name, mode)
    OSError: libc10_cuda.so: cannot open shared object file: No such file or directory

Is it a problem with the current PyTorch version? I know it is working because when I run

    import torch

    if torch.cuda.is_available():
        device = torch.device("cuda:0")
        print("Running on the GPU")
    else:
        device = torch.device("cpu")
        print("Running on the CPU")

it prints "Running on the GPU". Thanks!

FranGamer1892 commented 1 year ago

By the way, I have this problem with torchaudio:

    ERROR: torchaudio 0.11.0 has requirement torch==1.11.0, but you'll have torch 1.11.0a0+git503a092 which is incompatible.

Maybe it has something to do with it?

FranGamer1892 commented 1 year ago

I found the following requirements for diff-svc:

    torch==1.12.1+cu113
    torchaudio==0.12.1+cu113
    torchvision==0.13.1+cu113

More here... It may be the problem behind this, but maybe there could be a workaround for this? Sorry, I am no expert in this topic. Thank you.

xuhuisheng commented 1 year ago

It looks like the app is trying to load CUDA libraries rather than ROCm ones. If torchaudio reports an incompatible version, I think you need to build torchaudio from source for gfx803, against the gfx803 build of torch.
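A small illustration of why pip flags the version reported earlier in the thread. This is a simplified sketch, not pip's actual resolver: it only strips the local label (the part after `+`) and compares strings, on the understanding that PEP 440 ignores the local label when the requirement itself has none. The helper name `public_version` is made up for this example.

```python
def public_version(version):
    """Return the version without its local label (everything after '+')."""
    return version.partition("+")[0]


installed = "1.11.0a0+git503a092"  # torch version quoted in the pip error
required = "1.11.0"                # what torchaudio 0.11.0 pins (torch==1.11.0)

# Even with the "+git503a092" label ignored, the remaining "a0" marks a
# pre-release of 1.11.0, so it does not satisfy an exact "==1.11.0" pin.
print(public_version(installed))              # 1.11.0a0
print(public_version(installed) == required)  # False
```

So the mismatch comes from the source-built torch identifying itself as an alpha of 1.11.0, not from the git hash suffix alone.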

FranGamer1892 commented 1 year ago

I thought torch.cuda.is_available() meant CUDA programs could be used, silly me... Would that mean that I have to port the program to ROCm? How would I do that? And is there any guide I could follow to build torchaudio with gfx803? Thanks.

xuhuisheng commented 1 year ago

If you use a ROCm build of torch, cuda.is_available() has been ported to report the ROCm GPU, and the dependent libraries link against the rocXXX libraries. So it seems you are using the wrong torch or torchaudio.
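One way to check which kind of torch build is installed, sketched under a few assumptions: `torch.version.hip` and `torch.version.cuda` are read with `getattr` in case an older build lacks one of them, and the helper names `describe_backend` and `c10_libraries` are made up for this example. Listing the `libc10*.so` files is only a hint; on a ROCm build I would expect to see a HIP variant rather than `libc10_cuda.so`, but treat that as an assumption.

```python
import glob
import os


def describe_backend(hip_version, cuda_version):
    """Classify a torch build from torch.version.hip / torch.version.cuda."""
    if hip_version:
        return "rocm"
    if cuda_version:
        return "cuda"
    return "cpu"


def c10_libraries(lib_dir):
    """Return the sorted file names matching libc10*.so inside lib_dir."""
    pattern = os.path.join(lib_dir, "libc10*.so")
    return sorted(os.path.basename(path) for path in glob.glob(pattern))


if __name__ == "__main__":
    try:
        import torch
    except (ImportError, OSError) as exc:
        print("could not import torch:", exc)
    else:
        hip = getattr(torch.version, "hip", None)
        cuda = getattr(torch.version, "cuda", None)
        print("torch", torch.__version__, "->", describe_backend(hip, cuda))
        lib_dir = os.path.join(os.path.dirname(torch.__file__), "lib")
        print("c10 libraries:", c10_libraries(lib_dir))
```

If this reports a ROCm build while torchaudio still asks for `libc10_cuda.so`, the torchaudio binary was most likely built against a CUDA torch.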

It is not difficult to build torchaudio:

  1. Install the gfx803 build of torch first, and confirm that it runs properly. See https://github.com/xuhuisheng/rocm-build/blob/master/check/test-pytorch-device.py
  2. Download the torchaudio source and run USE_ROCM=1 ROCclr_DIR=/opt/rocm/ python3 setup.py bdist_wheel. This produces a gfx803 build of torchaudio. Keep the whl so you can reuse it next time.
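The build command from step 2 can also be driven from Python, which makes the environment variables explicit. This is a minimal sketch: the function names `rocm_build_env` and `build_torchaudio_wheel` are made up for this example, and the checkout path in the usage comment is a placeholder.

```python
import os
import subprocess
import sys


def rocm_build_env(base_env, rocm_dir="/opt/rocm/"):
    """Copy base_env and add the two variables used in the build command."""
    env = dict(base_env)
    env["USE_ROCM"] = "1"
    env["ROCclr_DIR"] = rocm_dir
    return env


def build_torchaudio_wheel(source_dir):
    """Run `setup.py bdist_wheel` inside a torchaudio source checkout."""
    subprocess.run(
        [sys.executable, "setup.py", "bdist_wheel"],
        cwd=source_dir,
        env=rocm_build_env(os.environ),
        check=True,  # raise CalledProcessError if the build fails
    )


# Usage (placeholder path, adjust to where the source was downloaded):
# build_torchaudio_wheel("/path/to/torchaudio-source")
```

With setuptools defaults the resulting .whl ends up in the `dist/` directory of the checkout.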

I also remember packaging a gfx803 build of torchaudio for ROCm-5.1; maybe you can give it a try. https://github.com/xuhuisheng/rocm-gfx803/releases/download/rocm510/torchaudio-0.11.0+820b383-cp38-cp38-linux_x86_64.whl

FranGamer1892 commented 1 year ago

Hello, test-pytorch-device.py ran fine, but I am getting this error when trying to build torchaudio:

    CMake Error at cmake/LoadHIP.cmake:138 (find_package):
      By not providing "Findrocrand.cmake" in CMAKE_MODULE_PATH this project has
      asked CMake to find a package configuration file provided by "rocrand", but
      CMake did not find one.

      Could not find a package configuration file provided by "rocrand" with any
      of the following names:

        rocrandConfig.cmake
        rocrand-config.cmake

      Add the installation prefix of "rocrand" to CMAKE_PREFIX_PATH or set
      "rocrand_DIR" to a directory containing one of the above files.  If "rocrand"
      provides a separate development package or SDK, be sure it has been
      installed.
    Call Stack (most recent call first):
      cmake/LoadHIP.cmake:197 (find_package_and_print_version)
      CMakeLists.txt:78 (include)
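A quick way to see whether either of the config files named in the error exists at all is to walk the ROCm prefix. This is a sketch under assumptions: `/opt/rocm` is taken from the ROCclr_DIR value used earlier in the thread, and the helper name `find_cmake_configs` is made up for this example.

```python
import os

# File names copied from the CMake error message.
CONFIG_NAMES = ("rocrandConfig.cmake", "rocrand-config.cmake")


def find_cmake_configs(prefix, names=CONFIG_NAMES):
    """Return sorted paths under prefix whose file name is one of names."""
    matches = []
    for root, _dirs, files in os.walk(prefix):
        for name in files:
            if name in names:
                matches.append(os.path.join(root, name))
    return sorted(matches)


if __name__ == "__main__":
    found = find_cmake_configs("/opt/rocm")
    if found:
        for path in found:
            print(path)
    else:
        print("no rocrand CMake config found under /opt/rocm")
```

If a file is found, its directory is a candidate for `rocrand_DIR`. If nothing is found, the package is probably not installed under that prefix.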

Additionally, I had to run

    ln -s /opt/rocm/lib/libhsa-runtime64.so.1.7.0 /opt/rocm/lib/libhsa-runtime64.so.1.7.50300

Thank you.

FranGamer1892 commented 1 year ago

Hello, later on I found out that the error was caused by an outdated LoadHIP.cmake in the torchaudio repo. Instead, I adapted the one from the pytorch repo. Now I can successfully build torchaudio, and the gfx803 torchaudio package for ROCm-5.1 works as well. Closing this.