Closed — andreaalf closed this issue 2 years ago
I have the same issue on my environment.
Additionally, I had to install libopenmpi-dev, miopen-hip, libopenblas-dev, rocm-libs and rocm-dev to import pytorch.
How did you resolve the issue?
@ligun This may be a PCIe atomics issue. You can run dmesg | grep kfd to check whether the card was added successfully.
@xuhuisheng Thank you for the reply. I checked the log, and the card seems to have been added successfully.
$ dmesg|grep kfd
[ 3.098492] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[ 3.098647] kfd kfd: amdgpu: added device 1002:67df
You can execute rocminfo to check whether ROCm runs properly.
Right now I have a problem: if I install the latest ROCm 5.2.3 DKMS driver, rocminfo won't run on gfx803. So I uninstalled the DKMS driver and used the upstream kernel's built-in amdgpu driver; after that, ROCm ran properly.
rocminfo always returned an error.
$ rocminfo
ROCk module is loaded
hsa api call failure at: /long_pathname_so_that_rpms_can_package_the_debug_info/src/rocminfo/rocminfo.cc:1140
Call returned HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed to allocate the necessary resources. This error may also occur when the core runtime library needs to spawn threads or create internal OS-specific events.
Looks the same to me.
My suggestion is to uninstall amdgpu-dkms and amdgpu-dkms-firmware and reboot. This falls back to the upstream Linux kernel's built-in amdgpu driver. Try rocminfo again; it may pass.
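On Ubuntu, the suggested fallback could look something like this (package names assume the driver was installed via AMD's amdgpu repo; adjust for your distribution):

```shell
# Remove the out-of-tree DKMS driver packages (Ubuntu/Debian names)
sudo apt purge amdgpu-dkms amdgpu-dkms-firmware

# Reboot so the kernel loads its built-in amdgpu driver instead
sudo reboot

# After the reboot, confirm the KFD node was created and retry rocminfo
dmesg | grep kfd
ls -l /dev/kfd
rocminfo
```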
@xuhuisheng
Thank you very much for all your advice. Finally rocminfo ran successfully and torch.cuda.is_available() returned True!
I can use ROCm.
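For anyone else verifying their setup, a minimal sketch of the check above (assumes a ROCm build of PyTorch is installed; ROCm builds reuse the torch.cuda API and report a HIP version):

```python
import torch

# ROCm builds of PyTorch set torch.version.hip; CUDA builds leave it as None.
print("torch version:", torch.__version__)
print("HIP runtime:", torch.version.hip)

# The CUDA-named API is reused for ROCm/HIP devices.
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
else:
    print("no ROCm/HIP device visible")
```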
My setup: Ubuntu 20.04.3, kernel 5.11.0-27-generic, Python 3.8.10, GPU: Radeon FirePro S9300 x2 (equivalent to two Radeon Nanos).
Hi, I can now import PyTorch successfully, but when I run torch.cuda.is_available() I get this error:
Do you have any idea? Thanks a lot for your support!