Open billbeans opened 3 months ago
HOLD THE PHONE
I just removed hashcat and then purged the navicl packages, ran clinfo again, and it looks like I have POCL, but it's only listing my CPU. I think it's forced by hashcat, because I can't remove it without removing hashcat. Is there a way to have both platforms coexist?
After adding myself to the video and render groups I rebooted, and now clinfo at least shows my hardware, but there is a remnant of the pocl runtime, even though I already purged it:
dlerror: libpocl.so.2.10.0: cannot open shared object file: No such file or directory
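A quick way to confirm the group membership mentioned above. This is only a sketch: `in_group` is a made-up helper name, and the `usermod` hint assumes a typical Debian-style setup.

```shell
# in_group GROUP: succeed if the current user is a member of GROUP.
in_group() {
    case " $(id -nG) " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Report the two groups the ROCm device nodes usually require.
for g in video render; do
    if in_group "$g"; then
        echo "$g: ok"
    else
        echo "$g: missing (try: sudo usermod -aG $g \"\$USER\", then log out and back in)"
    fi
done
```

Group changes only take effect in new login sessions, which is why a reboot or re-login is needed after adding them.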
Also, clinfo hangs indefinitely, which in my experience means something is still wrong. I decided to reinstall hashcat with --no-install-recommends to see what happens, and of course I'm getting the same error messages as in my initial post. Interestingly, rocminfo lists my hardware now, whereas it didn't before, at least not without sudo.
Using a precompiled binary from hashcat's official website allows you to get past the pocl requirement. The version in the Debian repos forces you to install it. Hashcat is running through the benchmark successfully now, but I had to specify my device with -d 3, as devices 1 and 2 were HIP APIs for my GPU and iGPU, respectively. I had this problem for a while on Ubuntu, but for the last couple of driver packages it's been fixed. I hope I made the right choice switching to Debian, OpenCL on Linux with AMD is a nightmare, and I want to be going forwards, not backwards
> I hope I made the right choice switching to Debian, OpenCL on Linux with AMD is a nightmare, and I want to be going forwards, not backwards
That's the reason why the project exists. 😁
> The version in the Debian repos forces you to install it.
I haven't used the prebuilt one from Debian; I'm using the binary from hashcat.
> I had to specify my device with -d 3, as devices 1 and 2 were HIP APIs for my GPU and iGPU
You can use alias:
alias hashcat="hashcat -d2"
An alias helps keep your command simple. 🪄
> HIP APIs
RDNA1's ROCm support is a tragedy.
If you want a working HIP runtime in the future, it's better to get an RDNA2 device instead.
While I can still run hashcat's benchmark with that device parameter, I think there's still something wrong with OpenCL on my machine. darktable-cltest fails for me, and I'm not sure why, other than that it's some memory issue.
darktable 4.8.0
Copyright (C) 2012-2024 Johannes Hanika and other contributors.
Compile options:
Bit depth -> 64 bit
Debug -> DISABLED
SSE2 optimizations -> ENABLED
OpenMP -> ENABLED
OpenCL -> ENABLED
Lua -> ENABLED - API version 9.3.0
Colord -> ENABLED
gPhoto2 -> ENABLED
GMIC -> ENABLED - Compressed LUTs are supported
GraphicsMagick -> ENABLED
ImageMagick -> DISABLED
libavif -> DISABLED
libheif -> ENABLED
libjxl -> ENABLED
OpenJPEG -> ENABLED
OpenEXR -> ENABLED
WebP -> ENABLED
See https://www.darktable.org/resources/ for detailed documentation.
See https://github.com/darktable-org/darktable/issues/new/choose to report bugs.
0.0310 [dt_get_sysresource_level] switched to 1 as `default'
0.0310 total mem: 31424MB
0.0310 mipmap cache: 3928MB
0.0310 available mem: 15712MB
0.0310 singlebuff: 245MB
0.0446 [opencl_init] opencl library 'libOpenCL' found on your system and loaded, preference 'default path'
KFD does not support xnack mode query.
ROCr must assume xnack is disabled.
0.0909 [opencl_init] found 1 platform
[opencl_init] found 2 devices
[dt_opencl_device_init]
DEVICE: 0: 'gfx1012:xnack-'
CONF KEY: cldevice_v5_amdacceleratedparallelprocessinggfx1012xnack
PLATFORM, VENDOR & ID: AMD Accelerated Parallel Processing, Advanced Micro Devices, Inc., ID=4098
CANONICAL NAME: amdacceleratedparallelprocessinggfx1012xnack
DRIVER VERSION: 3452.0 (HSA1.1,LC)
DEVICE VERSION: OpenCL 2.0
DEVICE_TYPE: GPU, dedicated mem
GLOBAL MEM SIZE: 4080 MB
MAX MEM ALLOC: 3468 MB
MAX IMAGE SIZE: 16384 x 16384
MAX WORK GROUP SIZE: 256
MAX WORK ITEM DIMENSIONS: 3
MAX WORK ITEM SIZES: [ 1024 1024 1024 ]
ASYNC PIXELPIPE: NO
PINNED MEMORY TRANSFER: NO
AVOID ATOMICS: NO
MICRO NAP: 250
ROUNDUP WIDTH & HEIGHT 16x16
CHECK EVENT HANDLES: 128
TILING ADVANTAGE: 0.000
DEFAULT DEVICE: NO
KERNEL BUILD DIRECTORY: /usr/share/darktable/kernels
KERNEL DIRECTORY: /home/anon/.cache/darktable/cached_v3_kernels_for_AMDAcceleratedParallelProcessinggfx1012xnack_34520HSA11LC
CL COMPILER OPTION: -cl-fast-relaxed-math
CL COMPILER COMMAND: -w -cl-fast-relaxed-math -DAMD=1 -I"/usr/share/darktable/kernels"
KERNEL LOADING TIME: 0.0370 sec
[dt_opencl_device_init]
DEVICE: 1: 'gfx90c:xnack-', NEW
CONF KEY: cldevice_v5_amdacceleratedparallelprocessinggfx90cxnack
PLATFORM, VENDOR & ID: AMD Accelerated Parallel Processing, Advanced Micro Devices, Inc., ID=4098
CANONICAL NAME: amdacceleratedparallelprocessinggfx90cxnack
DRIVER VERSION: 3452.0 (HSA1.1,LC)
DEVICE VERSION: OpenCL 2.0
DEVICE_TYPE: GPU, dedicated mem
GLOBAL MEM SIZE: 512 MB
MAX MEM ALLOC: 435 MB
MAX IMAGE SIZE: 16384 x 16384
MAX WORK GROUP SIZE: 256
MAX WORK ITEM DIMENSIONS: 3
MAX WORK ITEM SIZES: [ 1024 1024 1024 ]
ASYNC PIXELPIPE: NO
PINNED MEMORY TRANSFER: NO
AVOID ATOMICS: NO
MICRO NAP: 250
ROUNDUP WIDTH & HEIGHT 16x16
CHECK EVENT HANDLES: 128
TILING ADVANTAGE: 0.000
DEFAULT DEVICE: NO
Memory access fault by GPU node-2 (Agent handle: 0x55d504b62fd0) on address 0x7fc41ec73000. Reason: Page not present or supervisor privilege.
Nearby memory map:
0x7fdffc400000, 0x101000, System
0x7fdffc600000, 0x101000, System
0x7fdffcc00000, 0x101000, System
PtrInfo:
Address: 0x7fdffc400000-0x7fdffc501000/0x7fdffc400000-0x7fdffc501000
Size: 0x101000
Type: 1
Owner: 0x55d504ac2600
CanAccess: 2
0x55d504b62150
0x55d504b62fd0
In block: 0x7fdffc400000, 0x101000
PtrInfo:
Address: 0x7fdffc600000-0x7fdffc701000/0x7fdffc600000-0x7fdffc701000
Size: 0x101000
Type: 1
Owner: 0x55d504ac2600
CanAccess: 2
0x55d504b62150
0x55d504b62fd0
In block: 0x7fdffc600000, 0x101000
PtrInfo:
Address: 0x7fdffcc00000-0x7fdffcd01000/0x7fdffcc00000-0x7fdffcd01000
Size: 0x101000
Type: 1
Owner: 0x55d504ac2600
CanAccess: 2
0x55d504b62150
0x55d504b62fd0
In block: 0x7fdffcc00000, 0x101000
darktable-cltest: ./src/core/runtime/runtime.cpp:1276: static bool rocr::core::Runtime::VMFaultHandler(hsa_signal_value_t, void*): Assertion `false && "GPU memory access fault."' failed.
Aborted
clinfo also hangs indefinitely for me despite accurately showing my hardware. hashcat -I hangs as well, and if I don't Ctrl+C it after a minute or so it will crash my display. Did I do something wrong, or is there some way I can fix this? I had similar problems on Ubuntu before, but they eventually resolved themselves, possibly because I switched to ROCm only, or possibly because of improvements in the last few amdgpu-install packages from AMD.
> gfx90c
What is the gfx90c? An iGPU?
Try setting HIP_VISIBLE_DEVICES to disable it, please...
> What is the gfx90c? An iGPU?
Yeah, it's a Radeon Vega series iGPU that came with my Ryzen 3 5300G CPU, codename Cezanne
Ok, I'm rewriting my last few posts so that they're easier to read
Changing the HIP_VISIBLE_DEVICES variable didn't do much to help, because it seems that HIP is still broken on both cards, and ROCm is probably still broken on the iGPU. Completely hiding the iGPU from OpenCL and HIP apps seemed to be a better solution, using the GPU_DEVICE_ORDINAL variable: export GPU_DEVICE_ORDINAL="0".[1] This allows clinfo, darktable-cltest and hashcat -I to complete gracefully.
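For experimenting, the variable can also be set for a single command instead of being exported for the whole session. A minimal sketch, assuming device ordinal 0 is the discrete card as in this thread; `with_gpu0` is a made-up helper name:

```shell
# Hypothetical helper: run one command with only GPU ordinal 0 visible
# to the ROCm runtime, without changing the current shell's environment.
with_gpu0() {
    env GPU_DEVICE_ORDINAL=0 "$@"
}

# Example usage (uncomment to try):
# with_gpu0 clinfo
# with_gpu0 hashcat -I
```

This makes it easier to compare runs with and without the iGPU hidden, since nothing persists after the command exits.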
Hashcat still reveals something interesting, though. It says I have 2 backend devices for HIP, even though I should only have one:
hashcat (v6.2.6) starting in backend information mode
HIP Info:
=========
HIP.Version.: 5.2.21153
Backend Device ID #1 (Alias: #3)
Name...........: AMD Radeon RX 5500
Processor(s)...: 11
Clock..........: 1900
Memory.Total...: 4080 MB
Memory.Free....: 4080 MB
Local.Memory...: 64 KB
PCI.Addr.BDFe..: 0000:03:00.0
Backend Device ID #2
Name...........: AMD Radeon Graphics
Processor(s)...: 6
Clock..........: 1700
Memory.Total...: 512 MB
Memory.Free....: 512 MB
Local.Memory...: 64 KB
PCI.Addr.BDFe..: 0000:0f:00.0
OpenCL Info:
============
OpenCL Platform ID #1
Vendor..: Advanced Micro Devices, Inc.
Name....: AMD Accelerated Parallel Processing
Version.: OpenCL 2.1 AMD-APP (3452.0)
Backend Device ID #3 (Alias: #1)
Type...........: GPU
Vendor.ID......: 1
Vendor.........: Advanced Micro Devices, Inc.
Name...........: AMD Radeon RX 5500
Version........: OpenCL 2.0
Processor(s)...: 11
Clock..........: 1900
Memory.Total...: 4080 MB (limited to 3468 MB allocatable in one block)
Memory.Free....: 3968 MB
Local.Memory...: 64 KB
OpenCL.Version.: OpenCL C 2.0
Driver.Version.: 3452.0 (HSA1.1,LC)
PCI.Addr.BDF...: 03:00.0
When attempting to run a clean ./hashcat.bin -b it starts with the 0000:0f:00.0 address card, and complains that it can't find it, failing each benchmark test immediately. This makes me think that it is the iGPU, and GPU_DEVICE_ORDINAL fails to block it. Setting HIP_VISIBLE_DEVICES="0" completely eliminates this card from the list.
But wait, there's more...
HIP still fails to run on the RX 5500, and I don't know if it's fixable. This is what I get for every benchmark test if it starts with the HIP API:
HIP API (HIP 5.2.21153)
=======================
* Device #1: AMD Radeon RX 5500, 4080/4080 MB, 11MCU
hiprtcCompileProgram(): HIPRTC_ERROR_COMPILATION
error: unknown argument: '-flegacy-pass-manager'
1 error generated when compiling for gfx1012.
* Device #1: Kernel /home/anon/hashcat-6.2.6/OpenCL/shared.cl build failed.
* Device #1: Kernel /home/anon/hashcat-6.2.6/OpenCL/shared.cl build failed.
And this is as far as I can get for now. I'm left wondering 2 things: Should I look for a way to disable HIP completely, and will this end up causing me problems in the future with other applications I may want to use?
Attached files:
clinfo.txt - clinfo after running export GPU_DEVICE_ORDINAL="1"
darktable-cltest.txt - darktable-cltest after running export GPU_DEVICE_ORDINAL="1"
dmesg.txt - sudo dmesg before I discovered the GPU_DEVICE_ORDINAL env var, and was messing around with clinfo and hashcat
Sources:
[1] https://rocm.docs.amd.com/en/latest/conceptual/gpu-isolation.html#gpu-device-ordinal
Sorry for the late reply. I've been busy as a bee recently... 🥲
> HIP still fails to run on the RX 5500, and I don't know if it's fixable...
HIP is impossible to run on RDNA 1 devices because their ROCm support is a tragedy.
Here is a ComfyUI issue related to the Radeon RX 5700 XT, which is RDNA 1.
OpenCL/Kompute is the only way to make your device run the compute work you're hoping for.
> HIP is impossible to run on RDNA 1 devices since they're a tragedy for their ROCm support.
Hey, I just installed an RX 6600 which uses RDNA2, and I'm still having the same problems with HIP. Hashcat has the same errors and Blender can't use hardware acceleration. Is there something else I'm supposed to do now? I haven't done anything but boot up again like normal
I also have one other question: Do you know how to get ROCm working with my CPU as well? It's a Ryzen 3 5300G and it shows up in rocminfo. It worked on Ubuntu.
I'm so close to getting this PC turned into the workstation I want, it's really just these last two things holding me back.
@billbeans
Define HSA_OVERRIDE_GFX_VERSION in the system environment variables before loading any ROCm-related library.
For RDNA2, the variable must be HSA_OVERRIDE_GFX_VERSION=10.3.0.
Putting the variable into .bashrc is a good idea, but sometimes it will not work:
https://github.com/ROCm/ROCm/issues/2536#issuecomment-2025046411
Please make sure the variable is defined in your use case.
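One way to check that, as a rough sketch: `check_var` and `proc_has_var` are made-up helper names, and the /proc lookup is Linux-only and only works for processes you own.

```shell
# check_var NAME: report whether NAME is set in this shell's exported environment.
check_var() {
    if printenv "$1" >/dev/null; then
        echo "$1=$(printenv "$1")"
    else
        echo "$1 is not set"
    fi
}

# proc_has_var PID NAME: succeed if the process PID was started with NAME
# in its environment (reads the NUL-separated /proc/PID/environ).
proc_has_var() {
    tr '\0' '\n' < "/proc/$1/environ" | grep -q "^$2="
}

check_var HSA_OVERRIDE_GFX_VERSION
```

The second helper is useful for applications launched from a desktop menu, which may not read .bashrc at all. For example, `proc_has_var "$(pgrep -n blender)" HSA_OVERRIDE_GFX_VERSION` would check an already-running Blender, assuming that is the process name on your system.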
Ok, Blender's HIP actually works, but NOT in the APT version, only with the binary from their site. It works regardless of HSA overrides.
> For RDNA2, the variable must be HSA_OVERRIDE_GFX_VERSION=10.3.0.
> Putting the variable into .bashrc is a good idea
I've put export HSA_OVERRIDE_GFX_VERSION=10.3.0 in my ~/.profile anyway, as I hope to get into machine learning some day once I learn a bit more Python.
Hashcat's HIP still does not work, but I think that it's a Hashcat-specific problem: https://github.com/hashcat/hashcat/issues/3918#issuecomment-1937181288
I'm still not sure how to get applications (like DaVinci Resolve and hashcat) to use ROCm on my CPU, but it's hard to experiment with variables because clinfo sometimes causes a system-wide crash when I expose the iGPU. This is what I have in my ~/.profile:
# expose only the RX 5500 to OpenCL and HIP applications
export GPU_DEVICE_ORDINAL="0"
export HIP_VISIBLE_DEVICES="0"
# compatibility for HIP on RDNA2 cards
export HSA_OVERRIDE_GFX_VERSION=10.3.0
Thanks for all the help. I think this is mostly solved. The title should probably mention HIP support, as that's been the main issue here
I installed this on a relatively fresh Debian 12 install, first building from source, then with the prebuilt packages. At first clinfo showed I had ROCm installed, but it didn't list any of my devices. Then I installed hashcat and ran the info, or -I, command, and now I'm getting this error:
I get the exact same error minus the HIP line when I run clinfo now too, even after purging then reinstalling the packages. Is there any way I can fix this? I'm using an AMD Radeon RX 5500 card, and my Ryzen 3 5300G has an iGPU.