supersonictw / navicl

AMD OpenCL Runtime for Navi Cards with Debian.
MIT License

HIP support / LLVM errors #1

Open billbeans opened 3 months ago

billbeans commented 3 months ago

I installed this on a relatively fresh Debian 12 install, first building from source, then with the prebuilt packages. At first clinfo showed that I had ROCm installed, but it didn't list any of my devices. Then I installed hashcat and ran its info command (-I), and now I'm getting this error:

hipInit(): 101

: CommandLine Error: Option 'limited-coverage-experimental' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options
Aborted

I get the exact same error, minus the HIP line, when I run clinfo now too, even after purging and then reinstalling the packages. Is there any way I can fix this? I'm using an AMD Radeon RX 5500 card, and my Ryzen 3 5300G has an iGPU.

billbeans commented 3 months ago

HOLD THE PHONE

I just removed hashcat, then purged the navicl packages and ran clinfo again. It looks like I have POCL, but it's only listing my CPU. I think hashcat pulls it in as a dependency, because I can't remove POCL without removing hashcat. Is there a way to have both platforms coexist?

billbeans commented 3 months ago

After adding myself to the video and render groups, I rebooted, and now clinfo at least shows my hardware, but there is a remnant of the POCL runtime even though I already purged it:

dlerror: libpocl.so.2.10.0: cannot open shared object file: No such file or directory

Also, clinfo hangs indefinitely, which in my experience means something is still wrong. I decided to reinstall hashcat with --no-install-recommends to see what happens, and of course I'm getting the same error messages as in my initial post. Interestingly, rocminfo lists my hardware now, whereas it didn't before, at least not without sudo.
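A `dlerror` naming `libpocl.so.2.10.0` after POCL was purged usually points to an OpenCL ICD vendor file that still names a library that no longer exists. A minimal sketch for listing what each vendor file asks the loader to open, assuming the standard `/etc/OpenCL/vendors` directory that Debian's `ocl-icd` loader reads; `list_icds` is a hypothetical helper name:

```shell
# list_icds (hypothetical helper): print each OpenCL ICD vendor file and the
# library name or path it contains. Defaults to /etc/OpenCL/vendors (assumed).
list_icds() {
  dir="${1:-/etc/OpenCL/vendors}"
  for icd in "$dir"/*.icd; do
    # If no .icd files exist, the glob stays literal; skip it
    [ -e "$icd" ] || continue
    # Each vendor file holds one line: the library the loader will dlopen()
    printf '%s -> %s\n' "$(basename "$icd")" "$(head -n 1 "$icd")"
  done
}
```

If `list_icds` still shows a line pointing at `libpocl.so.2.10.0`, running `dpkg -S` on that file shows whether any package still owns it before deciding to remove it.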

billbeans commented 3 months ago

Using a precompiled binary from hashcat's official website gets you past the POCL requirement; the version in the Debian repos forces you to install it. Hashcat is running through the benchmark successfully now, but I had to specify my device with -d 3, as devices 1 and 2 were the HIP APIs for my GPU and iGPU, respectively. I had this problem for a while on Ubuntu, but it's been fixed for the last couple of driver packages. I hope I made the right choice switching to Debian. OpenCL on Linux with AMD is a nightmare, and I want to be going forwards, not backwards.

supersonictw commented 3 months ago

I hope I made the right choice switching to Debian. OpenCL on Linux with AMD is a nightmare, and I want to be going forwards, not backwards

That's the reason why the project exists. 😁

The version in the Debian repos forces you to install it.

I haven't used the prebuilt one from Debian; I'm using the binary from hashcat.

I had to specify my device with -d 3, as devices 1 and 2 were HIP APIs for my GPU and iGPU

You can use an alias:

alias hashcat="hashcat -d 3"

An alias keeps your command simple. 🪄
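An alias typed at the prompt only lasts for that shell session. A minimal sketch for persisting it, assuming bash reads `~/.bashrc` for interactive shells (the Debian default) and that device 3 is the OpenCL backend ID reported for the dGPU; `add_hashcat_alias` is a hypothetical helper name:

```shell
# add_hashcat_alias (hypothetical helper): append the alias to a bash startup
# file (default ~/.bashrc, assumed) only if that exact line is not there yet.
add_hashcat_alias() {
  rc_file="${1:-$HOME/.bashrc}"
  line='alias hashcat="hashcat -d 3"'
  # -q quiet, -x whole-line match, -F fixed string (no regex interpretation)
  if ! grep -qxF -- "$line" "$rc_file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$rc_file"
  fi
}
```

Running `add_hashcat_alias && . ~/.bashrc` applies it to the current shell as well. The `grep` check makes the helper safe to run more than once.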

HIP APIs

RDNA1's ROCm support is a tragedy.

If you want to use the HIP runtime in the future, it's better to get an RDNA2 device instead.
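To check which generation a card belongs to, the `gfx` target name printed by `rocminfo` and `clinfo` is enough. A sketch assuming the usual mapping (gfx101x is RDNA1/Navi 1x, gfx103x is RDNA2/Navi 2x, gfx110x is RDNA3, gfx90c is the Vega-based iGPU in Cezanne APUs); `gfx_family` and `list_gfx_targets` are hypothetical helper names:

```shell
# gfx_family (hypothetical helper): map an AMD gfx target name to its family.
gfx_family() {
  case "${1%%:*}" in            # drop feature suffixes such as ":xnack-"
    gfx101?) echo "RDNA1" ;;
    gfx103?) echo "RDNA2" ;;
    gfx110?) echo "RDNA3" ;;
    gfx90c)  echo "GCN5 (Vega iGPU)" ;;
    *)       echo "unknown" ;;
  esac
}

# list_gfx_targets (hypothetical helper): print the agent names rocminfo
# reports that start with "gfx" (ISA names like "amdgcn-..." are skipped).
list_gfx_targets() {
  rocminfo 2>/dev/null | awk '$1 == "Name:" && $2 ~ /^gfx/ { print $2 }'
}
```

Usage: `for t in $(list_gfx_targets); do echo "$t: $(gfx_family "$t")"; done`. For the RX 5500 (`gfx1012`) this reports RDNA1.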

billbeans commented 3 months ago

While I can still run hashcat's benchmark with that device parameter, I think there's still something wrong with OpenCL on my machine. darktable-cltest fails for me, and I'm not sure why, other than that it's some kind of memory issue.


darktable 4.8.0
Copyright (C) 2012-2024 Johannes Hanika and other contributors.

Compile options:
  Bit depth              -> 64 bit
  Debug                  -> DISABLED
  SSE2 optimizations     -> ENABLED
  OpenMP                 -> ENABLED
  OpenCL                 -> ENABLED
  Lua                    -> ENABLED  - API version 9.3.0
  Colord                 -> ENABLED
  gPhoto2                -> ENABLED
  GMIC                   -> ENABLED  - Compressed LUTs are supported
  GraphicsMagick         -> ENABLED
  ImageMagick            -> DISABLED
  libavif                -> DISABLED
  libheif                -> ENABLED
  libjxl                 -> ENABLED
  OpenJPEG               -> ENABLED
  OpenEXR                -> ENABLED
  WebP                   -> ENABLED

See https://www.darktable.org/resources/ for detailed documentation.
See https://github.com/darktable-org/darktable/issues/new/choose to report bugs.

     0.0310 [dt_get_sysresource_level] switched to 1 as `default'
     0.0310   total mem:       31424MB
     0.0310   mipmap cache:    3928MB
     0.0310   available mem:   15712MB
     0.0310   singlebuff:      245MB
     0.0446 [opencl_init] opencl library 'libOpenCL' found on your system and loaded, preference 'default path'
KFD does not support xnack mode query.
ROCr must assume xnack is disabled.
     0.0909 [opencl_init] found 1 platform
[opencl_init] found 2 devices

[dt_opencl_device_init]
   DEVICE:                   0: 'gfx1012:xnack-'
   CONF KEY:                 cldevice_v5_amdacceleratedparallelprocessinggfx1012xnack
   PLATFORM, VENDOR & ID:    AMD Accelerated Parallel Processing, Advanced Micro Devices, Inc., ID=4098
   CANONICAL NAME:           amdacceleratedparallelprocessinggfx1012xnack
   DRIVER VERSION:           3452.0 (HSA1.1,LC)
   DEVICE VERSION:           OpenCL 2.0 
   DEVICE_TYPE:              GPU, dedicated mem
   GLOBAL MEM SIZE:          4080 MB
   MAX MEM ALLOC:            3468 MB
   MAX IMAGE SIZE:           16384 x 16384
   MAX WORK GROUP SIZE:      256
   MAX WORK ITEM DIMENSIONS: 3
   MAX WORK ITEM SIZES:      [ 1024 1024 1024 ]
   ASYNC PIXELPIPE:          NO
   PINNED MEMORY TRANSFER:   NO
   AVOID ATOMICS:            NO
   MICRO NAP:                250
   ROUNDUP WIDTH & HEIGHT    16x16
   CHECK EVENT HANDLES:      128
   TILING ADVANTAGE:         0.000
   DEFAULT DEVICE:           NO
   KERNEL BUILD DIRECTORY:   /usr/share/darktable/kernels
   KERNEL DIRECTORY:         /home/anon/.cache/darktable/cached_v3_kernels_for_AMDAcceleratedParallelProcessinggfx1012xnack_34520HSA11LC
   CL COMPILER OPTION:       -cl-fast-relaxed-math
   CL COMPILER COMMAND:      -w -cl-fast-relaxed-math  -DAMD=1 -I"/usr/share/darktable/kernels"
   KERNEL LOADING TIME:       0.0370 sec

[dt_opencl_device_init]
   DEVICE:                   1: 'gfx90c:xnack-', NEW
   CONF KEY:                 cldevice_v5_amdacceleratedparallelprocessinggfx90cxnack
   PLATFORM, VENDOR & ID:    AMD Accelerated Parallel Processing, Advanced Micro Devices, Inc., ID=4098
   CANONICAL NAME:           amdacceleratedparallelprocessinggfx90cxnack
   DRIVER VERSION:           3452.0 (HSA1.1,LC)
   DEVICE VERSION:           OpenCL 2.0 
   DEVICE_TYPE:              GPU, dedicated mem
   GLOBAL MEM SIZE:          512 MB
   MAX MEM ALLOC:            435 MB
   MAX IMAGE SIZE:           16384 x 16384
   MAX WORK GROUP SIZE:      256
   MAX WORK ITEM DIMENSIONS: 3
   MAX WORK ITEM SIZES:      [ 1024 1024 1024 ]
   ASYNC PIXELPIPE:          NO
   PINNED MEMORY TRANSFER:   NO
   AVOID ATOMICS:            NO
   MICRO NAP:                250
   ROUNDUP WIDTH & HEIGHT    16x16
   CHECK EVENT HANDLES:      128
   TILING ADVANTAGE:         0.000
   DEFAULT DEVICE:           NO
Memory access fault by GPU node-2 (Agent handle: 0x55d504b62fd0) on address 0x7fc41ec73000. Reason: Page not present or supervisor privilege.
Nearby memory map:
0x7fdffc400000, 0x101000, System
0x7fdffc600000, 0x101000, System
0x7fdffcc00000, 0x101000, System

PtrInfo:
    Address: 0x7fdffc400000-0x7fdffc501000/0x7fdffc400000-0x7fdffc501000
    Size: 0x101000
    Type: 1
    Owner: 0x55d504ac2600
    CanAccess: 2
        0x55d504b62150
        0x55d504b62fd0
    In block: 0x7fdffc400000, 0x101000
PtrInfo:
    Address: 0x7fdffc600000-0x7fdffc701000/0x7fdffc600000-0x7fdffc701000
    Size: 0x101000
    Type: 1
    Owner: 0x55d504ac2600
    CanAccess: 2
        0x55d504b62150
        0x55d504b62fd0
    In block: 0x7fdffc600000, 0x101000
PtrInfo:
    Address: 0x7fdffcc00000-0x7fdffcd01000/0x7fdffcc00000-0x7fdffcd01000
    Size: 0x101000
    Type: 1
    Owner: 0x55d504ac2600
    CanAccess: 2
        0x55d504b62150
        0x55d504b62fd0
    In block: 0x7fdffcc00000, 0x101000
darktable-cltest: ./src/core/runtime/runtime.cpp:1276: static bool rocr::core::Runtime::VMFaultHandler(hsa_signal_value_t, void*): Assertion `false && "GPU memory access fault."' failed.
Aborted

clinfo also hangs indefinitely for me, despite accurately showing my hardware. hashcat -I hangs as well, and if I don't Ctrl+C it after a minute or so, it crashes my display. Did I do something wrong, or is there some way I can fix this? I had similar problems on Ubuntu before, but they eventually resolved themselves, possibly because I switched to ROCm only, or possibly because of improvements in the last few amdgpu-install packages from AMD.

supersonictw commented 3 months ago

gfx90c

What is the gfx90c? An iGPU?

Try setting HIP_VISIBLE_DEVICES to disable it, please...
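An environment variable can be set for a single command instead of exported for the whole shell, which makes experiments like this easy to undo. A minimal sketch, assuming the discrete GPU is HIP device 0; `run_dgpu_only` is a hypothetical wrapper name:

```shell
# run_dgpu_only (hypothetical wrapper): run one command with only HIP device 0
# visible. The assignment applies to that command's environment only, so the
# calling shell's HIP_VISIBLE_DEVICES is left untouched.
run_dgpu_only() {
  HIP_VISIBLE_DEVICES=0 "$@"
}
```

Usage: `run_dgpu_only clinfo` or `run_dgpu_only ./hashcat.bin -I`. If a test run hangs or crashes, nothing persists in the shell afterwards.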

billbeans commented 3 months ago

What is the gfx90c? An iGPU?

Yeah, it's a Radeon Vega series iGPU that came with my Ryzen 3 5300G CPU (codename Cezanne).

OK, I'm rewriting my last few posts so that they're easier to read.

Changing the HIP_VISIBLE_DEVICES variable didn't help much, because it seems that HIP is still broken on both cards, and ROCm is probably still broken on the iGPU. Completely hiding the iGPU from OpenCL and HIP apps with the GPU_DEVICE_ORDINAL variable (export GPU_DEVICE_ORDINAL="0") seemed to be a better solution.[1] This allows clinfo, darktable-cltest and hashcat -I to complete gracefully.

Hashcat still reveals something interesting, though. It says I have 2 backend devices for HIP, even though I should only have one:

hashcat (v6.2.6) starting in backend information mode

HIP Info:
=========

HIP.Version.: 5.2.21153

Backend Device ID #1 (Alias: #3)
  Name...........: AMD Radeon RX 5500
  Processor(s)...: 11
  Clock..........: 1900
  Memory.Total...: 4080 MB
  Memory.Free....: 4080 MB
  Local.Memory...: 64 KB
  PCI.Addr.BDFe..: 0000:03:00.0

Backend Device ID #2
  Name...........: AMD Radeon Graphics
  Processor(s)...: 6
  Clock..........: 1700
  Memory.Total...: 512 MB
  Memory.Free....: 512 MB
  Local.Memory...: 64 KB
  PCI.Addr.BDFe..: 0000:0f:00.0

OpenCL Info:
============

OpenCL Platform ID #1
  Vendor..: Advanced Micro Devices, Inc.
  Name....: AMD Accelerated Parallel Processing
  Version.: OpenCL 2.1 AMD-APP (3452.0)

  Backend Device ID #3 (Alias: #1)
    Type...........: GPU
    Vendor.ID......: 1
    Vendor.........: Advanced Micro Devices, Inc.
    Name...........: AMD Radeon RX 5500
    Version........: OpenCL 2.0 
    Processor(s)...: 11
    Clock..........: 1900
    Memory.Total...: 4080 MB (limited to 3468 MB allocatable in one block)
    Memory.Free....: 3968 MB
    Local.Memory...: 64 KB
    OpenCL.Version.: OpenCL C 2.0 
    Driver.Version.: 3452.0 (HSA1.1,LC)
    PCI.Addr.BDF...: 03:00.0

When I run a clean ./hashcat.bin -b, it starts with the card at address 0000:0f:00.0, complains that it can't find it, and fails each benchmark test immediately. This makes me think that this card is the iGPU, and that GPU_DEVICE_ORDINAL fails to block it. Setting HIP_VISIBLE_DEVICES="0" completely eliminates this card from the list.
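Because the HIP and OpenCL entries for the same card get different backend IDs, it can help to pull the OpenCL IDs out of `hashcat -I` before choosing a `-d` value. A sketch assuming the output layout shown above (an `OpenCL Info:` header followed by indented `Backend Device ID #N` lines); `opencl_backend_ids` is a hypothetical helper name:

```shell
# opencl_backend_ids (hypothetical helper): read `hashcat -I` output on stdin
# and print only the backend IDs listed under the "OpenCL Info:" section.
opencl_backend_ids() {
  awk '
    # Any "<Backend> Info:" header starts a new section; track OpenCL only
    /^[A-Za-z]+ Info:/ { in_ocl = ($1 == "OpenCL") }
    in_ocl && /Backend Device ID #/ {
      id = $0
      sub(/.*Backend Device ID #/, "", id)  # keep text after the "#"
      sub(/[^0-9].*/, "", id)               # keep only the leading digits
      print id
    }'
}
```

Usage: `./hashcat.bin -I | opencl_backend_ids`. For the output above it prints `3`, the ID that worked with `-d`.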

But wait, there's more...

HIP still fails to run on the RX 5500, and I don't know if it's fixable. This is what I get for every benchmark test that starts with the HIP API:

HIP API (HIP 5.2.21153)
=======================
* Device #1: AMD Radeon RX 5500, 4080/4080 MB, 11MCU
hiprtcCompileProgram(): HIPRTC_ERROR_COMPILATION

error: unknown argument: '-flegacy-pass-manager'
1 error generated when compiling for gfx1012.

* Device #1: Kernel /home/anon/hashcat-6.2.6/OpenCL/shared.cl build failed.

* Device #1: Kernel /home/anon/hashcat-6.2.6/OpenCL/shared.cl build failed.

And this is as far as I can get for now. I'm left wondering two things: should I look for a way to disable HIP completely, and will that cause problems in the future with other applications I may want to use?

Attached files:

clinfo.txt - clinfo after running export GPU_DEVICE_ORDINAL="1"
darktable-cltest.txt - darktable-cltest after running export GPU_DEVICE_ORDINAL="1"
dmesg.txt - sudo dmesg before I discovered the GPU_DEVICE_ORDINAL env var, and was messing around with clinfo and hashcat

Sources:

[1] https://rocm.docs.amd.com/en/latest/conceptual/gpu-isolation.html#gpu-device-ordinal

supersonictw commented 3 months ago

Sorry for the late reply. I've been busy as a bee recently... 🥲

HIP still fails to run on the RX 5500, and I don't know if it's fixable...

Running HIP on RDNA 1 devices is impossible, since their ROCm support is a tragedy.

Here is a ComfyUI issue related to the Radeon RX 5700 XT (RDNA 1).

OpenCL/Kompute is the only way to get your device running the compute workloads you're hoping for.
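For hashcat specifically, the card doesn't have to be hidden from every HIP application: hashcat 6.2.x lists a `--backend-ignore-hip` option in `hashcat --help`, which skips its HIP backend at startup so devices are driven through OpenCL. A minimal sketch, assuming the installed build has that option; `hashcat_ocl` is a hypothetical wrapper name:

```shell
# hashcat_ocl (hypothetical wrapper): run hashcat with its HIP backend skipped
# (assumes this hashcat build supports --backend-ignore-hip).
hashcat_ocl() {
  hashcat --backend-ignore-hip "$@"
}
```

Usage: `hashcat_ocl -I` first, then `hashcat_ocl -b`. Backend device numbering may change once HIP is skipped, so it's worth re-checking `-I` before reusing a `-d` value.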

billbeans commented 2 months ago

HIP is impossible to run on RDNA 1 devices since they're a tragedy for their ROCm support.

Hey, I just installed an RX 6600, which uses RDNA2, and I'm still having the same problems with HIP. Hashcat has the same errors, and Blender can't use hardware acceleration. Is there something else I'm supposed to do now? I haven't done anything but boot up again as normal.

billbeans commented 2 months ago

I also have one other question: Do you know how to get ROCm working with my CPU as well? It's a Ryzen 3 5300G and it shows up in rocminfo. It worked on Ubuntu.

I'm so close to getting this PC turned into the workstation I want, it's really just these last two things holding me back.

supersonictw commented 2 months ago

@billbeans

Define HSA_OVERRIDE_GFX_VERSION as a system environment variable before any ROCm-related library is loaded.

For RDNA2, the variable must be HSA_OVERRIDE_GFX_VERSION=10.3.0.

Injecting the variable into .bashrc is a good idea, but sometimes it won't work: https://github.com/ROCm/ROCm/issues/2536#issuecomment-2025046411

Please make sure the variable is actually defined in the environment of the application you're running.
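One way to confirm the variable actually reached an application (not just your terminal) is to read that process's environment from `/proc/<pid>/environ`, which holds the NUL-separated environment the process was started with. A sketch; `proc_env_value` is a hypothetical helper, and `pgrep` (from Debian's `procps`) is assumed to be available:

```shell
# proc_env_value (hypothetical helper): print the value of variable NAME as
# process PID received it at exec time; prints nothing if it was not set.
proc_env_value() {
  pid="$1"; name="$2"
  # environ is NUL-separated: turn entries into lines, then strip "NAME="
  tr '\0' '\n' < "/proc/$pid/environ" | sed -n "s/^$name=//p"
}
```

Usage: `proc_env_value "$(pgrep -n blender)" HSA_OVERRIDE_GFX_VERSION`. An empty result means the application was launched without the override, for example from a desktop launcher that never read `.bashrc`.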

billbeans commented 2 months ago

Ok, Blender's HIP actually works, but NOT in the APT version, only with the binary from their site. It works regardless of HSA overrides.

For RDNA2, the variable must be HSA_OVERRIDE_GFX_VERSION=10.3.0.

Injecting the variable into .bashrc is a good idea

I've put export HSA_OVERRIDE_GFX_VERSION=10.3.0 in my ~/.profile anyway, as I hope to get into machine learning some day once I learn a bit more Python.

Hashcat's HIP still does not work, but I think that it's a Hashcat-specific problem: https://github.com/hashcat/hashcat/issues/3918#issuecomment-1937181288

I'm still not sure how to get applications (like DaVinci Resolve and hashcat) to use ROCm on my CPU, but it's hard to experiment with variables because clinfo sometimes causes a system-wide crash when I expose the iGPU. This is what I have in my ~/.profile:

# expose only the RX 5500 to OpenCL and HIP applications
export GPU_DEVICE_ORDINAL="0"
export HIP_VISIBLE_DEVICES="0"

# compatibility for HIP on RDNA2 cards
export HSA_OVERRIDE_GFX_VERSION=10.3.0
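`~/.profile` is read by login shells, and whether a graphical session picks it up depends on the display manager. On desktops whose session runs under systemd's user manager, `~/.config/environment.d/*.conf` (plain `KEY=VALUE` lines, see `man environment.d`) is another place for the same settings, which may help GUI apps like DaVinci Resolve. A sketch assuming such a session; `write_rocm_envd` and the file name `rocm.conf` are hypothetical:

```shell
# write_rocm_envd (hypothetical helper): write the ~/.profile settings above
# as a systemd environment.d fragment (KEY=VALUE lines, no "export").
# Default directory assumes $XDG_CONFIG_HOME, falling back to ~/.config.
write_rocm_envd() {
  dir="${1:-${XDG_CONFIG_HOME:-$HOME/.config}/environment.d}"
  mkdir -p "$dir"
  # Expose only device 0 to OpenCL and HIP; RDNA2 override for HIP
  cat > "$dir/rocm.conf" <<'EOF'
GPU_DEVICE_ORDINAL=0
HIP_VISIBLE_DEVICES=0
HSA_OVERRIDE_GFX_VERSION=10.3.0
EOF
}
```

After logging out and back in, `systemctl --user show-environment | grep HSA` shows whether the user manager picked it up.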

Thanks for all the help. I think this is mostly solved. The title should probably mention HIP support, as that's been the main issue here.