nod-ai / SHARK-Studio

SHARK Studio -- Web UI for SHARK+IREE High Performance Machine Learning Distribution
Apache License 2.0

RDNA3 on Linux #732

Open gnusenpai opened 1 year ago

gnusenpai commented 1 year ago

It sounds like using a particular driver on Windows allows RDNA3 to work. However, on Linux, none of the three Vulkan drivers (RADV, AMDVLK, AMDGPU-PRO) is currently capable of generating images.

Here are the drivers tested and the results:

I had to employ a few workarounds to get this far, so I'll list them here in case they might be the cause:

  1. Torchvision was not installed by default. I manually installed it with the command I posted on the relevant issue here: https://github.com/nod-ai/SHARK/issues/703#issuecomment-1365307370
  2. The Vulkan target triple is not detected correctly on the RADV, AMDVLK, or AMDGPU-PRO drivers. I assume this is because the device name differs depending on the platform and driver used. On Windows, I think it is "AMD Radeon RX 7900 XTX", but on Linux it is "AMD Radeon Graphics (RADV GFX1100)" on RADV and "AMD Radeon Graphics" on AMDVLK and AMDGPU-PRO. I used the following patch to work around this:

    diff --git a/shark/iree_utils/vulkan_utils.py b/shark/iree_utils/vulkan_utils.py
    index 9c73eaa..cd10f9e 100644
    --- a/shark/iree_utils/vulkan_utils.py
    +++ b/shark/iree_utils/vulkan_utils.py
    @@ -86,10 +86,8 @@ def get_vulkan_target_triple(device_name):
             triple = f"pascal-gtx1080-{system_os}"
     
         # Amd Targets
    -    elif all(x in device_name for x in ("AMD", "7900")):
    -        triple = f"rdna3-7900-{system_os}"
         elif any(x in device_name for x in ("AMD", "Radeon")):
    -        triple = f"rdna2-unknown-{system_os}"
    +        triple = f"rdna3-7900-{system_os}"
         else:
             triple = None
         return triple
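
The patch above forces every AMD device onto the RDNA3 triple, which is fine for a one-card test but would misreport older cards. A narrower variant could be sketched roughly as below. This is only an illustration, not the project's code: `guess_amd_triple` and `RDNA3_HINTS` are hypothetical names, the OS suffix is assumed to be the lowercased platform name, and treating "GFX1100" as an RDNA3 marker is an assumption based on the RADV device name reported above.

```python
import platform

# Substrings taken from the device names reported in this issue. Treating
# each one as a sign of an RDNA3 card is an assumption of this sketch.
RDNA3_HINTS = ("7900", "GFX1100")


def guess_amd_triple(device_name, system_os=None):
    """Return a Vulkan target triple for an AMD device name, or None."""
    if system_os is None:
        # Assumption: the suffix is the lowercased platform name, e.g. "linux".
        # The project may derive it differently.
        system_os = platform.system().lower()

    upper = device_name.upper()
    if "AMD" not in upper and "RADEON" not in upper:
        return None  # not an AMD device, leave it to other branches
    if any(hint in upper for hint in RDNA3_HINTS):
        return f"rdna3-7900-{system_os}"
    # A generic name such as "AMD Radeon Graphics" carries no generation
    # info, so fall back to the triple the unpatched code would pick.
    return f"rdna2-unknown-{system_os}"
```

Note that this still cannot help AMDVLK or AMDGPU-PRO, since they only report "AMD Radeon Graphics"; on those drivers the triple would have to be supplied some other way.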

System specs:
OS: Gentoo Linux
Kernel: 6.2.0-rc1
Python: 3.10.9
GPU: AMD Radeon RX 7900 XTX

gnusenpai commented 1 year ago

Well, this is interesting. I figured I'd try AMDGPU-PRO without the second workaround just to see what would happen, and sure enough, it actually works. Without the patch, the device name "AMD Radeon Graphics" falls through to the rdna2-unknown triple, so I guess something with the tuned models doesn't quite work under Linux, and there is still something a bit wrong.

powderluv commented 1 year ago

Nice to see another fellow Gentoo user.

RADV has known issues. If the output of vulkaninfo | grep deviceName doesn't show 7900 XTX, then it runs as rdna2, so it is slower. You can pass the target triple flag on the command line for now.
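
For anyone who wants to script that check, here is a rough Python equivalent of the grep. It is a sketch under a few assumptions: `vulkaninfo` is on the PATH and prints lines of the form `deviceName = ...`, and the helper names (`parse_device_names`, `detected_as_7900`, `read_vulkaninfo`) are made up for illustration. The name test mirrors the `("AMD", "7900")` condition from the diff earlier in this thread.

```python
import re
import subprocess


def parse_device_names(vulkaninfo_text):
    """Collect the distinct deviceName values from vulkaninfo output."""
    names = []
    for line in vulkaninfo_text.splitlines():
        match = re.match(r"\s*deviceName\s*=\s*(.+)", line)
        if match:
            name = match.group(1).strip()
            if name not in names:
                names.append(name)
    return names


def detected_as_7900(device_name):
    """Same substring test as the rdna3 branch shown in the diff above."""
    return all(x in device_name for x in ("AMD", "7900"))


def read_vulkaninfo():
    """Run vulkaninfo and return its stdout (assumes it is installed)."""
    result = subprocess.run(
        ["vulkaninfo"], capture_output=True, text=True, check=False
    )
    return result.stdout


# Example usage (needs a working Vulkan driver, so it is left commented out):
# for name in parse_device_names(read_vulkaninfo()):
#     status = "rdna3" if detected_as_7900(name) else "will fall back to rdna2"
#     print(f"{name}: {status}")
```

If a name comes back as "will fall back to rdna2", that is the case where passing the target triple by hand applies.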