daniandtheweb opened this issue 4 months ago
This is the current issue I'm having with rocBLAS:
Tensile::WARNING: Global parameter WriteMasterSolutionIndex = False unrecognized.
# CodeObjectVersion from TensileCreateLibrary: V5
# CxxCompiler from TensileCreateLibrary: hipcc
# Architecture from TensileCreateLibrary: gfx90c
# LibraryFormat from TensileCreateLibrary: msgpack
Tensile::FATAL: Architecture gfx90c not supported
CMake Error at /home/daniandtheweb/Workspace/rocm_sdk_builder/builddir/023_02_rocBLAS/virtualenv/cmake/TensileConfig.cmake:277 (message):
Error creating Tensile library: 255
Call Stack (most recent call first):
library/src/CMakeLists.txt:74 (TensileCreateLibraryFiles)
The main issue seems to be in the Tensile project, as it doesn't have explicit support for this card.
I can work on this at some point. I still have a 2400G with Vega 11 myself, and at one point I had it running on the ROCm stack. Back then it also needed some kernel patching (5.05 kernel, maybe) or it would fail when launching ML kernels. Hopefully newer kernels now work out of the box.
Some time ago I tested Arch's ROCm stack while overriding the gfx version to 9.0.0, and everything worked without any kernel patching, so hopefully adding at least basic support for this card shouldn't require that much work.
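For reference, the override in question is (I assume) the standard `HSA_OVERRIDE_GFX_VERSION` environment variable. A quick way to confirm what ISA the runtime actually resolves under the override is a minimal HIP probe; this is just a sketch, assuming ROCm is installed and `hipcc` is on the PATH:

```cpp
// probe.cpp -- print what the HIP runtime reports for the first device.
// Build:  hipcc probe.cpp -o probe
// Run, spoofing the unsupported gfx90c as gfx900:
//   HSA_OVERRIDE_GFX_VERSION=9.0.0 ./probe
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    if (hipGetDeviceCount(&count) != hipSuccess || count == 0) {
        std::printf("no HIP devices visible\n");
        return 1;
    }
    hipDeviceProp_t prop;
    hipGetDeviceProperties(&prop, 0);
    // gcnArchName shows the ISA the runtime resolved,
    // e.g. "gfx900" when the override is active.
    std::printf("device 0: %s (%s)\n", prop.name, prop.gcnArchName);
    return 0;
}
```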
I've recently been testing the prebuilt PyTorch for ROCm 6.1 on this APU again, and it mostly works fine with the GFX version workaround. The good news is that a recent Linux update (6.10) allows programs to directly access GTT memory and use it as VRAM. The performance is quite slow, but it's perfectly usable for a thin laptop, and it effectively gives me 8 GB of virtual VRAM: 40 seconds for a 512x512 Stable Diffusion image.
I'm having some trouble running llama.cpp (rocBLAS related).
I'll try building again and see if I can work around the issues I was having.
For llama.cpp, I'm not sure whether it would help to test by overriding the version to gfx1030 instead, so that it's detected as an RDNA2 card. There are a couple of places where different AMD GPU versions are checked using defined(__gfx900__), defined(__gfx1030__), etc. For example:
ggml/src/ggml-cuda/vendors/hip.h and ggml/src/ggml-cuda/common.cuh
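To illustrate the kind of gating meant here: those headers group architectures behind compile-time macros, roughly along these lines. The exact macro names and groupings vary between llama.cpp versions, so treat this as an approximate sketch rather than the project's actual code:

```cpp
// Approximate sketch of the per-architecture gating in ggml's HIP headers.
// The compiler defines __gfxNNN__ for the offload target being built.
#if defined(__gfx1030__) || defined(__gfx1031__) || defined(__gfx1032__)
#define RDNA2   // RDNA2-tuned code paths (warp size 32, dp4a, ...)
#endif

#if defined(__gfx900__) || defined(__gfx906__) || defined(__gfx908__)
#define GCN     // Vega-era paths; what gfx90c would hit if built as gfx900
#endif
```

So overriding to gfx1030 would route the kernels through the RDNA2 branches, which may or may not behave well on a Vega-class iGPU.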
I'm currently building the project for my laptop (Ryzen 4700U), but the integrated GPU is not officially supported. So far I've been able to build successfully up to rocBLAS, but since the device is unsupported I can't get any further.
I'll try to modify the patches to add gfx90c, since I've already tested the card with my distribution's ROCm and overriding the GFX version to 9.0.0 makes everything work fine.
The only thing I think could be improved would be the ability for ROCm to dynamically allocate RAM as VRAM, since by default only 512 MB of memory is allocated as VRAM. I've tested some projects after manually allocating more VRAM from the BIOS, and even though the GPU is quite slow compared to newer cards, it still manages faster results than the CPU alone.
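As a concrete way to see that carve-out, here is a small probe of what the runtime exposes as VRAM; again a sketch, assuming ROCm and `hipcc` are available:

```cpp
// vram_probe.cpp -- report how much memory the runtime exposes as "VRAM".
// Build:  hipcc vram_probe.cpp -o vram_probe
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    size_t free_b = 0, total_b = 0;
    if (hipMemGetInfo(&free_b, &total_b) != hipSuccess) {
        std::printf("hipMemGetInfo failed\n");
        return 1;
    }
    // On an APU with the default BIOS carve-out this typically reports
    // around 512 MB total, regardless of how much system RAM could back
    // GTT allocations on newer kernels.
    std::printf("VRAM: %zu MiB free / %zu MiB total\n",
                free_b >> 20, total_b >> 20);
    return 0;
}
```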