turboderp / exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
MIT License

updates since 0.0.11 causing code to not compile on Ubuntu (23.04, 23.10) with AMD HIP / ROCm (5.6, 5.7, 6.0, ...) #311

Open · nktice opened this issue 5 months ago

nktice commented 5 months ago

Thank you for your work. As I haven't seen this mentioned, I thought I would post it, in the hope that it will save others some frustration and support the work. I maintain a guide for AMD GPUs that use HIP / ROCm: https://github.com/nktice/AMD-AI

I've been trying to get the source code to compile and I'm getting errors. I'll post the full output below so you can review it.

First, though, for those looking for an interim workaround: the commands below pin to an earlier commit that still works, or at least compiles for me so far in testing. [ This may help others who are hitting the same issue avoid some frustration. ]

git clone https://github.com/turboderp/exllamav2 
cd exllamav2
git reset --hard a4ecea6
pip install .  
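
(Equivalently - just a convenience, assuming pip's support for installing from a Git URL is available in your environment - the pinned commit can be installed in one step:)

pip install git+https://github.com/turboderp/exllamav2@a4ecea6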

OK, and now here is the output from trying the latest version. The result is similar across a range of ROCm releases (specific versions: 5.6.1, 5.7.3, 6.0). To reproduce, use the following commands (output is below):

git clone https://github.com/turboderp/exllamav2 
cd exllamav2
python setup.py build 

The output is too long to paste inside the message, so it is attached: 2024-01-19-exllamav2-compile-error.txt

turboderp commented 5 months ago

Protip when you get those walls of compiler output: copy everything into a text editor and search for the string error:. In this case the errors are:

/home/n/text-generation-webui/repositories/exllamav2/exllamav2/exllamav2_ext/cpp/safetensors_hip.cpp: At global scope:
/home/n/text-generation-webui/repositories/exllamav2/exllamav2/exllamav2_ext/cpp/safetensors_hip.cpp:267:16: error: expected initializer before ‘dec_lock’
  267 | void CUDART_CB dec_lock(hipStream_t stream, hipError_t status, void *user_data)
      |                ^~~~~~~~
/home/n/text-generation-webui/repositories/exllamav2/exllamav2/exllamav2_ext/cpp/safetensors_hip.cpp: In member function ‘void STFile::load(at::Tensor, size_t, size_t, bool)’:
/home/n/text-generation-webui/repositories/exllamav2/exllamav2/exllamav2_ext/cpp/safetensors_hip.cpp:328:27: warning: ignoring returned value of type ‘hipError_t’, declared with attribute ‘nodiscard’ [-Wunused-result]
  328 |             hipMemcpyAsync(dst, src, copy_len, hipMemcpyHostToDevice);
      |             ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/rocm-5.7.3/include/hip/hip_runtime_api.h:3883:12: note: in call to ‘hipError_t hipMemcpyAsync(void*, const void*, size_t, hipMemcpyKind, hipStream_t)’, declared here
 3883 | hipError_t hipMemcpyAsync(void* dst, const void* src, size_t sizeBytes, hipMemcpyKind kind,
      |            ^~~~~~~~~~~~~~
/opt/rocm-5.7.3/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
  332 | } hipError_t;
      |   ^~~~~~~~~~
/home/n/text-generation-webui/repositories/exllamav2/exllamav2/exllamav2_ext/cpp/safetensors_hip.cpp:329:40: error: ‘dec_lock’ was not declared in this scope; did you mean ‘clock’?
  329 |             hipStreamAddCallback(NULL, dec_lock, (void*) page, 0);
      |                                        ^~~~~~~~
      |                                        clock
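
(Equivalently, assuming the build output has been captured to a file - build.log is just an illustrative name here - the error lines can be pulled out from the shell:)

python setup.py build 2>&1 | tee build.log
grep -n "error:" build.log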

I have a 7900XTX on order so I can actually start running and debugging ROCm/HIP stuff myself soon.

But in the meantime I have to assume stream callbacks don't work exactly the same in HIPified CUDA code. GPT4 suggests that the CUDART_CB macro may not be needed. So if you wouldn't mind, you could try removing that word to see if it compiles.

So around line 266 in exllamav2/exllamav2_ext/cpp/safetensors.cpp you should have:

void dec_lock(cudaStream_t stream, cudaError_t status, void *user_data)
{
    #ifdef __linux__
    STPage* p = (STPage*) user_data;
    p->locks--;
    #endif
}

I'd love to hear if it works.
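
(For reference, an alternative to deleting the word entirely would be to supply an empty fallback definition, so CUDA builds keep the qualifier from the CUDA runtime headers while hipified builds compile cleanly. This is only a sketch of the idea, not necessarily what ends up in the repo:)

// Sketch: if the headers don't define CUDART_CB (as appears to be the case
// after HIPification), fall back to an empty definition so the declaration
// below compiles on both CUDA and ROCm.
#ifndef CUDART_CB
#define CUDART_CB
#endif

void CUDART_CB dec_lock(cudaStream_t stream, cudaError_t status, void *user_data)
{
    #ifdef __linux__
    STPage* p = (STPage*) user_data;
    p->locks--;
    #endif
}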

nktice commented 5 months ago

It appears that you have added this to the code base - and it does work now! And thanks for the tip; I'll try that next time I have such an issue.
[ I sent the whole log because it did something odd at the beginning - there were some warnings about ignored packages, but that looks resolved now. ]

This test used ROCm 6.0 on Ubuntu 23.04 with torch 2.3.0.20240118+rocm6.0 and flash attention 2. The model loads and answers questions. :)