turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

Action to build wheels on ROCm 6.0 #421

Closed Orion-zhen closed 2 months ago

Orion-zhen commented 2 months ago

For AMD RX 7000 series GPUs (e.g. the 7900 XTX with gfx1100), ROCm 5.6, and especially 5.7, are extremely unstable and routinely cause memory access faults during LLM inference. A good way to solve this problem is to install PyTorch 2.3.0+rocm6.0 and recompile all related wheels against it.

It would be convenient if a wheel based on ROCm 6.0 or later could be released, instead of compiling locally. Thus I created this action script and opened this PR.
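For context, a wheel-build job of this shape could be expressed as a GitHub Actions matrix. This is a hypothetical sketch, not the workflow from this PR; the job name, action versions, and matrix entries are illustrative assumptions:

```yaml
# Hypothetical sketch of a ROCm wheel-build job; names and versions are
# illustrative, not the repository's actual workflow.
jobs:
  build_wheels_rocm:
    strategy:
      matrix:
        os: [ubuntu-20.04]
        python: ["3.10", "3.11"]
        torch: ["2.3.0"]
        rocm: ["6.0"]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python }}
      - name: Install Torch for ROCm
        run: pip install torch==${{ matrix.torch }} --index-url https://download.pytorch.org/whl/rocm${{ matrix.rocm }}
      - name: Build wheel
        run: pip wheel . --no-deps -w dist/
```

The key point is that the Torch/ROCm pair is pinned in the matrix, so adding a new ROCm release is one more matrix entry rather than a new workflow file.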

I have run the action script and tested the generated wheel on my own RX 7900XTX.

BTW, GitHub is really stingy with disk space; I have been struggling to install all the dependencies within such a tight limit :-(

Orion-zhen commented 2 months ago

The ROCm AIO build action failed to pass tests, so I have deleted the corresponding .yml file.

LeoYelton commented 2 months ago

I think this merge is useful. I have also built a wheel for 6.0 before, much like this one. 6.0 is stable; time to update.

Orion-zhen commented 2 months ago

> I think this merge is useful. I have also built a wheel for 6.0 before, much like this one. 6.0 is stable; time to update.

Thank you for your comment, I have tested the stable version.

turboderp commented 2 months ago

This looks good. Is there a reason for building on Ubuntu 22.04, though? I made that change before and had to revert it because a lot of server instances are still on 20.04 and the wheels won't be backwards compatible.

Orion-zhen commented 2 months ago

> This looks good. Is there a reason for building on Ubuntu 22.04, though? I made that change before and had to revert it because a lot of server instances are still on 20.04 and the wheels won't be backwards compatible.

Well, the reason is simply that I prefer newer releases. For compatibility's sake, it should be built on 20.04. Thank you for the reminder.
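The backwards-compatibility concern comes down to glibc: a native wheel compiled on Ubuntu 22.04 links against glibc 2.35, which 20.04 hosts (glibc 2.31) cannot satisfy, so the build machine needs to run the oldest glibc the wheel should support. As a minimal sketch (not part of the project), the host's glibc can be inspected from Python:

```python
import platform

def host_glibc():
    """Return the C library name and version the interpreter runs against."""
    libc, version = platform.libc_ver()
    return libc, version

# On an Ubuntu 20.04 host this reports glibc 2.31; on 22.04, glibc 2.35.
# A wheel must be built against the *oldest* glibc it is meant to run on.
print(host_glibc())
```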

turboderp commented 2 months ago

Thank you.

I've added it to the matrix and done some tests. It seems like it's building correctly now with Ubuntu 20.04 and Torch 2.3.0. If v0.0.20 also builds for the CUDA wheels, there should be +rocm6.0 wheels in the release as well.

Lots of stuff breaks in Torch 2.3.0, sadly, and they dropped ROCm 5.6 support. It's all kind of a mess, and the build actions reflect that, so I think they need a lot of tidying up.

Orion-zhen commented 2 months ago

ROCm 5.6 is getting old now. Would it be possible to separate the builds by PyTorch version? E.g. exllamav2+rocm5.6+torch2.2 and exllamav2+rocm6.0+torch2.3.
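The split suggested here amounts to a small build matrix with an exclusion rule, since Torch 2.3 dropped ROCm 5.6 support. A hypothetical sketch (the version lists and tag format are illustrative, not the project's actual naming):

```python
# Hypothetical sketch: enumerate (torch, rocm) wheel variants and drop
# combinations that are not supported (Torch 2.3 removed ROCm 5.6 support).
from itertools import product

TORCH_VERSIONS = ["2.2", "2.3"]
ROCM_VERSIONS = ["5.6", "6.0"]
UNSUPPORTED = {("2.3", "5.6")}  # Torch 2.3 dropped ROCm 5.6

def wheel_variants():
    """Yield variant tags like 'torch2.2+rocm5.6' for the valid combinations."""
    for torch, rocm in product(TORCH_VERSIONS, ROCM_VERSIONS):
        if (torch, rocm) in UNSUPPORTED:
            continue
        yield f"torch{torch}+rocm{rocm}"

print(list(wheel_variants()))
# → ['torch2.2+rocm5.6', 'torch2.2+rocm6.0', 'torch2.3+rocm6.0']
```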

turboderp commented 2 months ago

Yes, it'll build with Torch 2.2 for ROCm 5.6. As soon as I can get this stuff to work. Very close now.

turboderp commented 2 months ago

Should be done now. I tested the wheel with Torch 2.3.0 and ROCm 6.0 on the latest Manjaro. It's integrated into the existing workflow, but I'll merge this anyway for completeness, then rework all the actions for the next release. Thanks for the input. :)