turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License
3.45k stars · 257 forks

Simplify HIP compatibility #154

Closed ardfork closed 10 months ago

ardfork commented 10 months ago

Since #22 didn't seem to fix using exllamav2 on ROCm < 5.6, as reported in #46, I think it's better to use HIPBLAS_USE_HIP_HALF, which requires ROCm 5.5.0; that reduces the amount of code used only for HIP compatibility.

turboderp commented 10 months ago

This makes sense, though I have no way to test it. Is there a performance impact?

ardfork commented 10 months ago

Is there a performance impact?

Not that I could notice; benchmark results are identical on this PR and master.

turboderp commented 10 months ago

Is it backwards-compatible? I.e. do I just change the build environment to ROCM_VERSION=5.5 and it should work for people still running 5.6?

ardfork commented 10 months ago

Huh, I think you should have left it at 5.6; that's what most distros use and what PyTorch uses. 5.5 is just when HIPBLAS_USE_HIP_HALF was introduced.

turboderp commented 10 months ago

Ooh.. I thought it was about something being deprecated in 5.6. I'll just revert then.

ardfork commented 10 months ago

No, originally that code came from exllama v1, where the latest ROCm was 5.4 at the time, I believe. Then came exllama v2, and I copied the hipblas compatibility code without knowing that HIPBLAS_USE_HIP_HALF had since been introduced.

ardfork commented 10 months ago

Also, I wanted to modify the README to maybe add some ROCm instructions (adding --extra-index-url https://download.pytorch.org/whl/rocm5.6 to the pip command), but I don't think the instructions even work if you don't already have PyTorch installed. python setup.py install --user just fails, since it tries to import stuff from torch.
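The setup.py failure is a chicken-and-egg problem: the script imports torch's build helpers at module level, so it can't even print a useful error without PyTorch present. A minimal sketch of a pre-flight check that could run near the top of setup.py (the helper names are hypothetical, not exllamav2's actual code):

```python
import importlib.util
import sys


def build_dep_available(name):
    """Return True if `name` can be imported, without actually importing it."""
    return importlib.util.find_spec(name) is not None


def check_torch_or_exit():
    # Hypothetical guard: call this before
    # `from torch.utils.cpp_extension import ...` in setup.py.
    if not build_dep_available("torch"):
        sys.exit(
            "PyTorch must be installed before building. For ROCm, e.g.:\n"
            "  pip install torch --extra-index-url "
            "https://download.pytorch.org/whl/rocm5.6"
        )
```

`find_spec` only consults the import machinery, so the check is cheap and doesn't trigger torch's own (heavy) module initialization.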

turboderp commented 10 months ago

I guess PyTorch ran into the same issue I did: the main PyPI index doesn't support multiple variants of one package. I'm not sure what the best approach is, though. Should there be one requirements.txt for every CUDA or ROCm version, split into Windows and Linux..? I guess that only works out to six files at the moment.

But as long as you have a version of PyTorch >=2.1.0, the requirement is satisfied, at least.
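For what it's worth, the per-variant split discussed above could look something like this (the file name is hypothetical; pip requirements files do support per-file index options):

```
# requirements-rocm.txt (Linux only; hypothetical example)
--extra-index-url https://download.pytorch.org/whl/rocm5.6
torch>=2.1.0
```

Users would then pick the file matching their platform, e.g. `pip install -r requirements-rocm.txt`.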