**Closed** · jllllll closed this 1 year ago
This is amazing and will make llama.cpp properly supported.
I ran a test on Linux and got this error:

```
CUDA error 804 at /home/runner/work/llama-cpp-python-cuBLAS-wheels/llama-cpp-python-cuBLAS-wheels/vendor/llama.cpp/ggml-cuda.cu:2107: forward compatibility was attempted on non supported HW
/arrow/cpp/src/arrow/filesystem/s3fs.cc:2598: arrow::fs::FinalizeS3 was not called even though S3 was initialized. This could lead to a segmentation fault at exit
```
My local user is not `runner` and the mentioned folder does not exist, so I assume an absolute path got baked into the `llama-cpp-python==0.1.70+cu117` Linux wheel somewhere.
@oobabooga The `runner` path is just a quirk of how the code is compiled: C errors reference the source code by the file path at compile time.
Searching around for the `forward compatibility` error, I see two potential causes:
I'm betting on the first one being the cause. I'll build a wheel in 22.04 for you to try. If it works, I'll rebuild all of the Linux wheels in 22.04 starting with the 0.1.70 ones and then the latest after that. It will take a while for all of them to be updated as I am currently hosting 1440 wheels in total due to the numerous possible combinations of build configurations and package versions.
It's worth noting that this has been successfully tested on Linux before, and unsuccessfully with the same error on one occasion, so it isn't entirely clear what the issue is. There are mentions of this error in reference to 20.04 Docker images on the llama.cpp repo, so I'm hoping it really is just a 20.04 VM problem.
@missionfloyd The `FinalizeS3` error on Windows is unimportant and can be avoided by closing the webui with the CTRL+C key combination.
I use Mint 21, which is based on Ubuntu 20.04, so in principle it should work.
An NVIDIA driver issue is probably not the cause, as that's the same computer I develop on and my main installation works.
Once the new wheels are generated, let me know and I will be happy to test them one by one.
@oobabooga Here is a wheel compiled on 22.04: llama_cpp_python-0.1.70+cu117-cp310-cp310-linux_x86_64.whl.zip
Sorry, I tested the same command on my main install and got the same error. The problem was that I had just run `apt upgrade` and had not restarted the computer (text-generation-webui issues page moment).
I tried your new wheel and it worked for offloading. Maybe it's better to keep using Ubuntu 20.04 LTS for better compatibility?
I'll test it on Windows now.
Yes, that was the idea. There was a rare issue encountered in the 4bit fork of KoboldAI in which their gptq wheels built in 22.04 would sometimes fail on older Ubuntu systems with an error referencing an incompatible glibc version.
Confirmed to be working on Windows and Linux after fresh installs. Thanks again, this is amazing. I had been thinking of the problem of distributing llama-cpp-python with GPU support and had no idea how to handle it.
@jllllll about the CUDA wheels for llama-cpp-python: do you think it could make sense to use a different namespace for those wheels (like `llama_cpp_cuda` instead of `llama_cpp`) and then include both in the `requirements.txt` for text-generation-webui? Then it would be possible to import one or the other based on the return value of `torch.cuda.device_count()` (or similar). It would work cleanly even for people not using the one-click-installer.
@oobabooga I can make a build specifically for the webui to use for that purpose. You may have to check for `torch.version.hip` as well as `torch.cuda.device_count()`, since the device count can also be nonzero on ROCm devices.
I will let you know when I have a build ready for you to test.
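The selection logic being discussed could look something like this. The module names `llama_cpp` / `llama_cpp_cuda` are the ones proposed above; the exact fallback order is a sketch of my own, not the final webui code:

```python
def pick_llama_backend():
    """Choose which llama-cpp-python package to import, per the checks above."""
    try:
        import torch
    except ImportError:
        return "llama_cpp"  # no torch at all -> plain CPU wheel
    # device_count() can be nonzero on ROCm too, so rule out HIP builds first
    if getattr(torch.version, "hip", None):
        return "llama_cpp"
    if torch.cuda.device_count() > 0:
        return "llama_cpp_cuda"
    return "llama_cpp"

# Usage: import importlib; llama_cpp = importlib.import_module(pick_llama_backend())
print(pick_llama_backend())
```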
Awesome! Thank you. It's pretty annoying for me personally to have to `pip uninstall -y llama-cpp-python` and reinstall every time I run `pip install -r requirements.txt`. If that works, it will make things a lot cleaner.
@oobabooga
```
https://github.com/jllllll/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.1.73+cu117-cp310-cp310-win_amd64.whl; platform_system == "Windows"
https://github.com/jllllll/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.1.73+cu117-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
```
If there are any changes I need to make, let me know.
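For reference, the `platform_system` / `platform_machine` suffixes on those lines are PEP 508 environment markers, which pip evaluates against the running interpreter's environment. A quick way to sanity-check a marker by hand, using the third-party `packaging` library (which pip vendors internally):

```python
from packaging.markers import Marker

# The marker string from the Linux requirement line above
m = Marker('platform_system == "Linux" and platform_machine == "x86_64"')

# Evaluate against explicit (fake) environments instead of the local machine,
# so the result is deterministic regardless of where this runs.
on_linux = m.evaluate({"platform_system": "Linux", "platform_machine": "x86_64"})
on_windows = m.evaluate({"platform_system": "Windows", "platform_machine": "AMD64"})
print(on_linux, on_windows)  # True False
```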
Added here https://github.com/oobabooga/text-generation-webui/commit/4b19b74e6c8d9c99634e16774d3ebcb618ba7a18 and working perfectly. Thank you so much!!! This is amazing and extremely useful.
I feel like with this, llama.cpp can now be considered fully supported in the webui for the first time.
Parses `requirements.txt` using a regex to determine the required version.
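A minimal sketch of that parsing step (the exact pattern the installer uses may differ; the requirement line below is the Linux one posted earlier in this thread):

```python
import re

# A requirements.txt line like the ones posted above
line = ('https://github.com/jllllll/llama-cpp-python-cuBLAS-wheels/releases/'
        'download/textgen-webui/llama_cpp_python_cuda-0.1.73+cu117-cp310-cp310-linux_x86_64.whl'
        '; platform_system == "Linux" and platform_machine == "x86_64"')

# Wheel filenames follow name-version(+local)-pythontag-abitag-platform.whl,
# so the version sits right after the package name, before the +cuNNN tag.
match = re.search(r"llama_cpp_python_cuda-([0-9.]+)\+cu(\d+)", line)
version, cuda = match.group(1), match.group(2)
print(version, cuda)  # 0.1.73 117
```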
This has been tested to have minimal to no discernible performance difference over a locally built installation.
Wheels are hosted at: https://github.com/jllllll/llama-cpp-python-cuBLAS-wheels
I have implemented a pip package index for easier installation as there are far too many wheels to easily sift through. https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/ Instructions for using it are in the repo's readme.
AVX2 wheels are what the one-click-installer installs. They are the most widely compatible, and they are also what is in the webui's `requirements.txt`.