**Closed** · jllllll closed this 1 year ago
This is amazing and will make llama.cpp properly supported.
I ran a test on Linux and got this error:

```
CUDA error 804 at /home/runner/work/llama-cpp-python-cuBLAS-wheels/llama-cpp-python-cuBLAS-wheels/vendor/llama.cpp/ggml-cuda.cu:2107: forward compatibility was attempted on non supported HW
/arrow/cpp/src/arrow/filesystem/s3fs.cc:2598: arrow::fs::FinalizeS3 was not called even though S3 was initialized. This could lead to a segmentation fault at exit
```
My local user is not `runner` and the mentioned folder does not exist, so I assume an absolute path got baked into the `llama-cpp-python==0.1.70+cu117` Linux wheel somewhere.
@oobabooga The `runner` path is just a quirk of how the code is compiled: C errors reference the source code by the file path at compile time.
Searching around for the `forward compatibility` error, I see two potential causes:
I'm betting on the first one being the cause. I'll build a wheel in 22.04 for you to try. If it works, I'll rebuild all of the Linux wheels in 22.04 starting with the 0.1.70 ones and then the latest after that. It will take a while for all of them to be updated as I am currently hosting 1440 wheels in total due to the numerous possible combinations of build configurations and package versions.
It's worth noting that this has been successfully tested on Linux before, and unsuccessfully with the same error on one occasion, so it isn't entirely clear what the issue is. There are mentions of this error in reference to 20.04 Docker images on the llama.cpp repo, so I'm hoping it really is just a 20.04 VM problem.
@missionfloyd The `FinalizeS3` error on Windows is unimportant and can be avoided by closing the webui with the CTRL+C key combination.
I use Mint 21, which is based on Ubuntu 20.04, so in principle it should work.
An NVIDIA driver issue is probably not the cause, as that's the same computer I develop on and my main installation works.
Once the new wheels are generated, let me know and I will be happy to test them one by one.
@oobabooga Here is a wheel compiled on 22.04: llama_cpp_python-0.1.70+cu117-cp310-cp310-linux_x86_64.whl.zip
Sorry, I tested the same command on my main install and got the same error. The problem was that I had just run `apt upgrade` and had not restarted the computer (text-generation-webui issues page moment).
I tried your new wheel and it worked for offloading. Maybe it's better to keep using Ubuntu 20.04 LTS for better compatibility?
I'll test it on Windows now.
Yes, that was the idea. There was a rare issue encountered in the 4bit fork of KoboldAI in which their gptq wheels built in 22.04 would sometimes fail on older Ubuntu systems with an error referencing an incompatible glibc version.
Confirmed to be working on Windows and Linux after fresh installs. Thanks again, this is amazing. I had been thinking of the problem of distributing llama-cpp-python with GPU support and had no idea how to handle it.
@jllllll about the CUDA wheels for llama-cpp-python: do you think it could make sense to use a different namespace for those wheels (like `llama_cpp_cuda` instead of `llama_cpp`) and then include both in the `requirements.txt` for text-generation-webui? Then it would be possible to import one or the other based on the return value of `torch.cuda.device_count()` (or similar). It would work cleanly even for people not using the one-click-installer.
@oobabooga I can make a build specifically for the webui to use for that purpose. You may have to check for `torch.version.hip` as well as `torch.cuda.device_count()`, since the device count can also be nonzero on ROCm devices.
I will let you know when I have a build ready for you to test.
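The selection logic being discussed could look something like this. The module names `llama_cpp` / `llama_cpp_cuda` are the ones proposed above; the exact fallback order is a sketch of my own, not the final webui code:

```python
def pick_llama_backend():
    """Choose which llama-cpp-python package to import, per the checks above."""
    try:
        import torch
    except ImportError:
        return "llama_cpp"  # no torch at all -> plain CPU wheel
    # device_count() can be nonzero on ROCm too, so rule out HIP builds first
    if getattr(torch.version, "hip", None):
        return "llama_cpp"
    if torch.cuda.device_count() > 0:
        return "llama_cpp_cuda"
    return "llama_cpp"

# Usage: import importlib; llama_cpp = importlib.import_module(pick_llama_backend())
print(pick_llama_backend())
```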
Awesome! Thank you. It's pretty annoying for me personally to have to `pip uninstall -y llama-cpp-python` and reinstall every time I run `pip install -r requirements.txt`. If that works, it will make things a lot cleaner.
@oobabooga
```
https://github.com/jllllll/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.1.73+cu117-cp310-cp310-win_amd64.whl; platform_system == "Windows"
https://github.com/jllllll/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.1.73+cu117-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
```
If there are any changes I need to make, let me know.
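For reference, the `platform_system` / `platform_machine` suffixes on those lines are PEP 508 environment markers, which pip evaluates against the running interpreter's environment. A quick way to sanity-check a marker by hand, using the third-party `packaging` library (which pip vendors internally):

```python
from packaging.markers import Marker

# The marker string from the Linux requirement line above
m = Marker('platform_system == "Linux" and platform_machine == "x86_64"')

# Evaluate against explicit (fake) environments instead of the local machine,
# so the result is deterministic regardless of where this runs.
on_linux = m.evaluate({"platform_system": "Linux", "platform_machine": "x86_64"})
on_windows = m.evaluate({"platform_system": "Windows", "platform_machine": "AMD64"})
print(on_linux, on_windows)  # True False
```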
Added here https://github.com/oobabooga/text-generation-webui/commit/4b19b74e6c8d9c99634e16774d3ebcb618ba7a18 and working perfectly. Thank you so much!!! This is amazing and extremely useful.
I feel like with this, llama.cpp can now be considered fully supported in the webui for the first time.
Parses `requirements.txt` using a regex to determine the required version.
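A minimal sketch of that parsing step (the exact pattern the installer uses may differ; the requirement line below is the Linux one posted earlier in this thread):

```python
import re

# A requirements.txt line like the ones posted above
line = ('https://github.com/jllllll/llama-cpp-python-cuBLAS-wheels/releases/'
        'download/textgen-webui/llama_cpp_python_cuda-0.1.73+cu117-cp310-cp310-linux_x86_64.whl'
        '; platform_system == "Linux" and platform_machine == "x86_64"')

# Wheel filenames follow name-version(+local)-pythontag-abitag-platform.whl,
# so the version sits right after the package name, before the +cuNNN tag.
match = re.search(r"llama_cpp_python_cuda-([0-9.]+)\+cu(\d+)", line)
version, cuda = match.group(1), match.group(2)
print(version, cuda)  # 0.1.73 117
```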
This has been tested to have minimal to no discernible performance difference over a locally built installation.
Wheels are hosted at: https://github.com/jllllll/llama-cpp-python-cuBLAS-wheels
I have implemented a pip package index for easier installation as there are far too many wheels to easily sift through. https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/ Instructions for using it are in the repo's readme.
AVX2 wheels are what the one-click-installer installs. They are the most widely compatible, and they are also what is in the webui's `requirements.txt`.