I have tested it on a Linux system with 2 GPUs and it worked fine:
Successfully installed SpeechRecognition-3.10.0 cmake-3.26.3 ffmpeg-1.4 ffmpeg-python-0.2.0 future-0.18.3 lit-16.0.5 llvmlite-0.40.0 more-itertools-9.1.0 numba-0.57.0 openai-whisper-20230314 soundfile-0.12.1 tiktoken-0.3.1
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Collecting bitsandbytes==0.38.1
Downloading bitsandbytes-0.38.1-py3-none-any.whl (104.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 104.3/104.3 MB 22.4 MB/s eta 0:00:00
Installing collected packages: bitsandbytes
Attempting uninstall: bitsandbytes
Found existing installation: bitsandbytes 0.39.0
Uninstalling bitsandbytes-0.39.0:
Successfully uninstalled bitsandbytes-0.39.0
Successfully installed bitsandbytes-0.38.1
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
WARNING: GPU with compute < 7.0 detected!
Older version of bitsandbytes has been installed to maintain compatibility.
You will be unable to use --load-in-4bit!
These are the outputs of the relevant commands:
>>> nvcc_device_query = "__nvcc_device_query" if not sys.platform.startswith("win") else "__nvcc_device_query.exe"
>>> compute_array = run_cmd(os.path.join(conda_env_path, "bin", nvcc_device_query), environment=True, capture_output=True)
>>> compute_array
CompletedProcess(args='. "/root/one-click-installers/installer_files/conda/etc/profile.d/conda.sh" && conda activate "/root/one-click-installers/installer_files/env" && /root/one-click-installers/installer_files/env/bin/__nvcc_device_query', returncode=0, stdout=b'60', stderr=b'')
>>> compute_array.stdout.decode('utf-8').split(',')
['60']
It seems that even though there are 2 GPUs, only one value is returned for the compute string, maybe because the two GPUs are identical?
Since it worked without issues, I will merge.
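As a reference for that question, here is a minimal sketch (not the installer's actual code) of how the comma-separated compute string can be parsed so that the single-value output above and a mixed-GPU output like 61,75 are handled the same way; taking the minimum is my assumption about what matters for compatibility:

```python
# Hypothetical parsing of the __nvcc_device_query output shown above.
# Two identical GPUs produced a single value (b'60') here; mixed GPUs are
# expected to produce comma-separated values such as b'61,75'.
for stdout in (b"60", b"61,75"):
    compute_capabilities = [int(c) for c in stdout.decode("utf-8").split(",") if c.strip()]
    # Compatibility is limited by the oldest GPU, so the minimum is what counts.
    print(min(compute_capabilities))  # prints 60, then 61
```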
bitsandbytes Windows wheels are now compiled through GitHub Actions: https://github.com/jllllll/bitsandbytes-windows-webui/actions
That is very very nice, well done! Many people will benefit from those wheels.
Found out that the CUDA Toolkit installed in the environment includes an executable that reports the compute capability of your GPU. This allows an older bitsandbytes to be installed for GPUs that are incompatible with the latest version. It could also allow installing a version of GPTQ that is compatible with GPUs older than Pascal, though I have not done that here.
This will need to be tested on a system with multiple GPUs. I'm not very experienced with Python, but I believe it should work. With multiple GPUs, the program outputs comma-separated numbers:
61,75
This also needs to be tested on Linux to ensure that it works there.
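For illustration, here is a minimal sketch of the detection-and-downgrade flow described above, assuming the installer_files/env layout and the run_cmd helper seen earlier in the thread (the run_cmd below is a simplified stand-in), plus the bitsandbytes 0.38.1 pin and the compute < 7.0 threshold visible in the install log; the real script may differ in details:

```python
import os
import subprocess
import sys

def run_cmd(cmd, capture_output=False):
    # Simplified stand-in for the installer's run_cmd helper, which also
    # activates the conda environment before running the command.
    return subprocess.run(cmd, shell=True, capture_output=capture_output)

# Path layout taken from the log above (installer_files/env inside the repo).
conda_env_path = os.path.join(os.getcwd(), "installer_files", "env")

# __nvcc_device_query ships with the CUDA Toolkit in the environment and
# prints the compute capability of each GPU, comma-separated.
nvcc_device_query = "__nvcc_device_query" if not sys.platform.startswith("win") else "__nvcc_device_query.exe"
result = run_cmd(os.path.join(conda_env_path, "bin", nvcc_device_query), capture_output=True)
min_compute = min(int(c) for c in result.stdout.decode("utf-8").split(","))

# GPUs with compute < 7.0 cannot use the newest bitsandbytes, so fall back
# to the 0.38.1 wheel seen in the log at the top of the thread.
if min_compute < 70:
    run_cmd("python -m pip install bitsandbytes==0.38.1")
    print("WARNING: GPU with compute < 7.0 detected!")
    print("Older version of bitsandbytes has been installed to maintain compatibility.")
    print("You will be unable to use --load-in-4bit!")
```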
Fixes https://github.com/oobabooga/text-generation-webui/issues/2377
bitsandbytes Windows wheels are now compiled through GitHub Actions: https://github.com/jllllll/bitsandbytes-windows-webui/actions
This now also includes a workaround for an issue I was made aware of with llama-cpp-python. It was failing to load due to the previous addition of `CUDA_PATH` to the `start_windows.bat` script when installing with the cpu option. llama-cpp-python assumes that the presence of `CUDA_PATH` on Windows means the CUDA Toolkit is installed and the relevant paths are accessible: https://github.com/abetlen/llama-cpp-python/blob/main/llama_cpp/llama_cpp.py#L51-L53
The workaround is simply to create the `bin` folder if it is missing, as sketched below.
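For completeness, a minimal sketch of that workaround in Python (the actual fix lives in the installer scripts, so the exact form may differ): since llama-cpp-python treats a set `CUDA_PATH` on Windows as meaning the toolkit's directories are accessible, making sure the `bin` folder exists keeps the import from failing:

```python
import os
import sys

# Sketch of the workaround (assumed shape, not the exact installer change):
# when CUDA_PATH is set but points at an environment without a full CUDA
# Toolkit, the bin folder that llama-cpp-python expects may not exist, so
# create it up front.
if sys.platform.startswith("win") and "CUDA_PATH" in os.environ:
    cuda_bin = os.path.join(os.environ["CUDA_PATH"], "bin")
    os.makedirs(cuda_bin, exist_ok=True)  # no-op if the folder already exists
```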
Fixes https://github.com/oobabooga/text-generation-webui/issues/2417, Fixes #73