I have tested it on a Linux system with 2 GPUs and it worked fine:
Successfully installed SpeechRecognition-3.10.0 cmake-3.26.3 ffmpeg-1.4 ffmpeg-python-0.2.0 future-0.18.3 lit-16.0.5 llvmlite-0.40.0 more-itertools-9.1.0 numba-0.57.0 openai-whisper-20230314 soundfile-0.12.1 tiktoken-0.3.1
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Collecting bitsandbytes==0.38.1
Downloading bitsandbytes-0.38.1-py3-none-any.whl (104.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 104.3/104.3 MB 22.4 MB/s eta 0:00:00
Installing collected packages: bitsandbytes
Attempting uninstall: bitsandbytes
Found existing installation: bitsandbytes 0.39.0
Uninstalling bitsandbytes-0.39.0:
Successfully uninstalled bitsandbytes-0.39.0
Successfully installed bitsandbytes-0.38.1
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
WARNING: GPU with compute < 7.0 detected!
Older version of bitsandbytes has been installed to maintain compatibility.
You will be unable to use --load-in-4bit!
These are the outputs of the relevant commands:
>>> nvcc_device_query = "__nvcc_device_query" if not sys.platform.startswith("win") else "__nvcc_device_query.exe"
>>> compute_array = run_cmd(os.path.join(conda_env_path, "bin", nvcc_device_query), environment=True, capture_output=True)
>>> compute_array
CompletedProcess(args='. "/root/one-click-installers/installer_files/conda/etc/profile.d/conda.sh" && conda activate "/root/one-click-installers/installer_files/env" && /root/one-click-installers/installer_files/env/bin/__nvcc_device_query', returncode=0, stdout=b'60', stderr=b'')
>>> compute_array.stdout.decode('utf-8').split(',')
['60']
It seems that even though there are 2 GPUs, only one value is returned for the compute string, maybe because the two GPUs are identical?
Since it worked without issues, I will merge.
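As a reference for that question, here is a minimal sketch (not the installer's actual code) of how the comma-separated compute string can be parsed so that the single-value output above and a mixed-GPU output like 61,75 are handled the same way; taking the minimum is my assumption about what matters for compatibility:

```python
# Hypothetical parsing of the __nvcc_device_query output shown above.
# Two identical GPUs produced a single value (b'60') here; mixed GPUs are
# expected to produce comma-separated values such as b'61,75'.
for stdout in (b"60", b"61,75"):
    compute_capabilities = [int(c) for c in stdout.decode("utf-8").split(",") if c.strip()]
    # Compatibility is limited by the oldest GPU, so the minimum is what counts.
    print(min(compute_capabilities))  # prints 60, then 61
```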
bitsandbytes Windows wheels are now compiled through GitHub Actions: https://github.com/jllllll/bitsandbytes-windows-webui/actions
That is very very nice, well done! Many people will benefit from those wheels.
Found out that the CUDA Toolkit installed in the environment includes an executable that reports the compute capability of your GPU. This allows an older bitsandbytes to be installed for GPUs that are incompatible with the latest version. It could also allow installing a version of GPTQ that is compatible with GPUs older than Pascal, though I have not done that here.
This will need to be tested on a system with multiple GPUs. I'm not very experienced with Python, but I believe it should work. With multiple GPUs, the program outputs comma-separated numbers:
61,75
This also needs to be tested on Linux to ensure that it works there.
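For illustration, here is a minimal sketch of the detection-and-downgrade flow described above, assuming the installer_files/env layout and the run_cmd helper seen earlier in the thread (the run_cmd below is a simplified stand-in), plus the bitsandbytes 0.38.1 pin and the compute < 7.0 threshold visible in the install log; the real script may differ in details:

```python
import os
import subprocess
import sys

def run_cmd(cmd, capture_output=False):
    # Simplified stand-in for the installer's run_cmd helper, which also
    # activates the conda environment before running the command.
    return subprocess.run(cmd, shell=True, capture_output=capture_output)

# Path layout taken from the log above (installer_files/env inside the repo).
conda_env_path = os.path.join(os.getcwd(), "installer_files", "env")

# __nvcc_device_query ships with the CUDA Toolkit in the environment and
# prints the compute capability of each GPU, comma-separated.
nvcc_device_query = "__nvcc_device_query" if not sys.platform.startswith("win") else "__nvcc_device_query.exe"
result = run_cmd(os.path.join(conda_env_path, "bin", nvcc_device_query), capture_output=True)
min_compute = min(int(c) for c in result.stdout.decode("utf-8").split(","))

# GPUs with compute < 7.0 cannot use the newest bitsandbytes, so fall back
# to the 0.38.1 wheel seen in the log at the top of the thread.
if min_compute < 70:
    run_cmd("python -m pip install bitsandbytes==0.38.1")
    print("WARNING: GPU with compute < 7.0 detected!")
    print("Older version of bitsandbytes has been installed to maintain compatibility.")
    print("You will be unable to use --load-in-4bit!")
```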
Fixes https://github.com/oobabooga/text-generation-webui/issues/2377
bitsandbytes Windows wheels are now compiled through GitHub Actions: https://github.com/jllllll/bitsandbytes-windows-webui/actions
This now also includes a workaround for an issue I was made aware of with llama-cpp-python. It was failing to load due to the previous addition of `CUDA_PATH` to the `start_windows.bat` script when installing with the cpu option. llama-cpp-python assumes that the presence of `CUDA_PATH` on Windows means the CUDA Toolkit is installed and the relevant paths are accessible: https://github.com/abetlen/llama-cpp-python/blob/main/llama_cpp/llama_cpp.py#L51-L53
The workaround is simply to create the `bin` folder if it is missing, as sketched below.
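For completeness, a minimal sketch of that workaround in Python (the actual fix lives in the installer scripts, so the exact form may differ): since llama-cpp-python treats a set `CUDA_PATH` on Windows as meaning the toolkit's directories are accessible, making sure the `bin` folder exists keeps the import from failing:

```python
import os
import sys

# Sketch of the workaround (assumed shape, not the exact installer change):
# when CUDA_PATH is set but points at an environment without a full CUDA
# Toolkit, the bin folder that llama-cpp-python expects may not exist, so
# create it up front.
if sys.platform.startswith("win") and "CUDA_PATH" in os.environ:
    cuda_bin = os.path.join(os.environ["CUDA_PATH"], "bin")
    os.makedirs(cuda_bin, exist_ok=True)  # no-op if the folder already exists
```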
Fixes https://github.com/oobabooga/text-generation-webui/issues/2417, Fixes #73