nixified-ai / flake

A Nix flake for many AI projects

CUDA detection failed for text-generation-webui #54

Closed knkski closed 8 months ago

knkski commented 8 months ago

First off, thanks for the great project.

I'm trying to run the text-generation-webui project and am hitting a CUDA error. This is on an older card, a 1080 Ti, so I possibly just need to upgrade it? Here's what I'm trying to do and the error:

$ nix --extra-experimental-features nix-command --extra-experimental-features flakes run .#textgen-nvidia
warning: Using saved setting for 'extra-substituters = https://ai.cachix.org' from ~/.local/share/nix/trusted-settings.json.
warning: Using saved setting for 'extra-trusted-public-keys = ai.cachix.org-1:N9dzRK+alWwoKXQlnn0H6aUx0lU/mspIoz8hMvGvbbc=' from ~/.local/share/nix/trusted-settings.json.
False
/nix/store/300ld78f8vr4gxi57pp2pbjnm91wpv55-python3-3.10.12-env/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: /run/opengl-driver/lib:/nix/store/izaqzlav74p7q86bjwk8201wab14q4cs-cudatoolkit-11.8.0/lib did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
The following directories listed in your path were found to be non-existent: {PosixPath('/home/me/.nix-profile/etc/xdg'), PosixPath('/nix/var/nix/profiles/default/etc/xdg')}
The following directories listed in your path were found to be non-existent: {PosixPath('/nix/var/nix/profiles/default/share/pixmaps'), PosixPath('/home/me/.nix-profile/share/pixmaps'), PosixPath('/nix/var/nix/profiles/default/share/icons'), PosixPath('/home/me/.nix-profile/share/icons'), PosixPath('/home/me/.icons')}
The following directories listed in your path were found to be non-existent: {PosixPath('nixpkgs=/nix/var/nix/profiles/per-user/root/channels/nixos'), PosixPath('nixos-config=/etc/nixos/configuration.nix')}
The following directories listed in your path were found to be non-existent: {PosixPath('/etc/profiles/per-user/me/info'), PosixPath('/home/me/.nix-profile/share/info'), PosixPath('/home/me/.nix-profile/info'), PosixPath('/nix/var/nix/profiles/default/share/info'), PosixPath('/run/current-system/sw/info'), PosixPath('/nix/var/nix/profiles/default/info')}
The following directories listed in your path were found to be non-existent: {PosixPath('/home/me/.nix-profile/lib/gtk-2.0'), PosixPath('/nix/var/nix/profiles/default/lib/gtk-2.0'), PosixPath('/home/me/.nix-profile/lib/gtk-4.0'), PosixPath('/nix/var/nix/profiles/default/lib/gtk-4.0'), PosixPath('/home/me/.nix-profile/lib/gtk-3.0'), PosixPath('/nix/var/nix/profiles/default/lib/gtk-3.0')}
The following directories listed in your path were found to be non-existent: {PosixPath('/nix/var/nix/profiles/default/lib/mozilla/plugins'), PosixPath('/etc/profiles/per-user/me/lib/mozilla/plugins'), PosixPath('/home/me/.nix-profile/lib/mozilla/plugins'), PosixPath('/run/current-system/sw/lib/mozilla/plugins')}
The following directories listed in your path were found to be non-existent: {PosixPath('/nix/var/nix/profiles/default/share/terminfo'), PosixPath('/home/me/.nix-profile/share/terminfo')}
The following directories listed in your path were found to be non-existent: {PosixPath('/nix/var/nix/profiles/default/lib/mozilla/plugins'), PosixPath('/etc/profiles/per-user/me/lib/mozilla/plugins'), PosixPath('/home/me/.nix-profile/lib/mozilla/plugins'), PosixPath('/run/current-system/sw/lib/mozilla/plugins')}
The following directories listed in your path were found to be non-existent: {PosixPath('/nix/var/nix/profiles/default'), PosixPath('/home/me/.nix-profile')}
The following directories listed in your path were found to be non-existent: {PosixPath('/home/me/.nix-profile/lib/libexec'), PosixPath('/nix/var/nix/profiles/default/lib/libexec'), PosixPath('/run/current-system/sw/lib/libexec'), PosixPath('/etc/profiles/per-user/me/lib/libexec')}
The following directories listed in your path were found to be non-existent: {PosixPath('/etc/profiles/per-user/me/lib/kde4/plugins'), PosixPath('/run/current-system/sw/lib/kde4/plugins'), PosixPath('/nix/var/nix/profiles/default/lib/qt4/plugins'), PosixPath('/etc/profiles/per-user/me/lib/qt4/plugins'), PosixPath('/run/current-system/sw/lib/qt4/plugins'), PosixPath('/home/me/.nix-profile/lib/kde4/plugins'), PosixPath('/nix/var/nix/profiles/default/lib/kde4/plugins'), PosixPath('/home/me/.nix-profile/lib/qt4/plugins')}
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
/nix/store/300ld78f8vr4gxi57pp2pbjnm91wpv55-python3-3.10.12-env/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/nix/store/fgpx31cf48rag3yjmrgyw2kjvc12pi5m-cuda-native-redist-11.8/lib/libcudart.so.11.0'), PosixPath('/nix/store/fgpx31cf48rag3yjmrgyw2kjvc12pi5m-cuda-native-redist-11.8/lib/libcudart.so')}.. We select the PyTorch default libcudart.so, which is {torch.version.cuda},but this might missmatch with the CUDA version that is needed for bitsandbytes.To override this behavior set the BNB_CUDA_VERSION=<version string, e.g. 122> environmental variableFor example, if you want to use the CUDA version 122BNB_CUDA_VERSION=122 python ...OR set the environmental variable in your .bashrc: export BNB_CUDA_VERSION=122In the case of a manual override, make sure you set the LD_LIBRARY_PATH, e.g.export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2
  warn(msg)
DEBUG: Possible options found for libcudart.so: {PosixPath('/nix/store/fgpx31cf48rag3yjmrgyw2kjvc12pi5m-cuda-native-redist-11.8/lib/libcudart.so.11.0'), PosixPath('/nix/store/fgpx31cf48rag3yjmrgyw2kjvc12pi5m-cuda-native-redist-11.8/lib/libcudart.so')}
CUDA SETUP: PyTorch settings found: CUDA_VERSION=118, Highest Compute Capability: 6.1.
CUDA SETUP: To manually override the PyTorch CUDA version please see:https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
/nix/store/300ld78f8vr4gxi57pp2pbjnm91wpv55-python3-3.10.12-env/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!                     If you run into issues with 8-bit matmul, you can try 4-bit quantization: https://huggingface.co/blog/4bit-transformers-bitsandbytes
  warn(msg)
CUDA SETUP: Required library version not found: libbitsandbytes_cuda118_nocublaslt.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...

================================================ERROR=====================================
CUDA SETUP: CUDA detection failed! Possible reasons:
1. You need to manually override the PyTorch CUDA version. Please see: "https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
2. CUDA driver not installed
3. CUDA not installed
4. You have multiple conflicting CUDA libraries
5. Required library not pre-compiled for this bitsandbytes release!
CUDA SETUP: If you compiled from source, try again with `make CUDA_VERSION=DETECTED_CUDA_VERSION` for example, `make CUDA_VERSION=113`.
CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via `conda list | grep cuda`.
================================================================================

CUDA SETUP: Something unexpected happened. Please compile from source:
git clone https://github.com/TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=118 make cuda11x_nomatmul
python setup.py install
CUDA SETUP: Setup Failed!
Traceback (most recent call last):
  File "/nix/store/ypar8406iyb6r22n755ygvfbplwjs050-textgen-patchedSrc/server.py", line 30, in <module>
    from modules import (
  File "/nix/store/ypar8406iyb6r22n755ygvfbplwjs050-textgen-patchedSrc/modules/chat.py", line 18, in <module>
    from modules.text_generation import (
  File "/nix/store/ypar8406iyb6r22n755ygvfbplwjs050-textgen-patchedSrc/modules/text_generation.py", line 24, in <module>
    from modules.models import clear_torch_cache, local_rank
  File "/nix/store/ypar8406iyb6r22n755ygvfbplwjs050-textgen-patchedSrc/modules/models.py", line 10, in <module>
    from accelerate import infer_auto_device_map, init_empty_weights
  File "/nix/store/300ld78f8vr4gxi57pp2pbjnm91wpv55-python3-3.10.12-env/lib/python3.10/site-packages/accelerate/__init__.py", line 3, in <module>
    from .accelerator import Accelerator
  File "/nix/store/300ld78f8vr4gxi57pp2pbjnm91wpv55-python3-3.10.12-env/lib/python3.10/site-packages/accelerate/accelerator.py", line 35, in <module>
    from .checkpointing import load_accelerator_state, load_custom_state, save_accelerator_state, save_custom_state
  File "/nix/store/300ld78f8vr4gxi57pp2pbjnm91wpv55-python3-3.10.12-env/lib/python3.10/site-packages/accelerate/checkpointing.py", line 24, in <module>
    from .utils import (
  File "/nix/store/300ld78f8vr4gxi57pp2pbjnm91wpv55-python3-3.10.12-env/lib/python3.10/site-packages/accelerate/utils/__init__.py", line 131, in <module>
    from .bnb import has_4bit_bnb_layers, load_and_quantize_model
  File "/nix/store/300ld78f8vr4gxi57pp2pbjnm91wpv55-python3-3.10.12-env/lib/python3.10/site-packages/accelerate/utils/bnb.py", line 42, in <module>
    import bitsandbytes as bnb
  File "/nix/store/300ld78f8vr4gxi57pp2pbjnm91wpv55-python3-3.10.12-env/lib/python3.10/site-packages/bitsandbytes/__init__.py", line 6, in <module>
    from . import cuda_setup, utils, research
  File "/nix/store/300ld78f8vr4gxi57pp2pbjnm91wpv55-python3-3.10.12-env/lib/python3.10/site-packages/bitsandbytes/research/__init__.py", line 1, in <module>
    from . import nn
  File "/nix/store/300ld78f8vr4gxi57pp2pbjnm91wpv55-python3-3.10.12-env/lib/python3.10/site-packages/bitsandbytes/research/nn/__init__.py", line 1, in <module>
    from .modules import LinearFP8Mixed, LinearFP8Global
  File "/nix/store/300ld78f8vr4gxi57pp2pbjnm91wpv55-python3-3.10.12-env/lib/python3.10/site-packages/bitsandbytes/research/nn/modules.py", line 8, in <module>
    from bitsandbytes.optim import GlobalOptimManager
  File "/nix/store/300ld78f8vr4gxi57pp2pbjnm91wpv55-python3-3.10.12-env/lib/python3.10/site-packages/bitsandbytes/optim/__init__.py", line 6, in <module>
    from bitsandbytes.cextension import COMPILED_WITH_CUDA
  File "/nix/store/300ld78f8vr4gxi57pp2pbjnm91wpv55-python3-3.10.12-env/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 20, in <module>
    raise RuntimeError('''
RuntimeError:
        CUDA Setup failed despite GPU being available. Please run the following command to get more information:

        python -m bitsandbytes

        Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
        to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
        and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues

Here's the output from nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.56.06    Driver Version: 520.56.06    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
|  0%   28C    P8    10W / 280W |    546MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

If I run the bitsandbytes commands myself, the package builds and installs just fine, so I'm not sure why it's failing as part of the larger build.

MatthewCroughan commented 8 months ago

Yeah, bitsandbytes is a hard library. If I understand correctly, it compiles for a specific "compute capability" like most CUDA stuff. Here are the logs from the much superior llama.cpp:

ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6

My hunch is that we would probably need to compile bitsandbytes for the "compute capability" of the 1080 Ti, which is probably not 8.6 like mine.
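
For what it's worth, nixpkgs also has a knob for pinning the target capability when CUDA support is enabled. A hedged sketch (untested, and whether bitsandbytes' Makefile-driven build actually consumes it is a separate question):

# Hedged sketch: pin the target compute capability when importing the
# nixpkgs flake input with CUDA support. 6.1 is the 1080 Ti's capability.
# Untested whether bitsandbytes' Makefile build picks this up.
import nixpkgs {
  system = "x86_64-linux";
  config = {
    allowUnfree = true;
    cudaSupport = true;
    cudaCapabilities = [ "6.1" ];
  };
}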

MatthewCroughan commented 8 months ago

@knkski as another suggestion, here's my nvidia-smi; try matching the CUDA version and driver version and see what the runtime behavior is:

nvidia-smi
Sun Oct 22 14:32:32 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.89.02    Driver Version: 525.89.02    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
|  0%   44C    P8    23W / 350W |     21MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2168      G   ...xorg-server-1.20.14/bin/X       10MiB |
|    0   N/A  N/A      2277      G   ...hell-43.3/bin/gnome-shell        8MiB |
+-----------------------------------------------------------------------------+

max-privatevoid commented 8 months ago

The GTX 10 series is CC 6.1. Below CC 7.5, bitsandbytes switches to the "nocublaslt" version of one of its libraries, which we don't currently include.

Modifying preBuild for bitsandbytes should allow us to compile both variants of the library. The Makefile contains a bunch of different targets; we probably want to match the target to our CUDA version. A rough sketch follows the links below.

https://github.com/NixOS/nixpkgs/blob/nixos-unstable/pkgs/development/python-modules/bitsandbytes/default.nix#L69-L74
https://github.com/TimDettmers/bitsandbytes/blob/18e827d666fa2b70a12d539ccedc17aa51b2c97c/Makefile#L86
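
A minimal, untested sketch of what that preBuild override could look like, assuming the CUDA 11.8 toolkit from the logs above (the make targets come from the linked Makefile; overridePythonAttrs is just one way to apply it):

# Untested sketch: build both library variants so cards above and below
# CC 7.5 are covered. cuda11x builds the cuBLASLt variant and
# cuda11x_nomatmul the nocublaslt one; CUDA_VERSION=118 assumes CUDA 11.8.
bitsandbytes.overridePythonAttrs (old: {
  preBuild = ''
    make CUDA_VERSION=118 cuda11x
    make CUDA_VERSION=118 cuda11x_nomatmul
  '';
})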

knkski commented 8 months ago

@MatthewCroughan @max-privatevoid Thanks for the quick responses! I was able to get it going with a workaround following @max-privatevoid's suggestion. I copied the upstream bitsandbytes default.nix file to a new packages/bitsandbytes/default.nix file in this repo, then applied this diff:

-    ''make CUDA_VERSION=${cudaVersion} cuda${cudaMajorVersion}x''
+    ''make CUDA_VERSION=${cudaVersion} cuda${cudaMajorVersion}x_nomatmul''

It compiled and is running successfully :tada:

I don't have enough experience here to polish that up into a proper PR, but I can at least confirm that it works.
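
In case it helps anyone reproduce this, here's the rough shape of what I did; treat the overlay form as a sketch, since I'm not sure it matches how the flake actually plumbs its packages:

# Sketch only: packages/bitsandbytes/default.nix is the upstream nixpkgs
# file with the preBuild target switched to the _nomatmul variant, wired
# in through a hypothetical overlay.
final: prev: {
  python3 = prev.python3.override {
    packageOverrides = pyFinal: pyPrev: {
      bitsandbytes = pyFinal.callPackage ./packages/bitsandbytes { };
    };
  };
}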

Also, if anyone hits issues with the above, I previously got it working (albeit quite slowly) by just commenting out the bitsandbytes dependency. Everything ran fine, except that the models obviously couldn't be quantized.

Closing this issue since there's a workaround and the fix probably needs to happen in upstream nixpkgs, but feel free to reopen if that's incorrect.

max-privatevoid commented 8 months ago

Reopened for https://github.com/nixified-ai/flake/pull/57