[Open] swvajanyatek opened this issue 1 year ago
If you run
$env:CMAKE_ARGS='-DLLAMA_CUBLAS=on'; poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
does it fail?
What model are you trying to load? Also, is CUDA on your PATH?
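A quick way to check the CUDA side (assuming the toolkit was installed under the usual /usr/local/cuda prefix) is:
which nvcc && nvcc --version              # toolkit compiler visible and its version
nvidia-smi                                # driver actually sees the GPU
echo $PATH | tr ':' '\n' | grep -i cuda   # CUDA bin directory really on PATH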
myuser@mymachine:/mnt/c/dev/git/github/privateGPT$ export CMAKE_ARGS="-DLLAMA_CUBLAS=on"
myuser@mymachine:/mnt/c/dev/git/github/privateGPT$ poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
Collecting llama-cpp-python
Downloading llama_cpp_python-0.2.13.tar.gz (7.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.2/7.2 MB 25.8 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... done
Preparing metadata (pyproject.toml) ... done
Collecting typing-extensions>=4.5.0 (from llama-cpp-python)
Downloading typing_extensions-4.8.0-py3-none-any.whl.metadata (3.0 kB)
Collecting numpy>=1.20.0 (from llama-cpp-python)
Downloading numpy-1.26.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 61.2/61.2 kB 202.5 MB/s eta 0:00:00
Collecting diskcache>=5.6.1 (from llama-cpp-python)
Downloading diskcache-5.6.3-py3-none-any.whl.metadata (20 kB)
Downloading diskcache-5.6.3-py3-none-any.whl (45 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 45.5/45.5 kB 178.4 MB/s eta 0:00:00
Downloading numpy-1.26.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.2/18.2 MB 46.4 MB/s eta 0:00:00
Downloading typing_extensions-4.8.0-py3-none-any.whl (31 kB)
Building wheels for collected packages: llama-cpp-python
Building wheel for llama-cpp-python (pyproject.toml) ... done
Created wheel for llama-cpp-python: filename=llama_cpp_python-0.2.13-cp311-cp311-manylinux_2_35_x86_64.whl size=4096130 sha256=d4d3be49e622524654d9c3fdc9fc429c97596934db8c05d36882742f532dea33
Stored in directory: /tmp/pip-ephem-wheel-cache-8ywzpy4i/wheels/bb/fc/2d/b62eb092d886ada0b78d62c7d84ade2b1b688f9613584bc93b
Successfully built llama-cpp-python
Installing collected packages: typing-extensions, numpy, diskcache, llama-cpp-python
Attempting uninstall: typing-extensions
Found existing installation: typing_extensions 4.8.0
Uninstalling typing_extensions-4.8.0:
Successfully uninstalled typing_extensions-4.8.0
Attempting uninstall: numpy
Found existing installation: numpy 1.26.1
Uninstalling numpy-1.26.1:
Successfully uninstalled numpy-1.26.1
Attempting uninstall: diskcache
Found existing installation: diskcache 5.6.3
Uninstalling diskcache-5.6.3:
Successfully uninstalled diskcache-5.6.3
Attempting uninstall: llama-cpp-python
Found existing installation: llama_cpp_python 0.2.11
Uninstalling llama_cpp_python-0.2.11:
Successfully uninstalled llama_cpp_python-0.2.11
Successfully installed diskcache-5.6.3 llama-cpp-python-0.2.13 numpy-1.26.1 typing-extensions-4.8.0
myuser@mymachine:/mnt/c/dev/git/github/privateGPT$
myuser@mymachine:/mnt/c/dev/git/github/privateGPT$ PGPT_PROFILES=local poetry run python -m private_gpt
17:09:08.736 [INFO ] private_gpt.settings.settings_loader - Starting application with profiles=['default', 'local']
CUDA error 100 at /tmp/pip-install-pr8zzwn4/llama-cpp-python_8a4cf88dbf754a3eb9cea7b61f302bed/vendor/llama.cpp/ggml-cuda.cu:5823: no CUDA-capable device is detected
current device: 0
It's an old laptop, maybe 7 years old. Is it just too underpowered to run this?
I had the same issue with nvcc release 11.5 and CUDA version 12.3.
The following steps helped:
apt-get purge nvidia-cuda-toolkit
Then add /usr/local/cuda-12.3/bin to the PATH environment variable:
echo 'export PATH="/usr/local/cuda-12.3/bin:$PATH"' >> ~/.bashrc && source ~/.bashrc
CMAKE_ARGS='-DLLAMA_CUBLAS=on' poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
PGPT_PROFILES=local make run
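If it's unclear whether the rebuilt wheel actually picked up cuBLAS, a verbose reinstall prints the CMake output so you can watch the CUDA detection happen (a sketch only; FORCE_CMAKE=1 may be redundant on recent llama-cpp-python versions):
export PATH="/usr/local/cuda-12.3/bin:$PATH"   # same PATH change as above, applied to the current shell
CMAKE_ARGS='-DLLAMA_CUBLAS=on' FORCE_CMAKE=1 poetry run pip install --force-reinstall --no-cache-dir --verbose llama-cpp-python
# on a successful CUDA build the app logs "ggml_init_cublas: found 1 CUDA devices" at startup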
@AHPyXA - awesome sauce, that did the trick, thank you!
I am now able to launch the app, but I'm seeing some errors at the very end of the startup, though the UI seems to work:
myuser@mymachine:/mnt/c/dev/git/github/privateGPT$ PGPT_PROFILES=local make run
poetry run python -m private_gpt
10:34:26.703 [INFO ] private_gpt.settings.settings_loader - Starting application with profiles=['default', 'local']
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
Device 0: Quadro M1000M, compute capability 5.0
llama_model_loader: loaded meta data with 20 key-value pairs and 291 tensors from /mnt/c/dev/git/github/privateGPT/models/mistral-7b-instruct-v0.1.Q4_K_M.gguf (version GGUF V2)
llama_model_loader: - tensor 0: token_embd.weight q4_K [ 4096, 32000, 1, 1 ]
...
llama_model_loader: - tensor 290: output.weight q6_K [ 4096, 32000, 1, 1 ]
llama_model_loader: - kv 0: general.architecture str
...
llama_model_loader: - kv 19: general.quantization_version u32
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q4_K: 193 tensors
llama_model_loader: - type q6_K: 33 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format = GGUF V2
...
llm_load_print_meta: model size = 4.07 GiB (4.83 BPW)
llm_load_print_meta: general.name = mistralai_mistral-7b-instruct-v0.1
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.11 MB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required = 70.42 MB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 35/35 layers to GPU
llm_load_tensors: VRAM used: 4095.05 MB
...............................................................................................
llama_new_context_with_model: n_ctx = 3900
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: offloading v cache to GPU
llama_kv_cache_init: offloading k cache to GPU
llama_kv_cache_init: VRAM kv self = 487.50 MB
llama_new_context_with_model: kv self size = 487.50 MB
llama_build_graph: non-view tensors processed: 740/740
llama_new_context_with_model: compute buffer total size = 282.00 MB
llama_new_context_with_model: VRAM scratch buffer: 275.37 MB
llama_new_context_with_model: total VRAM used: 4857.93 MB (model: 4095.05 MB, context: 762.87 MB)
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
10:38:33.366 [INFO ] chromadb.telemetry.product.posthog - Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.
10:38:48.559 [INFO ] uvicorn.error - Started server process [3783]
10:38:48.559 [INFO ] uvicorn.error - Waiting for application startup.
10:38:48.560 [INFO ] uvicorn.error - Application startup complete.
10:38:48.560 [INFO ] uvicorn.error - Uvicorn running on http://0.0.0.0:8001 (Press CTRL+C to quit)
It looks like I can upload a pdf file, but when I ask a question, it crashes:
...
11:33:15.645 [INFO ] uvicorn.access - 127.0.0.1:40524 - "GET /assets/logo-0a070fcf.svg HTTP/1.1" 200
11:33:14.488 [INFO ] uvicorn.access - 127.0.0.1:40476 - "GET / HTTP/1.1" 200
11:40:48.762 [INFO ] uvicorn.access - 127.0.0.1:39886 - "POST /upload HTTP/1.1" 200
11:40:49.254 [INFO ] uvicorn.error - ('127.0.0.1', 39906) - "WebSocket /queue/join" [accepted]
11:40:49.255 [INFO ] uvicorn.error - connection open
Parsing documents into nodes: 100%|██████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 165.11it/s]
Generating embeddings: 100%|████████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:26<00:00, 1.52it/s]
11:41:17.413 [INFO ] uvicorn.error - connection closed
11:43:43.561 [INFO ] uvicorn.access - 127.0.0.1:47306 - "POST /run/predict HTTP/1.1" 200
11:43:43.566 [INFO ] uvicorn.access - 127.0.0.1:47322 - "POST /run/predict HTTP/1.1" 200
11:43:43.585 [INFO ] uvicorn.access - 127.0.0.1:47306 - "POST /run/predict HTTP/1.1" 200
11:43:43.941 [INFO ] uvicorn.error - ('127.0.0.1', 47344) - "WebSocket /queue/join" [accepted]
11:43:43.942 [INFO ] uvicorn.error - connection open
CUDA error 209 at /tmp/pip-install-xr5k_0nd/llama-cpp-python_ca66f5e6557a4e9f8ac49a1fb528206d/vendor/llama.cpp/ggml-cuda.cu:6768: no kernel image is available for execution on the device
current device: 0
make: *** [Makefile:36: run] Error 1
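Error 209 ("no kernel image is available for execution on the device") generally means the compiled CUDA kernels don't cover the GPU's compute capability; the Quadro M1000M above reports compute capability 5.0. A speculative rebuild targeting that architecture, assuming llama.cpp's build respects the standard CMAKE_CUDA_ARCHITECTURES variable, would be:
CMAKE_ARGS='-DLLAMA_CUBLAS=on -DCMAKE_CUDA_ARCHITECTURES=50' poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
# 50 corresponds to compute capability 5.0 (Maxwell); adjust to your own GPU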
Stale issue
I was having a similar issue with the same default config, just running the commands. I followed @AHPyXA's steps and got mine working.
Perhaps @swvajanyatek should update their repository and give it another swing.
And yes, I'm able to run queries and it works fine, with no errors after following the steps.
Stale issue
I followed the directions for the "Linux NVIDIA GPU support and Windows-WSL" section, and below is what my WSL now shows, but I'm still getting "no CUDA-capable device is detected". What am I missing?
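A couple of things worth checking from inside WSL before rebuilding again (assuming a WSL2 setup where the NVIDIA driver is installed on the Windows side and only the CUDA toolkit lives inside the distro):
nvidia-smi                        # must work inside WSL; if it fails, the problem is the driver / WSL GPU passthrough, not llama.cpp
ls /usr/lib/wsl/lib/libcuda.so*   # WSL normally mounts the GPU driver libraries here
which nvcc && nvcc --version      # should point at the toolkit you added to PATH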