Open VILLO88 opened 1 month ago
Hi @AndriiBorysov,
To resolve the issue with adding NVIDIA GPU support for Llama-CPP and addressing the errors you encounter, follow these steps:
1. **Verify CUDA Installation**: Ensure that CUDA is installed and properly configured. Check the version with:
   ```shell
   nvcc --version
   ```
2. **Check NVIDIA Driver**: Make sure your NVIDIA driver is up to date. Check the driver version with:
   ```shell
   nvidia-smi
   ```
3. **Ensure Compatibility**: Confirm that the installed CUDA version is compatible with your NVIDIA driver and the libraries you are using.
4. **Install CUDA Toolkit**: Follow the instructions from the [NVIDIA CUDA Toolkit website](https://developer.nvidia.com/cuda-downloads) to install CUDA.
5. **Install NVIDIA cuDNN**: Follow the instructions from the [NVIDIA cuDNN website](https://developer.nvidia.com/cudnn) to install cuDNN.
6. **Verify GPU Availability in Python**: Use this script to check if the GPU is available:
   ```python
   import torch

   if torch.cuda.is_available():
       print(f"CUDA is available. Device count: {torch.cuda.device_count()}")
       for i in range(torch.cuda.device_count()):
           print(f"Device {i}: {torch.cuda.get_device_name(i)}")
   else:
       print("CUDA is not available.")
   ```
7. **Run a Simple Llama-CPP Model on GPU**: Create a script to test running a Llama-CPP model on GPU:
   ```python
   from llama_cpp import Llama

   # llama-cpp-python exposes `Llama` (there is no `LlamaModel`); GPU offload
   # is requested with `n_gpu_layers`, where -1 offloads every layer.
   model = Llama(model_path="path/to/your_model.gguf", n_gpu_layers=-1)

   input_text = "Translate English to French: 'Hello, how are you?'"
   output = model(input_text, max_tokens=64)
   print(output["choices"][0]["text"])
   ```
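Step 3 can also be checked mechanically. As a minimal sketch, assuming the usual `release X.Y` field in the `nvcc --version` output and the `CUDA Version: X.Y` field in the `nvidia-smi` header, the following compares the toolkit version against the highest CUDA version the driver reports. The function names and the sample strings are hypothetical, and the comparison is deliberately conservative:

```python
import re


def toolkit_version(nvcc_output: str) -> tuple[int, int]:
    """Extract (major, minor) from the 'release X.Y' field of `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' field found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def driver_cuda_version(smi_output: str) -> tuple[int, int]:
    """Extract (major, minor) from the 'CUDA Version: X.Y' field of `nvidia-smi`."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        raise ValueError("no 'CUDA Version: X.Y' field found in nvidia-smi output")
    return int(match.group(1)), int(match.group(2))


def toolkit_supported_by_driver(nvcc_output: str, smi_output: str) -> bool:
    """True if the toolkit is no newer than the CUDA version the driver reports."""
    return toolkit_version(nvcc_output) <= driver_cuda_version(smi_output)


# Hypothetical sample strings in the format the two tools typically print.
sample_nvcc = "Cuda compilation tools, release 12.4, V12.4.131"
sample_smi = "| NVIDIA-SMI 550.78   Driver Version: 550.78   CUDA Version: 12.4 |"
print(toolkit_supported_by_driver(sample_nvcc, sample_smi))
```

In practice the two strings would come from running the commands in steps 1 and 2. Note that a passing check only says the driver can run that toolkit; it says nothing about which GPU architectures a given library was compiled for.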
Following these steps should help resolve the GPU support issue. If you continue to face problems, please provide the exact error messages and additional details about your setup for further assistance.
Best regards, alxspiker
Thanks for the quick answer. Here are the results of what you told me to try:
1) nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
2) nvidia-smi
Thu May 23 17:19:27 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.78 Driver Version: 550.78 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Quadro K2200 Off | 00000000:02:00.0 On | N/A |
| 42% 38C P0 1W / 39W | 416MiB / 4096MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 1380 G /usr/lib/xorg/Xorg 80MiB |
| 0 N/A N/A 1544 G /usr/bin/gnome-shell 103MiB |
| 0 N/A N/A 2511 G /usr/lib/firefox-esr/firefox-esr 173MiB |
| 0 N/A N/A 8365 G /usr/bin/nautilus 50MiB |
| 0 N/A N/A 11352 G /usr/bin/nvidia-settings 0MiB |
+-----------------------------------------------------------------------------------------+
3) The NVIDIA driver is up to date and compatible with the CUDA version.
4) and 5) Satisfied.
6) Running the script prints: `CUDA is available. Device count: 1` and `Device 0: Quadro K2200`.
7) I had some problems running the script you gave me (I'm not a programmer), but I wrote another small script that runs llama-cpp with the Mistral model from private-gpt, and it seems to use the GPU (normally the GPU load is 0%).
photo attached
But when I try to query files in the private-gpt session, I get the same error:
ggml_cuda_compute_forward: RMS_NORM failed
CUDA error: no kernel image is available for execution on the device
current device: 0, in function ggml_cuda_compute_forward at /tmp/pip-install-ddhqme6y/llama-cpp-python_faf2d72bbfd246b8b4278a72b85fcccd/vendor/llama.cpp/ggml-cuda.cu:2304
err
GGML_ASSERT: /tmp/pip-install-ddhqme6y/llama-cpp-python_faf2d72bbfd246b8b4278a72b85fcccd/vendor/llama.cpp/ggml-cuda.cu:60: !"CUDA error"
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
make: *** [Makefile:36: run] Aborted
Any ideas? Thank you.
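The `huggingface/tokenizers` warning in the log above is separate from the CUDA failure, and the message itself names the workaround. A minimal sketch, assuming it is placed at the very top of the script that starts the process, before anything imports `tokenizers`:

```python
import os

# Set explicitly before `tokenizers` is first used, so the library does not
# have to disable parallelism itself (and warn) when the process forks.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

print(os.environ["TOKENIZERS_PARALLELISM"])
```

Exporting the same variable in the shell before running `make run` should have the same effect. This only silences the warning; it does not address the `no kernel image` error.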
Hello, I'm trying to add GPU support to my privategpt to speed it up, and everything seems to work (info below), but when I ask a question about an attached document the program crashes with the errors you see attached:
my setup
- OS: Debian 12
- Python: 3.11.2
- CUDA compilation tools: release 12.4, V12.4.131
- GPU: NVIDIA Quadro K2200 4GB
- NVIDIA driver (latest): 550.78
At server startup blas=1 AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
The only thing I've changed to make it run is gpu_layers:
recap
In summary, privategpt starts correctly and the GPU is recognized, but when I try to ask a question it crashes with the error above. Is this a fixable bug, or is it related to the fact that my card has "only" 4GB?
If the cause is my card, do you have any cards to suggest that would definitely work? Maybe something cheap. Thanks.
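On the 4GB question: the message `no kernel image is available for execution on the device` refers to missing GPU code for the card's architecture, not to running out of memory. CUDA binaries carry machine code (SASS) and/or PTX for specific compute capabilities. SASS runs only on devices with the same major version and an equal or higher minor version, and PTX can be JIT-compiled only for devices of equal or higher capability. The sketch below models that rule. It assumes the Quadro K2200 is compute capability 5.0 (as NVIDIA lists it; `torch.cuda.get_device_capability(0)` reports the value for your card) and assumes, purely for illustration, that the library was built for architectures `52;61;70`. The function names are hypothetical, and suffixed entries such as `90a` are not handled:

```python
def parse_arch(entry: str) -> tuple[tuple[int, int], bool, bool]:
    """Split a CMAKE_CUDA_ARCHITECTURES-style entry into (cc, has_sass, has_ptx).

    "52" means SASS and PTX, "52-real" SASS only, "52-virtual" PTX only.
    """
    number, _, kind = entry.partition("-")
    cc = divmod(int(number), 10)  # "52" -> (5, 2)
    has_sass = kind in ("", "real")
    has_ptx = kind in ("", "virtual")
    return cc, has_sass, has_ptx


def has_kernel_image(device_cc: tuple[int, int], built_archs: list[str]) -> bool:
    """True if a binary built for `built_archs` can run on a `device_cc` device."""
    for entry in built_archs:
        cc, has_sass, has_ptx = parse_arch(entry)
        # SASS: same major version, and the device minor must not be lower.
        if has_sass and cc[0] == device_cc[0] and cc[1] <= device_cc[1]:
            return True
        # PTX: JIT-compilable for any device of equal or higher capability.
        if has_ptx and cc <= device_cc:
            return True
    return False


k2200 = (5, 0)                      # assumed compute capability of the K2200
assumed_build = ["52", "61", "70"]  # assumed build architectures, illustration only

print(has_kernel_image(k2200, assumed_build))
print(has_kernel_image(k2200, ["50"] + assumed_build))
```

If the installed build really does omit 5.0, one possible direction is to reinstall llama-cpp-python from source with `CMAKE_CUDA_ARCHITECTURES` (passed through `CMAKE_ARGS`) including `50`. Whether that applies here depends on how your copy was built, so treat it as something to verify rather than a confirmed fix.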