zylon-ai / private-gpt

Interact with your documents using the power of GPT, 100% privately, no data leaks
https://privategpt.dev
Apache License 2.0
54.25k stars 7.3k forks

Cannot use GPU with CUDA, only CPU, which is very slow #2083

Open dadupriv opened 2 months ago

dadupriv commented 2 months ago

Pre-check

Description

Windows OS. All CUDA requirements are installed (gcc++ 14). Running PrivateGPT, but it only uses the CPU, not the GPU.

CUDA:

```
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.94                 Driver Version: 560.94         CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090       WDDM |   00000000:03:00.0 Off |                  N/A |
|  0%   37C    P8             21W /  350W |      47MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 3090       WDDM |   00000000:04:00.0 Off |                  N/A |
|  0%   45C    P8             31W /  350W |     340MiB /  24576MiB |      2%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A     14620    C+G   ...crosoft\Edge\Application\msedge.exe      N/A      |
|    1   N/A  N/A      9456    C+G   C:\Windows\explorer.exe                     N/A      |
|    1   N/A  N/A     10884    C+G   ...2txyewy\StartMenuExperienceHost.exe      N/A      |
|    1   N/A  N/A     12132    C+G   ....Search_cw5n1h2txyewy\SearchApp.exe      N/A      |
|    1   N/A  N/A     14668    C+G   ...CBS_cw5n1h2txyewy\TextInputHost.exe      N/A      |
|    1   N/A  N/A     17180    C+G   ...am Files (x86)\VideoLAN\VLC\vlc.exe      N/A      |
|    1   N/A  N/A     18792    C+G   ...5n1h2txyewy\ShellExperienceHost.exe      N/A      |
+-----------------------------------------------------------------------------------------+
```

I have searched, and I cannot compile llama.cpp with CUDA; the problem is shown below.

Anaconda PowerShell

```
PS C:\Users\XXXXX> $env:CMAKE_ARGS='-DLLAMA_CUBLAS=on'; poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python numpy==1.26.0
Collecting llama-cpp-python
  Downloading llama_cpp_python-0.2.90.tar.gz (63.8 MB)
     ━━━━━━━━━━━━━━━━━━━━ 63.8/63.8 MB 40.9 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting numpy==1.26.0
  Downloading numpy-1.26.0-cp311-cp311-win_amd64.whl.metadata (61 kB)
     ━━━━━━━━━━━━━━━━━━━━ 61.1/61.1 kB 3.4 MB/s eta 0:00:00
Collecting typing-extensions>=4.5.0 (from llama-cpp-python)
  Downloading typing_extensions-4.12.2-py3-none-any.whl.metadata (3.0 kB)
Collecting diskcache>=5.6.1 (from llama-cpp-python)
  Downloading diskcache-5.6.3-py3-none-any.whl.metadata (20 kB)
Collecting jinja2>=2.11.3 (from llama-cpp-python)
  Downloading jinja2-3.1.4-py3-none-any.whl.metadata (2.6 kB)
Collecting MarkupSafe>=2.0 (from jinja2>=2.11.3->llama-cpp-python)
  Downloading MarkupSafe-2.1.5-cp311-cp311-win_amd64.whl.metadata (3.1 kB)
Downloading numpy-1.26.0-cp311-cp311-win_amd64.whl (15.8 MB)
   ━━━━━━━━━━━━━━━━━━━━ 15.8/15.8 MB 38.6 MB/s eta 0:00:00
Downloading diskcache-5.6.3-py3-none-any.whl (45 kB)
   ━━━━━━━━━━━━━━━━━━━━ 45.5/45.5 kB ? eta 0:00:00
Downloading jinja2-3.1.4-py3-none-any.whl (133 kB)
   ━━━━━━━━━━━━━━━━━━━━ 133.3/133.3 kB ? eta 0:00:00
Downloading typing_extensions-4.12.2-py3-none-any.whl (37 kB)
Downloading MarkupSafe-2.1.5-cp311-cp311-win_amd64.whl (17 kB)
Building wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [31 lines of output]
      scikit-build-core 0.10.6 using CMake 3.30.3 (wheel)
      Configuring CMake...
      2024-09-11 10:14:43,243 - scikit_build_core - WARNING - Can't find a Python library, got libdir=None, ldlibrary=None, multiarch=None, masd=None
      loading initial cache file C:\Users\nasdadu\AppData\Local\Temp\tmp2efzwb2l\build\CMakeInit.txt
      -- Building for: Visual Studio 17 2022
      -- Selecting Windows SDK version 10.0.22621.0 to target Windows 10.0.19045.
      -- The C compiler identification is MSVC 19.35.32217.1
      -- The CXX compiler identification is MSVC 19.35.32217.1
      -- Detecting C compiler ABI info
      -- Detecting C compiler ABI info - done
      -- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2022/BuildTools/VC/Tools/MSVC/14.35.32215/bin/Hostx64/x64/cl.exe - skipped
      -- Detecting C compile features
      -- Detecting C compile features - done
      -- Detecting CXX compiler ABI info
      -- Detecting CXX compiler ABI info - done
      -- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2022/BuildTools/VC/Tools/MSVC/14.35.32215/bin/Hostx64/x64/cl.exe - skipped
      -- Detecting CXX compile features
      -- Detecting CXX compile features - done
      -- Found Git: C:/Users/nasdadu/pinokio/bin/miniconda/Library/bin/git.exe (found version "2.42.0.windows.1")
      CMake Error at vendor/llama.cpp/CMakeLists.txt:95 (message):
        LLAMA_CUBLAS is deprecated and will be removed in the future.

        Use GGML_CUDA instead

      Call Stack (most recent call first):
        vendor/llama.cpp/CMakeLists.txt:100 (llama_option_depr)

      -- Configuring incomplete, errors occurred!

      *** CMake configuration failed
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects

[notice] A new release of pip is available: 23.3.1 -> 24.2
[notice] To update, run: python.exe -m pip install --upgrade pip
```

Steps to Reproduce

Windows OS. In PowerShell, run:

```
$env:CMAKE_ARGS='-DLLAMA_CUBLAS=on'; poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python numpy==1.26.0
```
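Side note (not part of the original report): a common cause of a CPU-only build is that the CUDA Toolkit's compiler, nvcc, is not on PATH when CMake configures the wheel; the driver (what nvidia-smi reports) alone is not enough to compile CUDA kernels. An illustrative helper to rule that out before rebuilding:

```python
# Hedged sketch: check that the CUDA compile toolchain is visible
# before attempting a GPU-enabled build of llama-cpp-python.
import shutil


def cuda_toolchain_visible() -> bool:
    # nvcc ships with the CUDA Toolkit; nvidia-smi ships with the
    # driver only, so its presence does not imply nvcc is installed.
    return shutil.which("nvcc") is not None


if __name__ == "__main__":
    print(cuda_toolchain_visible())
```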

Expected Behavior

Expected BLAS=1 with GPU usage

Actual Behavior

Output BLAS=0 only CPU usage

Environment

Windows 10 (build 19045.4780), 2× RTX 3090

Additional Information

No response

Version

No response

Setup Checklist

NVIDIA GPU Setup Checklist

dadupriv commented 2 months ago

Solved! In the command above, replace `-DLLAMA_CUBLAS=on` with `-DGGML_CUDA=on`:

```
$env:CMAKE_ARGS='-DGGML_CUDA=on'; poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python numpy==1.26.0
```
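As a follow-up check (not from the thread), one way to confirm the rebuilt wheel actually supports GPU offload is to query llama-cpp-python directly. This sketch assumes a version that exports `llama_supports_gpu_offload`, and falls back to False if the package or that symbol is missing:

```python
# Hedged sketch: report whether the installed llama-cpp-python wheel
# was compiled with GPU offload support.
def gpu_offload_available() -> bool:
    try:
        from llama_cpp import llama_supports_gpu_offload
    except ImportError:
        # llama-cpp-python missing, or an older version without this symbol
        return False
    return bool(llama_supports_gpu_offload())


if __name__ == "__main__":
    # True  -> wheel built with a GPU backend (e.g. -DGGML_CUDA=on)
    # False -> CPU-only wheel, or llama-cpp-python not importable
    print(gpu_offload_available())
```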

jveronese commented 1 month ago

Is there a certain way I need to launch this? I launch using https://github.com/zylon-ai/private-gpt/issues/2083 after running `$env:CMAKE_ARGS='-GGML_CUDA=on'; poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python numpy==1.26.0`, and it still uses my CPU instead of the GPU.

jacooooooooool commented 1 week ago

Paste the entire line into the terminal and press Enter:

```
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir
```
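Editor's note: the line above uses POSIX-shell syntax; the `VAR=value command` prefix is not valid in PowerShell, where the environment variable must be set separately. A sketch of the equivalent PowerShell invocation (same flags, untested here):

```powershell
# Set CMAKE_ARGS for the current session, then rebuild the wheel from source.
$env:CMAKE_ARGS = "-DGGML_CUDA=on"
pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir
```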