oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

GPU offloading not working for GGML models #2330

Closed NoMansPC closed 1 year ago

NoMansPC commented 1 year ago

Describe the bug

After assigning some layers, I see that the model is still only using my CPU and RAM. Is there anything else that I need to do to force it to use the GPU as well? I've seen some people also running into the same issue.

Is there an existing issue for this?

Reproduction

Load the model, assign the number of GPU layers, click to generate text. It's still not using the GPU.
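The knob being assigned here is the number of GPU layers (--n-gpu-layers). As a rough, illustrative sketch of how one might budget that number against available VRAM (all sizes below are assumptions for illustration, not measured values):

```python
# Hedged sketch: back-of-envelope estimate for --n-gpu-layers.
# Model size, layer count, and the safety reserve are illustrative assumptions.

def max_gpu_layers(model_bytes, n_layers, vram_bytes, reserve_bytes=1 << 30):
    """Estimate how many layers fit in VRAM, keeping a safety reserve."""
    per_layer = model_bytes / n_layers           # assume layers are similar size
    budget = max(vram_bytes - reserve_bytes, 0)  # leave headroom for the context
    return min(n_layers, int(budget // per_layer))

# Example: ~3.9 GB 7B q4 GGML model with 32 layers on a 6 GB card (as reported)
layers = max_gpu_layers(int(3.9e9), 32, 6 * 1024**3, reserve_bytes=1 * 1024**3)
print(layers)  # → 32 (the whole model fits under these assumptions)
```

Note that setting this value does nothing if the installed llama-cpp-python build has no GPU support at all, which is what the rest of this thread turns out to be about.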

Screenshot

No response

Logs

I don't get an error.

System Info

Ryzen 5 3600
16GB RAM
RTX 2060 6GB VRAM

KarlynG commented 1 year ago

Same issue. I've been trying for days, but GGML models just refuse to use my GPU. I don't know if I'm misunderstanding how GGML models work or missing some setting.

olinorwell commented 1 year ago

I think we need a version of the Python llama.cpp binding that's built slightly differently; in my case, one that uses the new CLBlast code in llama.cpp.

[Edit: As a Radeon user, unlike the OP, I had to recompile the binding for CLBlast support. I did that (and also needed to install a version of gcc), and text-gen-webui now uses the GPU for GGML models on my Radeon card.]

olinorwell commented 1 year ago

I saw this on the main page, I wonder if it might be relevant for you guys:

"bitsandbytes >= 0.39 may not work on older NVIDIA GPUs. In that case, to use --load-in-8bit, you may have to downgrade like this:...."

NoMansPC commented 1 year ago

I ticked the load in 8 bit box in the UI, still no luck.

BetaDoggo commented 1 year ago

You need to compile llama-cpp-python with cublas support as explained on the wiki. This will allow you to use the gpu but this seems to be broken as reported in #2118.

NoMansPC commented 1 year ago

You need to compile llama-cpp-python with cublas support as explained on the wiki. This will allow you to use the gpu but this seems to be broken as reported in #2118.

This gives me an error on command 4 (using the cmd .bat file within oobabooga). Steps 2 and 3 don't seem to do anything; step 1 works okay. I think those steps are a little outdated, or else they have to be run in a specific oobabooga folder that isn't mentioned.

BetaDoggo commented 1 year ago

This gives me an error on code line #4 (using the cmd bat file within oobabooga). Steps 2 and 3 don't seem to do anything. Step 1 works okay.

They work fine for me. What error are you getting? Steps 2 and 3 shouldn't output anything since they are just to set environment variables used by the 4th command.
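A sketch of what steps 2-4 amount to, assuming the variable names from the thread (CMAKE_ARGS, FORCE_CMAKE) are the ones llama-cpp-python's build reads. One Windows-specific pitfall worth knowing: in cmd, set CMAKE_ARGS="-DLLAMA_CUBLAS=on" keeps the quote characters as part of the value, which the build may or may not tolerate.

```python
# Sketch: steps 2 and 3 only populate environment variables; they print
# nothing, which is expected. Step 4 is the command that consumes them.
import os
import subprocess  # used only by the commented-out line at the bottom

def build_env():
    """Return a copy of the environment with the build flags from steps 2-3."""
    env = dict(os.environ)
    env["CMAKE_ARGS"] = "-DLLAMA_CUBLAS=on"   # step 2 (value from the thread)
    env["FORCE_CMAKE"] = "1"                  # step 3
    return env

def reinstall_cmd():
    # step 4; --no-cache-dir forces a fresh source build instead of a cached wheel
    return ["pip", "install", "llama-cpp-python", "--no-cache-dir"]

print(build_env()["CMAKE_ARGS"], build_env()["FORCE_CMAKE"])  # → -DLLAMA_CUBLAS=on 1
# subprocess.run(reinstall_cmd(), env=build_env(), check=True)  # uncomment to run
```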

NoMansPC commented 1 year ago

This gives me an error on code line #4 (using the cmd bat file within oobabooga). Steps 2 and 3 don't seem to do anything. Step 1 works okay.

They work fine for me. What error are you getting? Steps 2 and 3 shouldn't output anything since they are just to set environment variables used by the 4th command.

The following error:

(D:\oobabooga_windows\installer_files\env) D:\oobabooga_windows>pip uninstall -y llama-cpp-python
Found existing installation: llama-cpp-python 0.1.53
Uninstalling llama-cpp-python-0.1.53:
  Successfully uninstalled llama-cpp-python-0.1.53

(D:\oobabooga_windows\installer_files\env) D:\oobabooga_windows>set CMAKE_ARGS="-DLLAMA_CUBLAS=on"

(D:\oobabooga_windows\installer_files\env) D:\oobabooga_windows>set FORCE_CMAKE=1

(D:\oobabooga_windows\installer_files\env) D:\oobabooga_windows>pip install llama-cpp-python --no-cache-dir
Collecting llama-cpp-python
  Downloading llama_cpp_python-0.1.54.tar.gz (1.4 MB)
     ---------------------------------------- 1.4/1.4 MB 9.8 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: typing-extensions>=4.5.0 in d:\oobabooga_windows\installer_files\env\lib\site-packages (from llama-cpp-python) (4.5.0)
Building wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [308 lines of output]

  --------------------------------------------------------------------------------
  -- Trying 'Ninja (Visual Studio 17 2022 x64 v143)' generator
  --------------------------------
  ---------------------------
  ----------------------
  -----------------
  ------------
  -------
  --
  Not searching for unused variables given on the command line.
  -- The C compiler identification is unknown
  CMake Error at CMakeLists.txt:3 (ENABLE_LANGUAGE):
    No CMAKE_C_COMPILER could be found.

    Tell CMake where to find the compiler by setting either the environment
    variable "CC" or the CMake cache entry CMAKE_C_COMPILER to the full path to
    the compiler, or to the compiler name if it is in the PATH.

  -- Configuring incomplete, errors occurred!
  --
  -------
  ------------
  -----------------
  ----------------------
  ---------------------------
  --------------------------------
  -- Trying 'Ninja (Visual Studio 17 2022 x64 v143)' generator - failure
  --------------------------------------------------------------------------------

  --------------------------------------------------------------------------------
  -- Trying 'Visual Studio 17 2022 x64 v143' generator
  --------------------------------
  ---------------------------
  ----------------------
  -----------------
  ------------
  -------
  --
  Not searching for unused variables given on the command line.
  CMake Error at CMakeLists.txt:2 (PROJECT):
    Generator

      Visual Studio 17 2022

    could not find any instance of Visual Studio.

  -- Configuring incomplete, errors occurred!
  --
  -------
  ------------
  -----------------
  ----------------------
  ---------------------------
  --------------------------------
  -- Trying 'Visual Studio 17 2022 x64 v143' generator - failure
  --------------------------------------------------------------------------------

  --------------------------------------------------------------------------------
  -- Trying 'Ninja (Visual Studio 16 2019 x64 v142)' generator
  --------------------------------
  ---------------------------
  ----------------------
  -----------------
  ------------
  -------
  --
  Not searching for unused variables given on the command line.
  -- The C compiler identification is unknown
  CMake Error at CMakeLists.txt:3 (ENABLE_LANGUAGE):
    No CMAKE_C_COMPILER could be found.

    Tell CMake where to find the compiler by setting either the environment
    variable "CC" or the CMake cache entry CMAKE_C_COMPILER to the full path to
    the compiler, or to the compiler name if it is in the PATH.

  -- Configuring incomplete, errors occurred!
  --
  -------
  ------------
  -----------------
  ----------------------
  ---------------------------
  --------------------------------
  -- Trying 'Ninja (Visual Studio 16 2019 x64 v142)' generator - failure
  --------------------------------------------------------------------------------

  --------------------------------------------------------------------------------
  -- Trying 'Visual Studio 16 2019 x64 v142' generator
  --------------------------------
  ---------------------------
  ----------------------
  -----------------
  ------------
  -------
  --
  Not searching for unused variables given on the command line.
  CMake Error at CMakeLists.txt:2 (PROJECT):
    Generator

      Visual Studio 16 2019

    could not find any instance of Visual Studio.

  -- Configuring incomplete, errors occurred!
  --
  -------
  ------------
  -----------------
  ----------------------
  ---------------------------
  --------------------------------
  -- Trying 'Visual Studio 16 2019 x64 v142' generator - failure
  --------------------------------------------------------------------------------

  --------------------------------------------------------------------------------
  -- Trying 'Ninja (Visual Studio 15 2017 x64 v141)' generator
  --------------------------------
  ---------------------------
  ----------------------
  -----------------
  ------------
  -------
  --
  Not searching for unused variables given on the command line.
  -- The C compiler identification is unknown
  CMake Error at CMakeLists.txt:3 (ENABLE_LANGUAGE):
    No CMAKE_C_COMPILER could be found.

    Tell CMake where to find the compiler by setting either the environment
    variable "CC" or the CMake cache entry CMAKE_C_COMPILER to the full path to
    the compiler, or to the compiler name if it is in the PATH.

  -- Configuring incomplete, errors occurred!
  --
  -------
  ------------
  -----------------
  ----------------------
  ---------------------------
  --------------------------------
  -- Trying 'Ninja (Visual Studio 15 2017 x64 v141)' generator - failure
  --------------------------------------------------------------------------------

  --------------------------------------------------------------------------------
  -- Trying 'Visual Studio 15 2017 x64 v141' generator
  --------------------------------
  ---------------------------
  ----------------------
  -----------------
  ------------
  -------
  --
  Not searching for unused variables given on the command line.
  CMake Error at CMakeLists.txt:2 (PROJECT):
    Generator

      Visual Studio 15 2017

    could not find any instance of Visual Studio.

  -- Configuring incomplete, errors occurred!
  --
  -------
  ------------
  -----------------
  ----------------------
  ---------------------------
  --------------------------------
  -- Trying 'Visual Studio 15 2017 x64 v141' generator - failure
  --------------------------------------------------------------------------------

  --------------------------------------------------------------------------------
  -- Trying 'NMake Makefiles (Visual Studio 17 2022 x64 v143)' generator
  --------------------------------
  ---------------------------
  ----------------------
  -----------------
  ------------
  -------
  --
  Not searching for unused variables given on the command line.
  CMake Error at CMakeLists.txt:2 (PROJECT):
    Running

     'nmake' '-?'

    failed with:

     The system cannot find the file specified

  -- Configuring incomplete, errors occurred!
  --
  -------
  ------------
  -----------------
  ----------------------
  ---------------------------
  --------------------------------
  -- Trying 'NMake Makefiles (Visual Studio 17 2022 x64 v143)' generator - failure
  --------------------------------------------------------------------------------

  --------------------------------------------------------------------------------
  -- Trying 'NMake Makefiles (Visual Studio 16 2019 x64 v142)' generator
  --------------------------------
  ---------------------------
  ----------------------
  -----------------
  ------------
  -------
  --
  Not searching for unused variables given on the command line.
  CMake Error at CMakeLists.txt:2 (PROJECT):
    Running

     'nmake' '-?'

    failed with:

     The system cannot find the file specified

  -- Configuring incomplete, errors occurred!
  --
  -------
  ------------
  -----------------
  ----------------------
  ---------------------------
  --------------------------------
  -- Trying 'NMake Makefiles (Visual Studio 16 2019 x64 v142)' generator - failure
  --------------------------------------------------------------------------------

  --------------------------------------------------------------------------------
  -- Trying 'NMake Makefiles (Visual Studio 15 2017 x64 v141)' generator
  --------------------------------
  ---------------------------
  ----------------------
  -----------------
  ------------
  -------
  --
  Not searching for unused variables given on the command line.
  CMake Error at CMakeLists.txt:2 (PROJECT):
    Running

     'nmake' '-?'

    failed with:

     The system cannot find the file specified

  -- Configuring incomplete, errors occurred!
  --
  -------
  ------------
  -----------------
  ----------------------
  ---------------------------
  --------------------------------
  -- Trying 'NMake Makefiles (Visual Studio 15 2017 x64 v141)' generator - failure
  --------------------------------------------------------------------------------

                  ********************************************************************************
                  scikit-build could not get a working generator for your system. Aborting build.

                  Building windows wheels for Python 3.10 requires Microsoft Visual Studio 2022.
  Get it with "Visual Studio 2017":

    https://visualstudio.microsoft.com/vs/

  Or with "Visual Studio 2019":

      https://visualstudio.microsoft.com/vs/

  Or with "Visual Studio 2022":

      https://visualstudio.microsoft.com/vs/

                  ********************************************************************************
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects

(D:\oobabooga_windows\installer_files\env) D:\oobabooga_windows>

BetaDoggo commented 1 year ago

It looks like you need the Visual Studio C++ runtime. You can get it from Microsoft's website; you probably want the x64 version. You may have to restart your terminal or your computer after installing it.
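Judging by the log, what CMake is really failing to find is a C compiler, not a runtime. A sketch of that check, done the way a script might: is any usable compiler on PATH? (The candidate names below are common defaults, not an exhaustive list.)

```python
# Sketch: look for a C compiler on PATH, roughly what CMake's
# "No CMAKE_C_COMPILER could be found" error means is missing.
import shutil

def find_compiler(candidates=("cl", "gcc", "clang"), which=shutil.which):
    """Return the first candidate compiler found on PATH, or None."""
    for name in candidates:
        if which(name):
            return name
    return None

# Exercised with stubbed lookups so the sketch behaves the same on any machine:
assert find_compiler(which=lambda n: r"C:\VS\cl.exe" if n == "cl" else None) == "cl"
assert find_compiler(which=lambda n: None) is None
```

Note that on Windows, cl.exe is only on PATH inside a Developer Command Prompt, which is why the later fix in this thread goes through one.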

NoMansPC commented 1 year ago

Looks like I already have it. After downloading the .exe, it asked me to either delete VS or repair. Not sure if repairing would help me.

BetaDoggo commented 1 year ago

It looks like I missed the part at the top. It seems you don't have a compiler that CMake can use. You probably need to install one through something like MinGW. I'm not very knowledgeable about installing compilers on Windows, so you'll have to figure it out yourself or find someone else who can help.

NoMansPC commented 1 year ago

I might try later. I tried the model with koboldcpp; it's slow, so for now I'm sticking with the 7B models that work well.

KarlynG commented 1 year ago

(quoting NoMansPC's build log above)

I was having the same issue; this is what I did to solve it:

  1. Open Visual Studio Installer.
  2. Click on Modify.
  3. Check "Desktop development with C++" and install it. After it finishes, reboot the PC.
  4. Open Visual Studio.
  5. Open Tools > Command Line > Developer Command Prompt.
  6. Move to the "/oobabooga_windows" path.
  7. Execute "pip install llama-cpp-python --no-cache-dir".
  8. When done, execute "update_windows.bat" located in the "/oobabooga_windows" path.

Just want to clarify that even after all this, GGML models are still not using my GPU, unfortunately. Maybe someone with the same problem can try it out and see if they get the same result.
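One quick way to tell whether a rebuilt binding actually got GPU support is to look for "BLAS = 1" in the console output when a model loads; llama.cpp of this era printed it in its system-info line when built with cuBLAS/CLBlast. A tiny sketch of that check (the exact log wording may vary by version):

```python
# Sketch: scan the model-load console output for llama.cpp's BLAS flag.
import re

def has_blas(load_log):
    """True if the system-info line reports a BLAS-enabled build."""
    return re.search(r"BLAS\s*=\s*1", load_log) is not None

assert has_blas("system_info: n_threads = 6 | AVX = 1 | BLAS = 1 |")
assert not has_blas("system_info: n_threads = 6 | AVX = 1 | BLAS = 0 |")
```

If the line still shows BLAS = 0 after reinstalling, the rebuild was skipped (e.g. a cached CPU-only wheel was used), which would explain no GPU load.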

james-s-tayler commented 1 year ago

I started running into this same issue a few days ago too. I'm running on Ubuntu and GPU is an RTX4090.

Thewimo commented 1 year ago

Got the same issue here. I am running on Windows and GPU is an RTX3090. During inference, no VRAM of my GPU is getting used. Only RAM.

olinorwell commented 1 year ago

You need to compile llama-cpp-python with cublas support as explained on the wiki. This will allow you to use the gpu but this seems to be broken as reported in #2118.

This is the key post in this thread. I have found that I need to do this every time I update the code: the update keeps overwriting llama-cpp-python with a default build that doesn't have BLAS enabled.

On top of that, I found I also need to set the --n-gpu-layers value to something sensible for the chosen model.

What I have done in the end is keep my own build of llama-cpp-python elsewhere on my system, which I compile and then drop into the virtual environment text-generation-webui uses, every time it updates itself.

For me that's now working very well, with GPU acceleration on my 5700XT on Linux.

The oobabooga scripts need adjusting to set the correct llama-cpp-python defines depending on the user's GPU.

GiusTex commented 1 year ago

What I have done in the end is have my own version of llama-cpp-python elsewhere on my system which I compile then drop into the virtual environment text-generation-webui uses every time it updates itself.

Can you explain this part with more detailed steps? (How to make your own version, compile it, drop it in, ...)

olinorwell commented 1 year ago

What I have done in the end is have my own version of llama-cpp-python elsewhere on my system which I compile then drop into the virtual environment text-generation-webui uses every time it updates itself.

Can you explain this part with more detailed steps ? (How to make your own version, compile and drop it, ...)

At least on Linux, the virtual environment for the software is in the 'installer_files' directory under wherever you extracted the 'easy installer', so the packages are stored in 'installer_files/env/lib/python3.10/site-packages'. Inside that there's a directory called llama_cpp and one called llama_cpp_python-0.1.54.dist-info.

What I did was install llama_cpp_python on my main system, outside the virtual environment this software uses; the instructions are at: https://github.com/abetlen/llama-cpp-python

For me I had to do this: CMAKE_ARGS="-DLLAMA_CLBLAST=on" FORCE_CMAKE=1 pip install llama-cpp-python

For cuBLAS and OpenBLAS the instructions are very similar; those may work better with NVIDIA cards. With my Radeon I needed the CLBlast version, which also requires the CLBlast library to be installed on the system (it has its own GitHub repo).

Then I dropped my newly built package into the virtual environment that text-generation-webui uses. To find where to copy it from, I re-ran the install command above (just the pip part); it reports that the package is already installed and gives its location. You could also search your system for llama-cpp-python.

To achieve that, I deleted the two directories in 'installer_files/env/lib/python3.10/site-packages' mentioned above, then copied over the same-named directories I had built outside the environment.

After doing these steps I regained my GPU acceleration in Oobabooga, which I had lost when I updated it.
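The delete-then-copy step described above can be sketched as follows. The directory names follow the comment (llama_cpp plus its .dist-info folder); the version string and both paths are illustrative assumptions.

```python
# Sketch: replace a venv's stock llama_cpp with a locally built GPU-enabled copy.
import shutil
from pathlib import Path

def replace_package(src_site, dst_site, version="0.1.54"):
    """Copy llama_cpp from one site-packages dir into another, replacing it."""
    for name in ("llama_cpp", f"llama_cpp_python-{version}.dist-info"):
        dst = Path(dst_site) / name
        if dst.exists():
            shutil.rmtree(dst)                       # delete the stock copy
        shutil.copytree(Path(src_site) / name, dst)  # drop in the local build
```

A fragile but workable arrangement; any webui update will restore the stock package, so the copy has to be redone afterwards, as noted above.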

GiusTex commented 1 year ago

Thanks for the instructions, when I have time I'll try it 👍🏻

qdrop17 commented 1 year ago

I have this issue too; I own an RTX 3090 plus 16GB of system memory. Unfortunately this applies to all models in my case: neither of the two (EleutherAI_gpt-j-6b / mosaicml_mpt-7b-chat) loaded into VRAM. It just fills system memory until the app crashes (out of memory).

According to https://github.com/oobabooga/text-generation-webui/blob/main/docs/System-requirements.md it shouldn't be an issue for my machine to load these models.

I deployed the software with Docker and tried both https://github.com/oobabooga/text-generation-webui#alternative-docker and https://github.com/Atinoda/text-generation-webui-docker.

I already got stable-diffusion-webui up and running with CUDA support inside the container.

I'm running Fedora 38 with the latest Docker binaries and Nvidia / Cuda drivers.

Tiny-Belt commented 1 year ago

Having the same issue here. 2080 8GB, 32GB RAM, 0% GPU load.

Edit: Got it fixed for me. [Win11, one-click installer method, NVIDIA] Instructions (helpful source: Reddit). Make sure you have Visual Studio 2019 or similar installed with the C++ workload enabled.

  1. Go to the oobabooga_windows folder and open cmd_windows.bat
  2. Use this guide GPU offloading to rebuild your llama files! (This is where Visual Studio comes in; if you don't have the compiler installed, it won't be able to build.)
  3. Profit??

ThewindMom commented 1 year ago

(quoting Tiny-Belt's fix above)

This one worked for me. I can now offload onto my GPU.

dice10240 commented 1 year ago

(quoting Tiny-Belt's fix and ThewindMom's confirmation above)

This works, thank you!

qdrop17 commented 1 year ago

> Having the same issue here. 2080 8gb 32gb ram 0% GPU load.
>
> Edit: Got it fixed for me. [Win11 one click installer method [NVidia]] Instructions: Helpful source: Reddit (Make sure you have Visual Studio 2019 or other installed with the C++ build tools enabled)
>
>   1. Go to oobabooga_windows folder and open cmd_windows.bat
>   2. Use this guide GPU offloading to rebuild your llama files! (The Visual Studio comes in at this point^ If you don't have the compiler installed it won't be able to build it)
>   3. Profit??

I actually did this already as my Dockerbuild includes this configuration:

https://github.com/Atinoda/text-generation-webui-docker/blob/master/Dockerfile#L89

Still, no load on GPU.

pbasov commented 1 year ago

@qdrop17 Trying to make it work in Docker as well. It seems to be a problem with CMake not locating the cuBLAS library when llama-cpp-python is built; check the logs with pip install -v. Even if you explicitly set the CMake variable -DCUDA_cublas_LIBRARY=/usr/local/cuda/lib64/libcublas.so, it still throws an error:

  -- Could not find nvcc executable in path specified by environment variable CUDAToolkit_ROOT=/usr/local/cuda
  CMake Warning at vendor/llama.cpp/CMakeLists.txt:199 (message):
    cuBLAS not found

It complains about the lack of nvcc; I'm not sure it actually needs it, since llama.cpp builds fine by itself in the container once I set the CUDA_cublas_LIBRARY=/usr/local/cuda/lib64/libcublas.so env var.

I made it work with apt install nvidia-cuda-toolkit, but that's obviously not a solution, since it installs a whole separate CUDA instance next to the one we already have in the container. Will try to figure it out later today. There's some CMake magic I need to look into: https://cmake.org/cmake/help/latest/module/FindCUDA.html

Maybe people more familiar with CUDA can point us in the right direction.
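Before rebuilding, it can help to confirm what CMake will actually find inside the container. A quick check (the paths assume the standard toolkit layout used by the nvidia/cuda base images):

```shell
# Is a CUDA compiler visible on PATH? CMake's CUDA detection looks for nvcc.
command -v nvcc || echo "nvcc not found on PATH"

# Is the cuBLAS library where the CUDA toolkit search expects it?
ls /usr/local/cuda/lib64/libcublas.so 2>/dev/null \
  || echo "libcublas.so not found under /usr/local/cuda/lib64"
```

If either check fails, the pip build will fall back to a CPU-only wheel even when LLAMA_CUBLAS is set, which matches the "cuBLAS not found" warning above.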

CoolSpot commented 1 year ago

@pbasov, good catch that the "cuBLAS not found" warning is silently ignored!

The following enables CUDA GPU offload for GGML models inside the official Docker image.

  1. Build & start docker container as described in https://github.com/oobabooga/text-generation-webui#alternative-docker
  2. docker exec -it text-generation-webui-text-generation-webui-1 /bin/sh
  3. Inside the container: apt-get update && apt-get install -y libcublas-dev-11-8 cuda-nvcc-11-8
  4. Then . /app/venv/bin/activate
  5. Then pip uninstall -y llama-cpp-python
  6. Then CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install -v llama-cpp-python --no-cache-dir
  7. Make sure that output of the previous command has "cuBLAS found" somewhere
  8. Ctrl-D to exit from container's shell
  9. docker restart text-generation-webui-text-generation-webui-1
  10. Load a GGML model in the WebUI and check that docker logs text-generation-webui-text-generation-webui-1 says BLAS = 1, like this:
    
    Running on local URL:  http://0.0.0.0:7860

    To create a public link, set share=True in launch().
    INFO:Loading TheBloke_guanaco-33B-GGML...
    INFO:llama.cpp weights detected: models/TheBloke_guanaco-33B-GGML/guanaco-33B.ggmlv3.q4_1.bin
    INFO:Cache capacity is 0 bytes
    llama.cpp: loading model from models/TheBloke_guanaco-33B-GGML/guanaco-33B.ggmlv3.q4_1.bin
    llama_model_load_internal: format = ggjt v3 (latest)
    llama_model_load_internal: n_vocab = 32000
    llama_model_load_internal: n_ctx = 2048
    llama_model_load_internal: n_embd = 6656
    llama_model_load_internal: n_mult = 256
    llama_model_load_internal: n_head = 52
    llama_model_load_internal: n_layer = 60
    llama_model_load_internal: n_rot = 128
    llama_model_load_internal: ftype = 3 (mostly Q4_1)
    llama_model_load_internal: n_ff = 17920
    llama_model_load_internal: n_parts = 1
    llama_model_load_internal: model size = 30B
    llama_model_load_internal: ggml ctx size = 0.13 MB
    llama_model_load_internal: mem required = 2558.06 MB (+ 3124.00 MB per state)
    llama_model_load_internal: [cublas] offloading 60 layers to GPU
    llama_model_load_internal: [cublas] offloading output layer to GPU
    llama_model_load_internal: [cublas] total VRAM used: 19137 MB
    ....................................................................................................
    llama_init_from_file: kv self size = 3120.00 MB
    AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
    INFO:Loaded the model in 7.62 seconds.


  11. Don't forget to set the "n-gpu-layers" slider to 128 in the Model tab.

[Guanaco-33B-GGML](https://huggingface.co/TheBloke/guanaco-33B-GGML) gives me ~10 tokens/s fully offloaded on an RTX 3090, consuming 19137 MB of VRAM with all default parameters (n_batch, n_ctx, etc.).
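The two markers that matter in that startup log can also be checked non-interactively, e.g. when scripting a health check. A minimal sketch; the grep patterns assume the llama.cpp log format shown in this thread:

```shell
# Minimal sample of the relevant log lines (see the full log above)
log='llama_model_load_internal: [cublas] offloading 60 layers to GPU
AVX = 1 | BLAS = 1 | SSE3 = 1 |'

# Both lines should be present for GPU offload to be active:
# "BLAS = 1"  -> llama.cpp was built with a BLAS backend
# "offloading N layers to GPU" -> layers were actually moved to VRAM
printf '%s\n' "$log" | grep -cE 'BLAS = 1|offloading [0-9]+ layers to GPU'
# prints 2
```

In practice you would pipe `docker logs <container> 2>&1` into the same grep instead of the sample string.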
klaribot commented 1 year ago

This workaround finally enables GPU offloading in my docker environment for GGML models too, but it shouldn't be the accepted solution. Docker containers are inherently ephemeral, or are at least supposed to be treated as such, so these manual changes do not persist on container destruction.

I've been trying to translate these steps to a new version of the Dockerfile included with the project, but I don't understand why the environment variables aren't applying in the build steps.

I tried:

RUN CMAKE_ARGS="-DLLAMA_CUBLAS=1 -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc" FORCE_CMAKE=1 pip3 install -v llama-cpp-python --no-cache-dir
ENV CMAKE_ARGS="-DLLAMA_CUBLAS=1 -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc" FORCE_CMAKE=1
RUN pip3 install -v llama-cpp-python --no-cache-dir 
ARG CMAKE_ARGS="-DLLAMA_CUBLAS=1 -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc" FORCE_CMAKE=1
RUN pip3 install -v llama-cpp-python --no-cache-dir 
RUN export CMAKE_ARGS="-DLLAMA_CUBLAS=1 -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc" && export FORCE_CMAKE=1 && pip install llama-cpp-python --no-cache-dir

But I'm not sure how this is any different from your instructions...

UPDATE JUN 5: Found it! Turns out I just didn't follow the instructions completely and exactly! For anyone else deploying via docker, try adding this to your Dockerfile before the last CMD instruction:

RUN . /app/venv/bin/activate && \
    pip3 uninstall -y llama-cpp-python && \
    CMAKE_ARGS="-DLLAMA_CUBLAS=1 -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc" FORCE_CMAKE=1 pip3 install -v llama-cpp-python --no-cache-dir
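With that RUN layer at the end of the Dockerfile, the fix survives container recreation. Rebuilding picks it up; the commands below assume the compose setup from the repository (older Docker installs spell it docker-compose):

```shell
# Rebuild the image so the cuBLAS-enabled llama-cpp-python wheel is baked in
docker compose build --no-cache
docker compose up -d

# Then load a GGML model and look for "BLAS = 1" in the startup log
docker compose logs -f
```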
GamingDaveUk commented 1 year ago

Have the same issue. I want to use text-generation-webui to make a LoRA for GGML models that will be used in KoboldCpp, but it just fills my RAM and CPU when I load a GGML model. Going to try the method mentioned further up, but I don't like the sound of needing to redo it after every update... is there a reason this is not baked in as an option?

GoregoriDes commented 1 year ago

Is there any update regarding this? I've tried all of the options with no success. I'm doing it on Windows, not Docker, with a 4090, and I still can't get it to load the GPU layers.

theseus232 commented 1 year ago

Same problem here. Following https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md doesn't help either.

SGL647 commented 1 year ago

I use Win10 and have the same problem. Following the instructions in llama.cpp-models.md, I don't get any error, but I can't offload to the GPU.

gardner commented 1 year ago

This isn't the correct way to fix the issue, but this change to the project Dockerfile worked for me:

FROM nvidia/cuda:11.8.0-devel-ubuntu22.04 as builder

RUN apt-get update && \
    apt-get install --no-install-recommends -y git vim build-essential python3-dev python3-venv && \
    rm -rf /var/lib/apt/lists/*

RUN git clone https://github.com/oobabooga/GPTQ-for-LLaMa /build

WORKDIR /build

RUN python3 -m venv /build/venv
RUN . /build/venv/bin/activate && \
    pip3 install --upgrade pip setuptools wheel && \
    pip3 install torch torchvision torchaudio && \
    pip3 install -r requirements.txt

# https://developer.nvidia.com/cuda-gpus
# for a rtx 2060: ARG TORCH_CUDA_ARCH_LIST="7.5"
ARG TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;6.1;7.0;7.5;8.0;8.6+PTX"
RUN . /build/venv/bin/activate && \
    python3 setup_cuda.py bdist_wheel -d .

FROM nvidia/cuda:11.8.0-devel-ubuntu22.04

LABEL maintainer="Your Name <your.email@example.com>"
LABEL description="Docker image for GPTQ-for-LLaMa and Text Generation WebUI"

RUN apt-get update && \
    apt-get install --no-install-recommends -y \
        python3-dev libportaudio2 libasound-dev git python3 python3-pip make g++ \
        libcublas-dev-12-0 libcublas-dev-11-8 cuda-nvcc-11-8 && \
    rm -rf /var/lib/apt/lists/*

RUN --mount=type=cache,target=/root/.cache/pip pip3 install virtualenv
RUN mkdir /app

WORKDIR /app

ARG WEBUI_VERSION
RUN test -n "${WEBUI_VERSION}" && git reset --hard ${WEBUI_VERSION} || echo "Using provided webui source"

RUN virtualenv /app/venv
RUN . /app/venv/bin/activate && \
    pip3 install --upgrade pip setuptools wheel && \
    pip3 install torch torchvision torchaudio

COPY --from=builder /build /app/repositories/GPTQ-for-LLaMa
RUN . /app/venv/bin/activate && \
    pip3 install /app/repositories/GPTQ-for-LLaMa/*.whl

COPY extensions/api/requirements.txt /app/extensions/api/requirements.txt
COPY extensions/elevenlabs_tts/requirements.txt /app/extensions/elevenlabs_tts/requirements.txt
COPY extensions/google_translate/requirements.txt /app/extensions/google_translate/requirements.txt
COPY extensions/silero_tts/requirements.txt /app/extensions/silero_tts/requirements.txt
COPY extensions/whisper_stt/requirements.txt /app/extensions/whisper_stt/requirements.txt
RUN --mount=type=cache,target=/root/.cache/pip . /app/venv/bin/activate && cd extensions/api && pip3 install -r requirements.txt
RUN --mount=type=cache,target=/root/.cache/pip . /app/venv/bin/activate && cd extensions/elevenlabs_tts && pip3 install -r requirements.txt
RUN --mount=type=cache,target=/root/.cache/pip . /app/venv/bin/activate && cd extensions/google_translate && pip3 install -r requirements.txt
RUN --mount=type=cache,target=/root/.cache/pip . /app/venv/bin/activate && cd extensions/silero_tts && pip3 install -r requirements.txt
RUN --mount=type=cache,target=/root/.cache/pip . /app/venv/bin/activate && cd extensions/whisper_stt && pip3 install -r requirements.txt

COPY requirements.txt /app/requirements.txt
RUN . /app/venv/bin/activate && \
    pip3 install -r requirements.txt

ENV CUDA_PATH=/usr/local/cuda-11.8
RUN . /app/venv/bin/activate && \
    pip3 uninstall -y llama-cpp-python && \
    CMAKE_ARGS="-DTCNN_CUDA_ARCHITECTURES=86 -DLLAMA_CUBLAS=1 -DCMAKE_CUDA_COMPILER=/usr/local/cuda-11.8/bin/nvcc" FORCE_CMAKE=1 pip3 install -v llama-cpp-python --no-cache-dir

RUN cp /app/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so /app/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so

COPY . /app/
ENV CLI_ARGS=""

CMD . /app/venv/bin/activate && python3 server.py ${CLI_ARGS}
github-actions[bot] commented 1 year ago

This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, please leave a comment below.