osi1880vr / prompt_quill

Apache License 2.0
195 stars · 19 forks

GPU not being used #7

Closed · drphero closed 5 months ago

drphero commented 5 months ago

I'm unable to use the llmware one-click installer because I'm using a cloud provider, which makes Docker a no-go, so I went with the llama_index one. Everything seems to be working, but extremely slowly.

llama_print_timings:        load time =   45200.28 ms
llama_print_timings:      sample time =      36.97 ms /   122 runs   (    0.30 ms per token,  3300.33 tokens per second)
llama_print_timings: prompt eval time =   96395.93 ms /  1019 tokens (   94.60 ms per token,    10.57 tokens per second)
llama_print_timings:        eval time =   21954.99 ms /   121 runs   (  181.45 ms per token,     5.51 tokens per second)
llama_print_timings:       total time =  118764.03 ms /  1140 tokens

This leads me to believe that the GPU (Quadro RTX 6000) is not being used. I saw that there is a check_gpu_enabled.py, so I edited the model path in it and got output containing BLAS = 0. As a side note, the model that is automatically downloaded is different from the one listed in the readme for llama_index.
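For anyone wanting to run the same check without editing the script: loading any local GGUF model through llama-cpp-python with verbose output prints a system info line at load time, and BLAS = 0 there means the wheel was built without GPU offload (the model path below is just a placeholder):

:: BLAS = 1 in the load-time output means a cuBLAS/CUDA build, BLAS = 0 means CPU only
python -c "from llama_cpp import Llama; Llama(model_path=r'C:\path\to\model.gguf', n_gpu_layers=-1, verbose=True)"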

Activating the environment and running pip show torch gives:

Name: torch
Version: 2.2.1
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: C:\Users\Shadow\prompt_quill\llama_index_pq\installer_files\env\Lib\site-packages
Requires: filelock, fsspec, jinja2, networkx, sympy, typing-extensions
Required-by: llama-index-embeddings-huggingface
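pip show only proves that torch is installed, not that the wheel can actually see the GPU; a quick sanity check from the activated environment (generic, not specific to this repo):

python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"
:: a CUDA-enabled wheel prints True and a CUDA version; a CPU-only wheel prints False None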

Automatic1111, ComfyUI, Oobabooga, etc. all work fine with the GPU, so I must be missing something here. Any tips to get it to use the GPU?

drphero commented 5 months ago

I found the somewhat hidden 'Setup visual studio community for llamacpp.odt'. I completed everything in there (Visual Studio with C++ was already set up previously for some ComfyUI nodes), but the problem persists.

drphero commented 5 months ago

I think I figured out what the problem is. It seems that for some people, setting CMAKE_ARGS has no effect, so I took the advice from https://github.com/abetlen/llama-cpp-python/issues/284#issuecomment-1566292065.
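For reference, this is the usual CMAKE_ARGS route that apparently has no effect on some Windows setups (flag name as of llama-cpp-python 0.2.x, when the CUDA switch was still called LLAMA_CUBLAS):

set CMAKE_ARGS=-DLLAMA_CUBLAS=on
set FORCE_CMAKE=1
pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir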

First, clone the llama-cpp-python repository with the --recurse-submodules option. Then edit CMakeLists.txt in vendor/llama.cpp and change LLAMA_CUBLAS to ON.
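Roughly like this (the exact option text in CMakeLists.txt may differ slightly between versions):

git clone --recurse-submodules https://github.com/abetlen/llama-cpp-python.git
cd llama-cpp-python
:: in vendor\llama.cpp\CMakeLists.txt flip the cuBLAS option from OFF to ON,
:: i.e. something like: option(LLAMA_CUBLAS "llama: use cuBLAS" ON)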

Then create a venv in the llama-cpp-python directory and run set FORCE_CMAKE=1 && pip install . -vv. This will take a while to finish. Once that's done, copy the llama_cpp and llama_cpp_python-0.2.57.dist-info directories from venv\Lib\site-packages into installer_files\env\Lib\site-packages.
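Put together, the Windows steps look roughly like this (the destination path is just the one from my install above, and the dist-info version will match whatever gets built):

:: run from inside the cloned llama-cpp-python directory
python -m venv venv
venv\Scripts\activate
set FORCE_CMAKE=1
pip install . -vv

:: copy the freshly built package over the one in the prompt_quill env
:: (repeat for the llama_cpp_python-*.dist-info folder)
xcopy /E /I /Y venv\Lib\site-packages\llama_cpp ^
  C:\Users\Shadow\prompt_quill\llama_index_pq\installer_files\env\Lib\site-packages\llama_cpp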

Only after doing it that way was I able to get it to use the GPU.

osi1880vr commented 5 months ago

Interesting, so I'll copy that into the somewhat hidden docs.