Closed: tomroh closed this issue 10 months ago.
I also encountered the same issue. I am setting up on an Ubuntu 22.04-based OS (Pop!_OS) with an RTX 4070. LM Studio and Stable Diffusion have been running fine on this setup.
I am also facing the same issue.
user: hi
Response: assistant: ################################ … (the reply is nothing but a long run of '#' characters)
+1. Facing the same issue when running it in Docker; the chat response only returns '#' characters.
+1. Same issue on Windows WSL2
+1 Facing the same issue on Ubuntu 22.04 with RTX 2060.
+1. Facing the same issue with an RTX 3080 on Windows 11.
Facing the same issue on Windows 11 with an RTX 3060 Ti - works with CPU, not with CUDA.
FIXED IT - it seems the latest version of llama-cpp-python (0.2.29) is incompatible:
1. Downgraded CUDA to 11.7.1 (not certain this is necessary, but I did it).
2. Ran the following command from the install guide, but pinned llama-cpp-python to version 0.2.23:
$env:CMAKE_ARGS='-DLLAMA_CUBLAS=on'; poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python==0.2.23
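After the reinstall, a quick sanity check from inside the poetry environment (just a convenience snippet, not part of the original instructions) confirms which build is actually active:

```python
# Sanity check: confirm the pinned llama-cpp-python build is the one in use.
# Run inside the project's poetry environment, e.g. `poetry run python check_version.py`.
import llama_cpp

print(llama_cpp.__version__)  # should print "0.2.23" after the pinned reinstall
```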
Thank you @sslovelady - confirming the model is now working as expected. You are a life saver :) BTW, I can confirm that downgrading CUDA does nothing; the fix is pinning llama-cpp-python to 0.2.23.
Yep. Llama-cpp-python is kinda broken. See this thread: https://github.com/abetlen/llama-cpp-python/issues/1089
You don't need to downgrade llama-cpp-python! Make the following edit to /private_gpt/components/llm/llm_component.py:
logger.info("Initializing the LLM in mode=%s", llm_mode)
match settings.llm.mode:
case "local":
from llama_index.llms import LlamaCPP
prompt_style = get_prompt_style(settings.local.prompt_style)
self.llm = LlamaCPP(
model_path=str(models_path / settings.local.llm_hf_model_file),
temperature=0.1,
max_new_tokens=settings.llm.max_new_tokens,
context_window=settings.llm.context_window,
generate_kwargs={},
# All to GPU
# Adding "offload_kqv":True fixes the broken generator
model_kwargs={"n_gpu_layers": -1, "offload_kqv": True},
# transform inputs into Llama2 format
messages_to_prompt=prompt_style.messages_to_prompt,
completion_to_prompt=prompt_style.completion_to_prompt,
verbose=True,
)
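If you want to verify that offload_kqv on its own is what restores sane output, here is a minimal standalone sketch against llama_cpp directly (the model path and prompt are placeholders, and it assumes a CUDA build of llama-cpp-python recent enough to accept the offload_kqv flag):

```python
# Standalone reproduction outside privateGPT: with offload_kqv=True the model
# answers normally; with the flag omitted on a CUDA build of 0.2.29 the output
# degenerates into '#' characters.
from llama_cpp import Llama

llm = Llama(
    model_path="models/mistral-7b-instruct-v0.2.Q4_K_M.gguf",  # placeholder: any local GGUF file
    n_gpu_layers=-1,    # offload all layers to the GPU
    offload_kqv=True,   # keep the KV cache on the GPU as well
    verbose=False,
)

out = llm("Q: What is the capital of France? A:", max_tokens=32, stop=["\n"])
print(out["choices"][0]["text"])  # expect a normal answer, not a string of '#'
```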
I can confirm just downgrading llama-cpp-python works for me as well. Thanks @sslovelady
(Quoting the comment above: "You don't need to downgrade llama-cpp-python! Make the following edit to /private_gpt/components/llm/llm_component.py:" followed by the same code.)
What you haven't spelled out is what actually needs to change: add "offload_kqv": True to model_kwargs={"n_gpu_layers": -1}.
Tested on two Ubuntu 22.04 machines with CUDA 12.3 and partial layer offload: 5 of 41 layers offloaded to the GPU on TheBloke/Llama-2-13B-chat-GGUF, and 30 of 81 on TheBloke/GodziLLa2-70B-GGUF.
The fix proposed by @Koesters works for me as well! I didn't downgrade anything. Thank you Koesters!
No matter the prompt, privateGPT only returns hashes as the response. This doesn't occur when not using CUBLAS.
Setup info:
NVIDIA GeForce RTX 4080, Windows 11
accelerate==0.25.0 aiofiles==23.2.1 aiohttp==3.9.1 aiosignal==1.3.1 aiostream==0.5.2 altair==5.2.0 annotated-types==0.6.0 anyio==3.7.1 attrs==23.1.0 beautifulsoup4==4.12.2 black==22.12.0 boto3==1.34.2 botocore==1.34.2 build==1.0.3 CacheControl==0.13.1 certifi==2023.11.17 cfgv==3.4.0 charset-normalizer==3.3.2 cleo==2.1.0 click==8.1.7 colorama==0.4.6 coloredlogs==15.0.1 contourpy==1.2.0 coverage==7.3.3 crashtest==0.4.1 cycler==0.12.1 dataclasses-json==0.5.14 datasets==2.14.4 Deprecated==1.2.14 dill==0.3.7 diskcache==5.6.3 distlib==0.3.8 distro==1.8.0 dnspython==2.4.2 dulwich==0.21.7 email-validator==2.1.0.post1 evaluate==0.4.1 fastapi==0.103.2 fastjsonschema==2.19.1 ffmpy==0.3.1 filelock==3.13.1 flatbuffers==23.5.26 fonttools==4.46.0 frozenlist==1.4.1 fsspec==2023.12.2 gradio==4.10.0 gradio_client==0.7.3 greenlet==3.0.2 grpcio==1.60.0 grpcio-tools==1.60.0 h11==0.14.0 h2==4.1.0 hpack==4.0.0 httpcore==1.0.2 httptools==0.6.1 httpx==0.25.2 huggingface-hub==0.19.4 humanfriendly==10.0 hyperframe==6.0.1 identify==2.5.33 idna==3.6 importlib-resources==6.1.1 iniconfig==2.0.0 injector==0.21.0 installer==0.7.0 itsdangerous==2.1.2 jaraco.classes==3.3.0 Jinja2==3.1.2 jmespath==1.0.1 joblib==1.3.2 jsonschema==4.20.0 jsonschema-specifications==2023.11.2 keyring==24.3.0 kiwisolver==1.4.5 llama-index==0.9.3 llama_cpp_python==0.2.29 markdown-it-py==3.0.0 MarkupSafe==2.1.3 marshmallow==3.20.1 matplotlib==3.8.2 mdurl==0.1.2 more-itertools==10.2.0 mpmath==1.3.0 msgpack==1.0.7 multidict==6.0.4 multiprocess==0.70.15 mypy==1.7.1 mypy-extensions==1.0.0 nest-asyncio==1.5.8 networkx==3.2.1 nltk==3.8.1 nodeenv==1.8.0 numpy==1.26.3 onnx==1.15.0 onnxruntime==1.16.3 openai==1.5.0 optimum==1.16.1 orjson==3.9.10 packaging==23.2 pandas==2.1.4 pathspec==0.12.1 pexpect==4.9.0 Pillow==10.1.0 pkginfo==1.9.6 platformdirs==4.1.0 pluggy==1.3.0 poetry==1.7.1 poetry-core==1.8.1 poetry-plugin-export==1.6.0 portalocker==2.8.2 pre-commit==2.21.0 -e git+https://github.com/imartinez/privateGPT@d3acd85fe34030f8cfd7daf50b30c534087bdf2b#egg=private_gpt protobuf==4.25.1 psutil==5.9.6 ptyprocess==0.7.0 pyarrow==14.0.1 pydantic==2.5.2 pydantic-extra-types==2.2.0 pydantic-settings==2.1.0 pydantic_core==2.14.5 pydub==0.25.1 Pygments==2.17.2 pyparsing==3.1.1 pypdf==3.17.2 pyproject_hooks==1.0.0 pyreadline3==3.4.1 pytest==7.4.3 pytest-asyncio==0.21.1 pytest-cov==3.0.0 python-dateutil==2.8.2 python-dotenv==1.0.0 python-multipart==0.0.6 pytz==2023.3.post1 pywin32==306 pywin32-ctypes==0.2.2 PyYAML==6.0.1 qdrant-client==1.7.0 rapidfuzz==3.6.1 referencing==0.32.0 regex==2023.10.3 requests==2.31.0 requests-toolbelt==1.0.0 responses==0.18.0 rich==13.7.0 rpds-py==0.14.1 ruff==0.1.8 s3transfer==0.9.0 safetensors==0.4.1 scikit-learn==1.3.2 scipy==1.11.4 semantic-version==2.10.0 sentence-transformers==2.2.2 sentencepiece==0.1.99 shellingham==1.5.4 six==1.16.0 sniffio==1.3.0 soupsieve==2.5 SQLAlchemy==2.0.23 starlette==0.27.0 sympy==1.12 tenacity==8.2.3 threadpoolctl==3.2.0 tiktoken==0.5.2 tokenizers==0.15.0 tomlkit==0.12.0 toolz==0.12.0 torch==2.1.2+cu121 torchaudio==2.1.2+cu121 torchvision==0.16.2+cu121 tqdm==4.66.1 transformers==4.36.1 trove-classifiers==2024.1.8 typer==0.9.0 types-PyYAML==6.0.12.12 typing-inspect==0.9.0 typing_extensions==4.9.0 tzdata==2023.3 ujson==5.9.0 urllib3==1.26.18 uvicorn==0.24.0.post1 virtualenv==20.25.0 watchdog==3.0.0 watchfiles==0.21.0 websockets==11.0.3 wrapt==1.16.0 xxhash==3.4.1 yarl==1.9.4
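For anyone comparing environments, a short script like this (not part of the original report, just a convenience) prints the handful of versions that seem to matter here:

```python
# Collect the version info relevant to this issue: llama-cpp-python, llama-index,
# torch and its CUDA build, and whether a GPU is visible at all.
import llama_cpp
import llama_index
import torch

print("llama-cpp-python:", llama_cpp.__version__)
print("llama-index:     ", llama_index.__version__)
print("torch:           ", torch.__version__, "(CUDA", torch.version.cuda, ")")
print("GPU available:   ", torch.cuda.is_available())
```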