simonw / llm-llama-cpp

LLM plugin for running models using llama.cpp
Apache License 2.0

"Error:" after i try to start a chat #10

Closed: vadykoo closed this issue 11 months ago

vadykoo commented 1 year ago

[screenshot: the "Error: " message shown when starting a chat]

Mac 2018 Intel Core i7

Could you please advise how to fix it? I have tried the model from the tutorial and the one from the screenshot

SuperBruceJia commented 1 year ago

Same issue! Any advice? @vadykoo @simonw Thank you guys in advance!

SuperBruceJia commented 1 year ago

> Mac 2018 Intel Core i7
>
> Could you please advise how to fix it? I have tried the model from the tutorial and the one from the screenshot

Solved via a Python 3.11 environment. @vadykoo @simonw Thank you very much for your excellent work! Full transcript below:

(base) ➜  models conda create --name llm python=3.11 numpy scipy                        
Retrieving notices: ...working... done
Collecting package metadata (current_repodata.json): done
Solving environment: done

==> WARNING: A newer version of conda exists. <==
  current version: 23.3.1
  latest version: 23.7.3

Please update conda by running

    $ conda update -n base -c defaults conda

Or to minimize the number of packages updated during conda update use

     conda install conda=23.7.3

## Package Plan ##

  environment location: /Users/brucejia/anaconda3/envs/llm

  added / updated specs:
    - numpy
    - python=3.11
    - scipy

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    numpy-1.25.2               |  py311he598dae_0          13 KB
    numpy-base-1.25.2          |  py311hfbfe69c_0         6.9 MB
    openssl-3.0.10             |       h1a28f6b_2         4.3 MB
    pip-23.2.1                 |  py311hca03da5_0         3.3 MB
    python-3.11.4              |       hb885b13_0        15.3 MB
    scipy-1.11.1               |  py311hc76d9b0_0        21.0 MB
    setuptools-68.0.0          |  py311hca03da5_0         1.2 MB
    tzdata-2023c               |       h04d1e81_0         116 KB
    wheel-0.38.4               |  py311hca03da5_0          80 KB
    ------------------------------------------------------------
                                           Total:        52.4 MB

The following NEW packages will be INSTALLED:

  blas               pkgs/main/osx-arm64::blas-1.0-openblas 
  bzip2              pkgs/main/osx-arm64::bzip2-1.0.8-h620ffc9_4 
  ca-certificates    pkgs/main/osx-arm64::ca-certificates-2023.05.30-hca03da5_0 
  libcxx             pkgs/main/osx-arm64::libcxx-14.0.6-h848a8c0_0 
  libffi             pkgs/main/osx-arm64::libffi-3.4.4-hca03da5_0 
  libgfortran        pkgs/main/osx-arm64::libgfortran-5.0.0-11_3_0_hca03da5_28 
  libgfortran5       pkgs/main/osx-arm64::libgfortran5-11.3.0-h009349e_28 
  libopenblas        pkgs/main/osx-arm64::libopenblas-0.3.21-h269037a_0 
  llvm-openmp        pkgs/main/osx-arm64::llvm-openmp-14.0.6-hc6e5704_0 
  ncurses            pkgs/main/osx-arm64::ncurses-6.4-h313beb8_0 
  numpy              pkgs/main/osx-arm64::numpy-1.25.2-py311he598dae_0 
  numpy-base         pkgs/main/osx-arm64::numpy-base-1.25.2-py311hfbfe69c_0 
  openssl            pkgs/main/osx-arm64::openssl-3.0.10-h1a28f6b_2 
  pip                pkgs/main/osx-arm64::pip-23.2.1-py311hca03da5_0 
  python             pkgs/main/osx-arm64::python-3.11.4-hb885b13_0 
  readline           pkgs/main/osx-arm64::readline-8.2-h1a28f6b_0 
  scipy              pkgs/main/osx-arm64::scipy-1.11.1-py311hc76d9b0_0 
  setuptools         pkgs/main/osx-arm64::setuptools-68.0.0-py311hca03da5_0 
  sqlite             pkgs/main/osx-arm64::sqlite-3.41.2-h80987f9_0 
  tk                 pkgs/main/osx-arm64::tk-8.6.12-hb8d0fd4_0 
  tzdata             pkgs/main/noarch::tzdata-2023c-h04d1e81_0 
  wheel              pkgs/main/osx-arm64::wheel-0.38.4-py311hca03da5_0 
  xz                 pkgs/main/osx-arm64::xz-5.4.2-h80987f9_0 
  zlib               pkgs/main/osx-arm64::zlib-1.2.13-h5a0b063_0 

Proceed ([y]/n)? y

Downloading and Extracting Packages

Preparing transaction: done                                                                                                                      
Verifying transaction: done                                                                                                                      
Executing transaction: done                                                                                                                      
#                                                                                                                                                
# To activate this environment, use                                                                                                              
#                                                                                                                                                
#     $ conda activate llm                                                                                                                       
#                                                                                                                                                
# To deactivate an active environment, use
#
#     $ conda deactivate

(base) ➜  models conda activate llm                             
(llm) ➜  models llm install llm-llama-cpp

llama_cpp not installed, install with: pip install llama-cpp-python
Requirement already satisfied: llm-llama-cpp in /opt/homebrew/Cellar/llm/0.8/libexec/lib/python3.11/site-packages (0.1a0)
Requirement already satisfied: llm in /opt/homebrew/Cellar/llm/0.8/libexec/lib/python3.11/site-packages (from llm-llama-cpp) (0.8)
Requirement already satisfied: httpx in /opt/homebrew/Cellar/llm/0.8/libexec/lib/python3.11/site-packages (from llm-llama-cpp) (0.24.1)
Requirement already satisfied: certifi in /opt/homebrew/opt/python-certifi/lib/python3.11/site-packages (from httpx->llm-llama-cpp) (2023.7.22)
Requirement already satisfied: httpcore<0.18.0,>=0.15.0 in /opt/homebrew/Cellar/llm/0.8/libexec/lib/python3.11/site-packages (from httpx->llm-llama-cpp) (0.17.3)
Requirement already satisfied: idna in /opt/homebrew/Cellar/llm/0.8/libexec/lib/python3.11/site-packages (from httpx->llm-llama-cpp) (3.4)
Requirement already satisfied: sniffio in /opt/homebrew/Cellar/llm/0.8/libexec/lib/python3.11/site-packages (from httpx->llm-llama-cpp) (1.3.0)
Requirement already satisfied: click in /opt/homebrew/Cellar/llm/0.8/libexec/lib/python3.11/site-packages (from llm->llm-llama-cpp) (8.1.7)
Requirement already satisfied: openai in /opt/homebrew/Cellar/llm/0.8/libexec/lib/python3.11/site-packages (from llm->llm-llama-cpp) (0.27.8)
Requirement already satisfied: click-default-group-wheel in /opt/homebrew/Cellar/llm/0.8/libexec/lib/python3.11/site-packages (from llm->llm-llama-cpp) (1.2.2)
Requirement already satisfied: sqlite-utils>=3.35.0 in /opt/homebrew/Cellar/llm/0.8/libexec/lib/python3.11/site-packages (from llm->llm-llama-cpp) (3.35)
Requirement already satisfied: pydantic>=1.10.2 in /opt/homebrew/Cellar/llm/0.8/libexec/lib/python3.11/site-packages (from llm->llm-llama-cpp) (2.2.1)
Requirement already satisfied: PyYAML in /opt/homebrew/opt/pyyaml/lib/python3.11/site-packages (from llm->llm-llama-cpp) (6.0.1)
Requirement already satisfied: pluggy in /opt/homebrew/Cellar/llm/0.8/libexec/lib/python3.11/site-packages (from llm->llm-llama-cpp) (1.2.0)
Requirement already satisfied: python-ulid in /opt/homebrew/Cellar/llm/0.8/libexec/lib/python3.11/site-packages (from llm->llm-llama-cpp) (1.1.0)
Requirement already satisfied: setuptools in /opt/homebrew/lib/python3.11/site-packages (from llm->llm-llama-cpp) (68.1.2)
Requirement already satisfied: pip in /opt/homebrew/lib/python3.11/site-packages (from llm->llm-llama-cpp) (23.2.1)
Requirement already satisfied: h11<0.15,>=0.13 in /opt/homebrew/Cellar/llm/0.8/libexec/lib/python3.11/site-packages (from httpcore<0.18.0,>=0.15.0->httpx->llm-llama-cpp) (0.14.0)
Requirement already satisfied: anyio<5.0,>=3.0 in /opt/homebrew/Cellar/llm/0.8/libexec/lib/python3.11/site-packages (from httpcore<0.18.0,>=0.15.0->httpx->llm-llama-cpp) (3.7.1)
Requirement already satisfied: annotated-types>=0.4.0 in /opt/homebrew/Cellar/llm/0.8/libexec/lib/python3.11/site-packages (from pydantic>=1.10.2->llm->llm-llama-cpp) (0.5.0)
Requirement already satisfied: pydantic-core==2.6.1 in /opt/homebrew/Cellar/llm/0.8/libexec/lib/python3.11/site-packages (from pydantic>=1.10.2->llm->llm-llama-cpp) (2.6.1)
Requirement already satisfied: typing-extensions>=4.6.1 in /opt/homebrew/opt/python-typing-extensions/lib/python3.11/site-packages (from pydantic>=1.10.2->llm->llm-llama-cpp) (4.7.1)
Requirement already satisfied: sqlite-fts4 in /opt/homebrew/Cellar/llm/0.8/libexec/lib/python3.11/site-packages (from sqlite-utils>=3.35.0->llm->llm-llama-cpp) (1.0.3)
Requirement already satisfied: tabulate in /opt/homebrew/opt/python-tabulate/lib/python3.11/site-packages (from sqlite-utils>=3.35.0->llm->llm-llama-cpp) (0.0.0)
Requirement already satisfied: python-dateutil in /opt/homebrew/Cellar/llm/0.8/libexec/lib/python3.11/site-packages (from sqlite-utils>=3.35.0->llm->llm-llama-cpp) (2.8.2)
Requirement already satisfied: requests>=2.20 in /opt/homebrew/Cellar/llm/0.8/libexec/lib/python3.11/site-packages (from openai->llm->llm-llama-cpp) (2.31.0)
Requirement already satisfied: tqdm in /opt/homebrew/Cellar/llm/0.8/libexec/lib/python3.11/site-packages (from openai->llm->llm-llama-cpp) (4.66.1)
Requirement already satisfied: aiohttp in /opt/homebrew/Cellar/llm/0.8/libexec/lib/python3.11/site-packages (from openai->llm->llm-llama-cpp) (3.8.5)
Requirement already satisfied: charset-normalizer<4,>=2 in /opt/homebrew/Cellar/llm/0.8/libexec/lib/python3.11/site-packages (from requests>=2.20->openai->llm->llm-llama-cpp) (3.2.0)
Requirement already satisfied: urllib3<3,>=1.21.1 in /opt/homebrew/Cellar/llm/0.8/libexec/lib/python3.11/site-packages (from requests>=2.20->openai->llm->llm-llama-cpp) (2.0.4)
Requirement already satisfied: attrs>=17.3.0 in /opt/homebrew/Cellar/llm/0.8/libexec/lib/python3.11/site-packages (from aiohttp->openai->llm->llm-llama-cpp) (23.1.0)
Requirement already satisfied: multidict<7.0,>=4.5 in /opt/homebrew/Cellar/llm/0.8/libexec/lib/python3.11/site-packages (from aiohttp->openai->llm->llm-llama-cpp) (6.0.4)
Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /opt/homebrew/Cellar/llm/0.8/libexec/lib/python3.11/site-packages (from aiohttp->openai->llm->llm-llama-cpp) (4.0.3)
Requirement already satisfied: yarl<2.0,>=1.0 in /opt/homebrew/Cellar/llm/0.8/libexec/lib/python3.11/site-packages (from aiohttp->openai->llm->llm-llama-cpp) (1.9.2)
Requirement already satisfied: frozenlist>=1.1.1 in /opt/homebrew/Cellar/llm/0.8/libexec/lib/python3.11/site-packages (from aiohttp->openai->llm->llm-llama-cpp) (1.4.0)
Requirement already satisfied: aiosignal>=1.1.2 in /opt/homebrew/Cellar/llm/0.8/libexec/lib/python3.11/site-packages (from aiohttp->openai->llm->llm-llama-cpp) (1.3.1)
Requirement already satisfied: six>=1.5 in /opt/homebrew/opt/six/lib/python3.11/site-packages (from python-dateutil->sqlite-utils>=3.35.0->llm->llm-llama-cpp) (1.16.0)
(llm) ➜  models llm install llama-cpp-python

llama_cpp not installed, install with: pip install llama-cpp-python
Collecting llama-cpp-python
  Using cached llama_cpp_python-0.1.81-cp311-cp311-macosx_14_0_arm64.whl
Requirement already satisfied: typing-extensions>=4.5.0 in /opt/homebrew/opt/python-typing-extensions/lib/python3.11/site-packages (from llama-cpp-python) (4.7.1)
Collecting numpy>=1.20.0 (from llama-cpp-python)
  Obtaining dependency information for numpy>=1.20.0 from https://files.pythonhosted.org/packages/86/a1/b8ef999c32f26a97b5f714887e21f96c12ae99a38583a0a96e65283ac0a1/numpy-1.25.2-cp311-cp311-macosx_11_0_arm64.whl.metadata
  Using cached numpy-1.25.2-cp311-cp311-macosx_11_0_arm64.whl.metadata (5.6 kB)
Collecting diskcache>=5.6.1 (from llama-cpp-python)
  Using cached diskcache-5.6.1-py3-none-any.whl (45 kB)
Using cached numpy-1.25.2-cp311-cp311-macosx_11_0_arm64.whl (14.0 MB)
Installing collected packages: numpy, diskcache, llama-cpp-python
Successfully installed diskcache-5.6.1 llama-cpp-python-0.1.81 numpy-1.25.2
(llm) ➜  models llm install https://static.simonwillison.net/static/2023/llama_cpp_python-0.1.77-cp311-cp311-macosx_13_0_arm64.whl

Collecting llama-cpp-python==0.1.77
  Downloading https://static.simonwillison.net/static/2023/llama_cpp_python-0.1.77-cp311-cp311-macosx_13_0_arm64.whl (236 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 236.1/236.1 kB 956.4 kB/s eta 0:00:00
Requirement already satisfied: typing-extensions>=4.5.0 in /opt/homebrew/opt/python-typing-extensions/lib/python3.11/site-packages (from llama-cpp-python==0.1.77) (4.7.1)
Requirement already satisfied: numpy>=1.20.0 in /opt/homebrew/Cellar/llm/0.8/libexec/lib/python3.11/site-packages (from llama-cpp-python==0.1.77) (1.25.2)
Requirement already satisfied: diskcache>=5.6.1 in /opt/homebrew/Cellar/llm/0.8/libexec/lib/python3.11/site-packages (from llama-cpp-python==0.1.77) (5.6.1)
Installing collected packages: llama-cpp-python
  Attempting uninstall: llama-cpp-python
    Found existing installation: llama-cpp-python 0.1.81
    Uninstalling llama-cpp-python-0.1.81:
      Successfully uninstalled llama-cpp-python-0.1.81
Successfully installed llama-cpp-python-0.1.77
(llm) ➜  models llm llama-cpp download-model \
  https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q8_0.bin \
  --alias llama2-chat --alias l2c --llama2-chat
Downloading 6.67 GB  [####################################]  100%          
Downloaded model to /Users/brucejia/Library/Application Support/io.datasette.llm/llama-cpp/models/llama-2-7b-chat.ggmlv3.q8_0.bin
(llm) ➜  models llm -m l2c 'Tell me a joke about a llama' --system 'You are funny'                

 Why did the llama refuse to play cards? Because he always got knitted! 
(llm) ➜  models 
programmylife commented 1 year ago

I'm seeing the same "Error: " on two machines. Both use Python 3.11 in a virtual environment with pip 23.1.2, and I installed with llm install llama-cpp-python, not the wheel.

The first is a 2023 Mac Mini (M2, 8 GB RAM; I wonder if this one simply needs more memory). The second is a Windows PC with 32 GB RAM, an Intel 9700K, and an Nvidia 2080, running under WSL. In both cases llm llama-cpp models outputs:

{
  "llama-2-7b-chat.ggmlv3.q8_0": {
    "path": "/Users/username/Library/Application Support/io.datasette.llm/llama-cpp/models/llama-2-7b-chat.ggmlv3.q8_0.bin",
    "aliases": [
      "llama2-chat",
      "l2c"
    ],
    "is_llama2_chat": true
  }
}

As a sanity check, llm -m llama2-chast 'five creative names for a pet hedgehog' (a deliberately misspelled name) outputs Error: 'llama2-chast' is not a known model, so llm does seem to be resolving model names and aliases correctly.

SuperBruceJia commented 1 year ago

> I'm seeing the same "Error: " on two machines. […] As a sanity check, llm -m llama2-chast 'five creative names for a pet hedgehog' (a deliberately misspelled name) outputs Error: 'llama2-chast' is not a known model, so llm does seem to be resolving model names and aliases correctly.

You should install a llama model and use --alias llama2-chast on the command line to alias your downloaded model to that llama2-chast name.

Some recommendations:

llm llama-cpp download-model \
  https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q8_0.bin \
  --alias llama2-chast --llama2-chat

then

llm -m llama2-chast 'Tell me a joke about a llama'

Please also check this blog.

If you have any further questions, please let me know.

programmylife commented 1 year ago

Thank you for the quick response. I followed that blog post (before you posted it; it's what got me started) and ran that command to download the model. Here are the commands I executed:

python3.11 -m venv .venv --prompt llm
source .venv/bin/activate
pip install llm
llm install llm-llama-cpp
llm install llama-cpp-python
llm llama-cpp download-model \
  https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q8_0.bin \
  --alias llama2-chat --alias l2c --llama2-chat
llm -m l2c 'Tell me a joke about a llama'

Output: 'Error: '

I see how the last line in my original response was confusing. I meant that, in addition to the behavior described above, I also ran llm with a purposefully incorrect model name to check whether something was wrong with my llm install, and it properly reported that the model name was unknown.

SuperBruceJia commented 1 year ago

I'd like to suggest:

  1. Manually remove the downloaded model first.
  2. pip uninstall llm, then restart the process from scratch.
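
For example (a rough sketch; the model path assumes llm's default location on macOS, as shown in your listing above, so adjust it for your machine):

rm "/Users/username/Library/Application Support/io.datasette.llm/llama-cpp/models/llama-2-7b-chat.ggmlv3.q8_0.bin"
pip uninstall llm
pip install llm   # then repeat the plugin install and model download steps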
SuperBruceJia commented 1 year ago

> Thank you for the quick response. […] Here are the commands I executed: […] llm -m l2c 'Tell me a joke about a llama' outputs 'Error: '

Please check this one:

llm install https://static.simonwillison.net/static/2023/llama_cpp_python-0.1.77-cp311-cp311-macosx_13_0_arm64.whl

programmylife commented 1 year ago

Thanks. I tried with the wheel this time (removed the model file, uninstalled llm, reinstalled llm and llm-llama-cpp, then the wheel, then used the same command as before to download the model and set the alias) on my Mac. Now I'm seeing activity, but the process seems to hang: the Python process has been at ~55% CPU for ~8 minutes after running llm -m l2c 'Tell me a joke about a llama'. I hit Ctrl-C to confirm it was the right process, and it was.

SuperBruceJia commented 1 year ago

@programmylife

Good, glad you got it working!

Next, I would suggest installing OpenMP and using Homebrew to install llm itself.

brew install llvm
brew install libomp

Be sure to add the corresponding paths to your shell configuration file, e.g. ~/.zshrc.
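
For example, Homebrew's llvm and libomp are keg-only, so ~/.zshrc usually needs something like this (paths assume Apple Silicon Homebrew under /opt/homebrew; brew info llvm and brew info libomp print the exact lines for your setup):

export PATH="/opt/homebrew/opt/llvm/bin:$PATH"
export LDFLAGS="-L/opt/homebrew/opt/llvm/lib -L/opt/homebrew/opt/libomp/lib"
export CPPFLAGS="-I/opt/homebrew/opt/llvm/include -I/opt/homebrew/opt/libomp/include"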

Then,

brew install llm

Good luck!

SuperBruceJia commented 1 year ago

> Next, I would suggest installing OpenMP and using Homebrew to install llm itself. […]

@programmylife Done this way, it runs much faster for me, but CPU consumption is still around 50%.

programmylife commented 1 year ago

Thanks. I prefer to avoid brew, but I'll give that a try to check the speed. I can confirm that it is working for me now; it took 14.5 minutes to give me a llama joke :)

It would still be nice to figure out how to install this through pip without needing the wheel.
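
One thing that might help there (untested on my machines; the CMAKE_ARGS flag comes from llama-cpp-python's own build instructions and only applies on Apple Silicon Macs) is forcing a Metal-enabled build from source inside the virtual environment:

CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install --force-reinstall --no-cache-dir llama-cpp-python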

bracki commented 1 year ago

I have the same problem on an M1 Max w/ 32 GB. I installed through brew. I also tried to get it to output debug info, but that doesn't work either.

llm prompt -m l2c 'five creative names for a pet hedgehog'
Error:

llm prompt -o verbose -m l2c 'five creative names for a pet hedgehog'
Usage: llm prompt [OPTIONS] [PROMPT]
Try 'llm prompt --help' for help.

Error: Got unexpected extra argument (five creative names for a pet hedgehog)

llm prompt -o verbose=true -m l2c 'five creative names for a pet hedgehog'
Usage: llm prompt [OPTIONS] [PROMPT]
Try 'llm prompt --help' for help.

Error: Got unexpected extra argument (five creative names for a pet hedgehog)

SuperBruceJia commented 1 year ago

Simply try

llm -m l2c 'Tell me a joke about a llama'

Please check this blog and this blog. @bracki

SuperBruceJia commented 1 year ago

> I have the same problem on an M1 Max w/ 32 GB. I installed through brew. I also tried to get it to output debug info, but that doesn't work either. […] Error: Got unexpected extra argument (five creative names for a pet hedgehog)

https://github.com/simonw/llm-llama-cpp/issues/10#issuecomment-1701686948

michitux commented 1 year ago

The root cause of this is that llama.cpp switched to a new file format, and you need to download GGUF models for it to work. The pre-compiled wheel still supports the old file format, which is why it works.

To get this to work, you need to change .bin to .gguf in these two lines and then download a GGUF model file. For example:

llm llama-cpp download-model \
  'https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q5_K_M.gguf' \
  -a llama2-chat-13b-gguf --llama2-chat

gives me a working model to try with llm -m llama2-chat-13b-gguf "What is the capital of France?".
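
If you're not sure which format a file you already downloaded is in, the magic bytes give it away. A quick check (the path assumes llm's default macOS model directory):

head -c 4 "/Users/username/Library/Application Support/io.datasette.llm/llama-cpp/models/llama-2-7b-chat.ggmlv3.q8_0.bin"

GGUF files start with the four bytes GGUF; older GGML-family files do not.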

vividfog commented 1 year ago

> The root cause of this is that llama.cpp switched to a new file format, and you need to download GGUF models for it to work. […]

I can confirm that changing those two lines (.bin → .gguf) in llm_llama_cpp.py and then installing the modified plugin is enough to fix the situation. No wheels needed. Tested on an M1 Max with Python 3.11, llm 0.9, and llama-cpp-python 0.1.83.
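
In case it helps anyone reproduce this, the rough sequence I mean is (assuming you clone the plugin; llm install passes its arguments through to pip, so an editable install should work):

git clone https://github.com/simonw/llm-llama-cpp
# edit llm_llama_cpp.py, changing .bin to .gguf in the two lines mentioned above
llm install -e ./llm-llama-cpp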

programmylife commented 1 year ago

> The root cause of this is that llama.cpp switched to a new file format, and you need to download GGUF models for it to work. […]

Also worked for me. For anyone less familiar with Python packaging: edit llm_llama_cpp.py in .venv/lib/python3.11/site-packages. I was worried I might need to install as editable, but just editing the file worked. I now have this working on Windows (WSL).
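
If you'd rather script the edit, a one-liner along these lines should do it (this assumes .bin only appears in the two lines in question; on macOS use sed -i '' instead of sed -i):

sed -i 's/\.bin/.gguf/g' .venv/lib/python3.11/site-packages/llm_llama_cpp.py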

Would this be a good candidate for a PR, @michitux? I'd assume the work would be updating that script to accept both .bin and .gguf (in case someone wants to use an older model), then updating the README to clarify the two model types. Anything else? If not, I can open that PR.

michitux commented 1 year ago

> Would this be a good candidate for a PR, @michitux? I'd assume the work would be updating that script to accept both .bin and .gguf (in case someone wants to use an older model), then updating the README to clarify the two model types. Anything else? If not, I can open that PR.

The main work is updating and verifying the examples in the README; otherwise I would have opened a PR myself. Of course, updating the script is also part of it, but I think that's the easy part.

But yes, if you have the time and motivation, please open a PR. Also, I think the examples use some older quantization variants; they could be updated to the more recent ones.