simonw / llm-llama-cpp

LLM plugin for running models using llama.cpp
Apache License 2.0

gguf model type doesn't appear in llm models list until you've added a model with llama-cpp add-model #27

Open wtfrank opened 8 months ago

wtfrank commented 8 months ago

I tried #26 and the gguf model type didn't get picked up by llm until I registered a model with "llm llama-cpp add-model". I'm not sure if this is working as intended - I expected that gguf would appear in the list, and that I could then use an option to specify the model file. But I could have misunderstood the notes.
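For reference, the usage I was expecting (the exact option name here is my assumption from the plugin notes) was something like:

$ llm -m gguf -o path models/una-cybertron-7b-v2-bf16.Q8_0.gguf 'say hello'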

$ pip list | grep llm
llm                      0.12
llm-gpt4all              0.2
llm-llama-cpp            0.3

I expected gguf to appear in this list, but didn't see a gguf entry:

$ llm models
OpenAI Chat: gpt-3.5-turbo (aliases: 3.5, chatgpt)
OpenAI Chat: gpt-3.5-turbo-16k (aliases: chatgpt-16k, 3.5-16k)
OpenAI Chat: gpt-4 (aliases: 4, gpt4)
OpenAI Chat: gpt-4-1106-preview (aliases: gpt-4-turbo, 4-turbo, 4t)
OpenAI Chat: gpt-4-32k (aliases: 4-32k)
OpenAI Completion: gpt-3.5-turbo-instruct (aliases: 3.5-instruct, chatgpt-instruct)
gpt4all: nous-hermes-llama2-13b - Hermes, 6.86GB download, needs 16GB RAM (installed)
gpt4all: all-MiniLM-L6-v2-f16 - SBert, 43.76MB download, needs 1GB RAM
gpt4all: replit-code-v1_5-3b-q4_0 - Replit, 1.74GB download, needs 4GB RAM
gpt4all: orca-mini-3b-gguf2-q4_0 - Mini Orca (Small), 1.84GB download, needs 4GB RAM
gpt4all: mpt-7b-chat-merges-q4_0 - MPT Chat, 3.54GB download, needs 8GB RAM
gpt4all: orca-2-7b - Orca 2 (Medium), 3.56GB download, needs 8GB RAM
gpt4all: rift-coder-v0-7b-q4_0 - Rift coder, 3.56GB download, needs 8GB RAM
gpt4all: em_german_mistral_v01 - EM German Mistral, 3.83GB download, needs 8GB RAM
gpt4all: mistral-7b-instruct-v0 - Mistral Instruct, 3.83GB download, needs 8GB RAM
gpt4all: mistral-7b-openorca - Mistral OpenOrca, 3.83GB download, needs 8GB RAM
gpt4all: gpt4all-falcon-q4_0 - GPT4All Falcon, 3.92GB download, needs 8GB RAM
gpt4all: gpt4all-13b-snoozy-q4_0 - Snoozy, 6.86GB download, needs 16GB RAM
gpt4all: wizardlm-13b-v1 - Wizard v1.2, 6.86GB download, needs 16GB RAM
gpt4all: orca-2-13b - Orca 2 (Full), 6.86GB download, needs 16GB RAM
gpt4all: starcoder-q4_0 - Starcoder, 8.37GB download, needs 4GB RAM

llm is aware of the plugin:

$ llm plugins
[
  {
    "name": "llm-gpt4all",
    "hooks": [
      "register_models"
    ],
    "version": "0.2"
  },
  {
    "name": "llm-llama-cpp",
    "hooks": [
      "register_commands",
      "register_models"
    ],
    "version": "0.3"
  }
]

I then read the code, looked into the llama-cpp command, and explicitly added a model I had downloaded earlier. At that point both that model and gguf appeared in the llm models list:

$ llm llama-cpp add-model models/una-cybertron-7b-v2-bf16.Q8_0.gguf 
$ llm models
OpenAI Chat: gpt-3.5-turbo (aliases: 3.5, chatgpt)
OpenAI Chat: gpt-3.5-turbo-16k (aliases: chatgpt-16k, 3.5-16k)
OpenAI Chat: gpt-4 (aliases: 4, gpt4)
OpenAI Chat: gpt-4-1106-preview (aliases: gpt-4-turbo, 4-turbo, 4t)
OpenAI Chat: gpt-4-32k (aliases: 4-32k)
OpenAI Completion: gpt-3.5-turbo-instruct (aliases: 3.5-instruct, chatgpt-instruct)
LlamaModel: una-cybertron-7b-v2-bf16.Q8_0
LlamaGGUF: gguf

I wonder if register(LlamaGGUF()) in register_models in llm_llama_cpp.py should appear before the check for models.json.
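Roughly this shape - my reconstruction from reading the code, not a verbatim copy, and the shape of each models.json entry is my guess (LlamaModel and LlamaGGUF are the plugin's model classes defined elsewhere in the module):

import json
import llm

@llm.hookimpl
def register_models(register):
    directory = llm.user_dir() / "llama-cpp"
    models_file = directory / "models.json"
    if not models_file.exists():
        # With no models.json the hook returns early...
        return
    for model_id, details in json.loads(models_file.read_text()).items():
        register(
            LlamaModel(model_id, details["path"]),
            aliases=details.get("aliases", []),
        )
    # ...so this line never runs, and gguf is never registered.
    # Moving it above the exists() check should make gguf register
    # unconditionally.
    register(LlamaGGUF())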

wtfrank commented 8 months ago

OK, so what's happening is: while models.json doesn't exist, llm models doesn't show gguf; once models.json is present - even if empty - llm models shows gguf.

llm llama-cpp models seems to create the empty models.json, and after that llm models shows the expected gguf (a guess at why is sketched after the transcript below).

ff@calloway:~/.config/io.datasette.llm/llama-cpp$ rm models.json
ff@calloway:~/.config/io.datasette.llm/llama-cpp$ llm models
OpenAI Chat: gpt-3.5-turbo (aliases: 3.5, chatgpt)
OpenAI Chat: gpt-3.5-turbo-16k (aliases: chatgpt-16k, 3.5-16k)
OpenAI Chat: gpt-4 (aliases: 4, gpt4)
OpenAI Chat: gpt-4-1106-preview (aliases: gpt-4-turbo, 4-turbo, 4t)
OpenAI Chat: gpt-4-32k (aliases: 4-32k)
OpenAI Completion: gpt-3.5-turbo-instruct (aliases: 3.5-instruct, chatgpt-instruct)
gpt4all: nous-hermes-llama2-13b - Hermes, 6.86GB download, needs 16GB RAM (installed)
gpt4all: all-MiniLM-L6-v2-f16 - SBert, 43.76MB download, needs 1GB RAM
gpt4all: replit-code-v1_5-3b-q4_0 - Replit, 1.74GB download, needs 4GB RAM
gpt4all: orca-mini-3b-gguf2-q4_0 - Mini Orca (Small), 1.84GB download, needs 4GB RAM
gpt4all: mpt-7b-chat-merges-q4_0 - MPT Chat, 3.54GB download, needs 8GB RAM
gpt4all: orca-2-7b - Orca 2 (Medium), 3.56GB download, needs 8GB RAM
gpt4all: rift-coder-v0-7b-q4_0 - Rift coder, 3.56GB download, needs 8GB RAM
gpt4all: em_german_mistral_v01 - EM German Mistral, 3.83GB download, needs 8GB RAM
gpt4all: mistral-7b-instruct-v0 - Mistral Instruct, 3.83GB download, needs 8GB RAM
gpt4all: mistral-7b-openorca - Mistral OpenOrca, 3.83GB download, needs 8GB RAM
gpt4all: gpt4all-falcon-q4_0 - GPT4All Falcon, 3.92GB download, needs 8GB RAM
gpt4all: gpt4all-13b-snoozy-q4_0 - Snoozy, 6.86GB download, needs 16GB RAM
gpt4all: wizardlm-13b-v1 - Wizard v1.2, 6.86GB download, needs 16GB RAM
gpt4all: orca-2-13b - Orca 2 (Full), 6.86GB download, needs 16GB RAM
gpt4all: starcoder-q4_0 - Starcoder, 8.37GB download, needs 4GB RAM
ff@calloway:~/.config/io.datasette.llm/llama-cpp$ llm llama-cpp models
{}
ff@calloway:~/.config/io.datasette.llm/llama-cpp$ llm models
OpenAI Chat: gpt-3.5-turbo (aliases: 3.5, chatgpt)
OpenAI Chat: gpt-3.5-turbo-16k (aliases: chatgpt-16k, 3.5-16k)
OpenAI Chat: gpt-4 (aliases: 4, gpt4)
OpenAI Chat: gpt-4-1106-preview (aliases: gpt-4-turbo, 4-turbo, 4t)
OpenAI Chat: gpt-4-32k (aliases: 4-32k)
OpenAI Completion: gpt-3.5-turbo-instruct (aliases: 3.5-instruct, chatgpt-instruct)
LlamaGGUF: gguf
gpt4all: nous-hermes-llama2-13b - Hermes, 6.86GB download, needs 16GB RAM (installed)
gpt4all: all-MiniLM-L6-v2-f16 - SBert, 43.76MB download, needs 1GB RAM
gpt4all: replit-code-v1_5-3b-q4_0 - Replit, 1.74GB download, needs 4GB RAM
gpt4all: orca-mini-3b-gguf2-q4_0 - Mini Orca (Small), 1.84GB download, needs 4GB RAM
gpt4all: mpt-7b-chat-merges-q4_0 - MPT Chat, 3.54GB download, needs 8GB RAM
gpt4all: orca-2-7b - Orca 2 (Medium), 3.56GB download, needs 8GB RAM
gpt4all: rift-coder-v0-7b-q4_0 - Rift coder, 3.56GB download, needs 8GB RAM
gpt4all: em_german_mistral_v01 - EM German Mistral, 3.83GB download, needs 8GB RAM
gpt4all: mistral-7b-instruct-v0 - Mistral Instruct, 3.83GB download, needs 8GB RAM
gpt4all: mistral-7b-openorca - Mistral OpenOrca, 3.83GB download, needs 8GB RAM
gpt4all: gpt4all-falcon-q4_0 - GPT4All Falcon, 3.92GB download, needs 8GB RAM
gpt4all: gpt4all-13b-snoozy-q4_0 - Snoozy, 6.86GB download, needs 16GB RAM
gpt4all: wizardlm-13b-v1 - Wizard v1.2, 6.86GB download, needs 16GB RAM
gpt4all: orca-2-13b - Orca 2 (Full), 6.86GB download, needs 16GB RAM
gpt4all: starcoder-q4_0 - Starcoder, 8.37GB download, needs 4GB RAM
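That would be consistent with the models command ensuring the file exists before reading it - something like this helper (purely a guess, not the actual code):

import llm

def models_file():
    # Ensure ~/.config/io.datasette.llm/llama-cpp/models.json exists,
    # writing an empty JSON object the first time
    directory = llm.user_dir() / "llama-cpp"
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / "models.json"
    if not path.exists():
        path.write_text("{}")
    return path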

barnabywalters commented 4 months ago

I ran into this today as well. Following the instructions at https://simonwillison.net/2023/Dec/18/mistral/ from scratch (i.e. with no models.json file already present) still results in the 'gguf is not a valid model' error until you run llm llama-cpp models. If there are no immediate plans to fix this bug, it might be worth suggesting running that command in the blog post and the README so that people can avoid this issue.
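i.e. running this once first avoids the error:

$ llm llama-cpp models
{}

after which gguf appears in llm models as expected.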