simonw / llm

Access large language models from the command-line
https://llm.datasette.io
Apache License 2.0

Support openai compatible APIs #106

Closed - tmm1 closed this issue 1 year ago

tmm1 commented 1 year ago

Projects such as LocalAI offer an OpenAI-compatible web API:

https://github.com/go-skynet/LocalAI

Maybe the hardcoded API endpoint could be parameterized using a new environment variable?

https://github.com/simonw/llm/blob/3f1388a4e6c8b951bb7e2f8a5ac6b6e08e99b873/llm/default_plugins/openai_models.py#L36-L36
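As a rough sketch of the idea (the variable name here is hypothetical, not an existing setting; the legacy openai Python library exposes an api_base attribute):

import os
import openai

# Hypothetical: allow an environment variable to override the hardcoded
# endpoint, falling back to the official OpenAI API.
openai.api_base = os.environ.get(
    "LLM_OPENAI_API_BASE", "https://api.openai.com/v1"
)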

For example, see this post about using the ChatWizard app as a front-end for LocalAI-hosted models: https://www.reddit.com/r/LocalLLaMA/comments/14w2767/recommendation_an_ingenious_frontend_localai/

tmm1 commented 1 year ago

Another option is https://github.com/lhenault/simpleAI

h/t https://github.com/paul-gauthier/aider/blob/743d3f0d1c6a301cdd74cb5f22d5bcddc6535bef/docs/faq.md?plain=1#L85-L98

simonw commented 1 year ago

This almost works already - the code you linked to isn't the code that talks to the language model to run prompts, it's just the code that powers the llm openai models command.

LLM talks to OpenAI directly like this: https://github.com/simonw/llm/blob/3f1388a4e6c8b951bb7e2f8a5ac6b6e08e99b873/llm/default_plugins/openai_models.py#L173-L194
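Roughly, that call looks like this (an abridged sketch, not the exact code from that file):

import openai

# Legacy openai 0.x style call; it honours openai.api_base / OPENAI_API_BASE.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Say this is a test"}],
)
print(response.choices[0].message.content)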

Since it uses the openai library's ChatCompletion class directly, you should be able to point it at other endpoint URLs by setting an environment variable:

export OPENAI_API_BASE='http://localhost:8080/'
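In theory that means something like this would have the prompt served by LocalAI instead (an untested sketch - as noted below, the request would still ask for gpt-3.5-turbo):

OPENAI_API_BASE='http://localhost:8080/' llm -m chatgpt 'Say this is a test'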

But... that's not going to get you all of the way there because, as you pointed out, you also need to be able to specify a different model name.

The current official way of solving that is to write a plugin, as detailed here: https://llm.datasette.io/en/stable/plugins/tutorial-model-plugin.html

So an llm-localai plugin could be one way forward here.
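For a sense of what that involves, a minimal skeleton based on that tutorial looks roughly like this (the class name, model_id and canned response are placeholders, not a real llm-localai implementation):

import llm

class LocalAI(llm.Model):
    model_id = "localai"

    def execute(self, prompt, stream, response, conversation):
        # A real plugin would POST prompt.prompt to LocalAI's
        # /v1/chat/completions endpoint here and yield the returned text.
        yield "response text would go here"

@llm.hookimpl
def register_models(register):
    register(LocalAI())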

But it would be nice if you could use the existing OpenAI plugin to access other OpenAI-compatible models.

The challenge is how best to design that feature. One option would be to use the existing options mechanism:

llm -m chatgpt "Say hello" -o api_base "http://localhost:8080/" -o custom_model "name-of-model"

But I don't like that, because it would result in all of those other models being logged in the same place as gpt-3.5-turbo completions.

Really we want to be able to define new models - llm -m NAME - which under the hood use the existing OpenAI plugin code but with those extra settings.

I'll have a think about ways that might work.

simonw commented 1 year ago

I got LocalAI working on my Mac:

git clone https://github.com/go-skynet/LocalAI
cd LocalAI
cp ~/.cache/gpt4all/orca-mini-3b.ggmlv3.q4_0.bin models/orca-mini-3b.ggmlv3
cp prompt-templates/alpaca.tmpl models/orca-mini-3b.ggmlv3.tmpl
docker-compose up -d --pull always

At this point it didn't seem to work. It turned out it has a LOT of things it needs to do on first launch before the web server becomes ready - running GCC a bunch of times etc.

I ran this to find its container ID:

docker ps

Then repeatedly ran this to see how far it had got:

docker logs c0041c248973

Eventually it started properly and this worked:

curl http://localhost:8080/v1/models | jq
{
  "object": "list",
  "data": [
    {
      "id": "ggml-gpt4all-j",
      "object": "model"
    },
    {
      "id": "orca-mini-3b.ggmlv3",
      "object": "model"
    }
  ]
}

And this:

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "orca-mini-3b.ggmlv3",
     "messages": [{"role": "user", "content": "Say this is a test!"}],
     "temperature": 0.7
   }' | jq
{
  "object": "chat.completion",
  "model": "orca-mini-3b.ggmlv3",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": " No, this is not a test!"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0
  }
}

simonw commented 1 year ago

Documentation: https://llm.datasette.io/en/latest/other-models.html#openai-compatible-models
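At the time of writing, that page covers registering additional OpenAI-compatible models in an extra-openai-models.yaml file in the llm configuration directory. An entry for the LocalAI model above would look roughly like this (check the linked docs for the exact, current field names):

- model_id: orca-mini-3b
  model_name: orca-mini-3b.ggmlv3
  api_base: "http://localhost:8080"

After that, llm -m orca-mini-3b 'Say this is a test' should send the prompt to the local server.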