This almost works already - the code you linked to here isn't the code that talks to the language model to execute prompts, it's just the code that powers the llm openai models command.
LLM talks to OpenAI directly like this: https://github.com/simonw/llm/blob/3f1388a4e6c8b951bb7e2f8a5ac6b6e08e99b873/llm/default_plugins/openai_models.py#L173-L194
Since it's using the openai library's ChatCompletion class directly, you should be able to point it at other endpoint URLs by setting an environment variable:
export OPENAI_API_BASE='http://localhost:8080/'
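For example, the pre-1.0 openai Python library reads that environment variable (the same thing can be done in code by assigning openai.api_base), so the ChatCompletion call above can be redirected without any changes to LLM itself. A rough sketch, assuming an OpenAI-compatible server such as LocalAI is listening on localhost:8080, that its API lives under /v1, and that it doesn't validate the API key:

import openai

# Sketch: redirect the pre-1.0 openai client to a local OpenAI-compatible
# server. Setting OPENAI_API_BASE in the environment has the same effect
# as assigning openai.api_base here.
openai.api_base = "http://localhost:8080/v1"  # assumed LocalAI base URL
openai.api_key = "sk-local"  # dummy key; assumed the server doesn't check it

response = openai.ChatCompletion.create(
    model="orca-mini-3b.ggmlv3",  # whatever model name the local server exposes
    messages=[{"role": "user", "content": "Say this is a test!"}],
    temperature=0.7,
)
print(response["choices"][0]["message"]["content"])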
But... that's not going to get you all of the way there because, as you pointed out, you also need to be able to specify a different model name.
The current official way of solving that is to write a plugin, as detailed here: https://llm.datasette.io/en/stable/plugins/tutorial-model-plugin.html
So an llm-localai plugin could be one way forward here.
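To give a feel for that route, here's a rough sketch of what the core of such a plugin could look like, following the structure in the tutorial linked above. The class name, model ID, base URL and dummy key are all illustrative assumptions, not an existing plugin, and it uses the same pre-1.0 openai.ChatCompletion interface as LLM's OpenAI plugin:

import llm
import openai


class LocalAIChat(llm.Model):
    # Illustrative model ID, matching one of the models LocalAI serves.
    model_id = "orca-mini-3b.ggmlv3"

    def execute(self, prompt, stream, response, conversation):
        # Send the prompt text to the local OpenAI-compatible server.
        # Streaming and conversation history are left out of this sketch.
        openai.api_base = "http://localhost:8080/v1"  # assumed LocalAI base URL
        openai.api_key = "sk-local"  # dummy key; assumed not to be checked
        completion = openai.ChatCompletion.create(
            model=self.model_id,
            messages=[{"role": "user", "content": prompt.prompt}],
        )
        yield completion["choices"][0]["message"]["content"]


@llm.hookimpl
def register_models(register):
    register(LocalAIChat())

Once installed, that model would be addressable as llm -m orca-mini-3b.ggmlv3 and logged under its own model ID.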
But it would be nice if you could use the existing OpenAI plugin to access other OpenAI-compatible models.
The challenge is how best to design that feature. One option would be to use the existing options mechanism:
llm -m chatgpt "Say hello" -o api_base "http://localhost:8080/" -o custom_model "name-of-model"
But I don't like that, because it would result in all of those other models being logged in the same place as gpt-3.5-turbo completions.
Really we want to be able to define new models - llm -m NAME - which under the hood use the existing OpenAI plugin code but with those extra settings.
I'll have a think about ways that might work.
I got LocalAI working on my Mac:
git clone https://github.com/go-skynet/LocalAI
cd LocalAI
cp ~/.cache/gpt4all/orca-mini-3b.ggmlv3.q4_0.bin models/orca-mini-3b.ggmlv3
cp prompt-templates/alpaca.tmpl models/orca-mini-3b.ggmlv3.tmpl
docker-compose up -d --pull always
At this point it didn't seem to work. It turned out it has a LOT of things it needs to do on first launch before the web server becomes ready - running GCC a bunch of times etc.
I ran this to find its container ID:
docker ps
Then repeatedly ran this to see how far it had got:
docker logs c0041c248973
Eventually it started properly and this worked:
curl http://localhost:8080/v1/models | jq
{
  "object": "list",
  "data": [
    {
      "id": "ggml-gpt4all-j",
      "object": "model"
    },
    {
      "id": "orca-mini-3b.ggmlv3",
      "object": "model"
    }
  ]
}
And this:
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "orca-mini-3b.ggmlv3",
"messages": [{"role": "user", "content": "Say this is a test!"}],
"temperature": 0.7
}' | jq
{
  "object": "chat.completion",
  "model": "orca-mini-3b.ggmlv3",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": " No, this is not a test!"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0
  }
}
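Those model IDs can also be fetched from Python with the pre-1.0 openai client, which is roughly the discovery step a wrapper plugin would need in order to know which model names to register (a sketch, with the same base URL and dummy key assumptions as above):

import openai

# Sketch: enumerate the model IDs a local OpenAI-compatible server exposes.
openai.api_base = "http://localhost:8080/v1"  # assumed LocalAI base URL
openai.api_key = "sk-local"  # dummy key

for model in openai.Model.list()["data"]:
    print(model["id"])  # e.g. ggml-gpt4all-j, orca-mini-3b.ggmlv3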
Projects such as LocalAI offer an OpenAI-compatible web API:
https://github.com/go-skynet/LocalAI
Maybe the hardcoded API endpoint could be parameterized using a new environment variable?
https://github.com/simonw/llm/blob/3f1388a4e6c8b951bb7e2f8a5ac6b6e08e99b873/llm/default_plugins/openai_models.py#L36-L36
For example, see this post about using the app ChatWizard as a front-end to LocalAI-hosted models: https://www.reddit.com/r/LocalLLaMA/comments/14w2767/recommendation_an_ingenious_frontend_localai/
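On the LLM side, the environment-variable suggestion above boils down to something like this (a sketch only, not the actual code at the linked line; the fallback mirrors the openai library's default endpoint):

import os
import openai

# Sketch: let an environment variable override the hardcoded endpoint,
# falling back to the normal OpenAI API base otherwise.
openai.api_base = os.environ.get("OPENAI_API_BASE", "https://api.openai.com/v1")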