shavit opened 9 months ago
For Ollama it would be good to support the `keep_alive` parameter so we can control how long the model stays loaded.
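Roughly what I have in mind, as a sketch only (the request type and model name below are placeholders, not code from this PR; `keep_alive` takes a duration like "10m"):

```swift
import Foundation

// Hypothetical request body for Ollama's /api/chat endpoint.
// keep_alive controls how long the model stays loaded after the request.
struct OllamaChatRequest: Encodable {
    struct Message: Encodable {
        let role: String
        let content: String
    }

    let model: String
    let messages: [Message]
    let keepAlive: String

    enum CodingKeys: String, CodingKey {
        case model, messages
        case keepAlive = "keep_alive"
    }
}

let body = OllamaChatRequest(
    model: "llama3",                                   // example model name
    messages: [.init(role: "user", content: "Hello")],
    keepAlive: "10m"                                   // unload 10 minutes after the last request
)
let json = try? JSONEncoder().encode(body)
```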
The remote server backends will need an API key field, sent as an Authorization header, and a model name selected from a separate list.
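Something like this sketch for attaching the key (the config type and field names are placeholders, not the PR's actual code):

```swift
import Foundation

// Illustrative config for a remote backend (not the PR's actual types).
struct RemoteBackendConfig {
    var baseURL: URL
    var apiKey: String?
    var model: String
}

func makeChatRequest(config: RemoteBackendConfig) -> URLRequest {
    var request = URLRequest(url: config.baseURL.appendingPathComponent("v1/chat/completions"))
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    // Attach the stored key as a bearer token only when one is configured.
    if let key = config.apiKey, !key.isEmpty {
        request.setValue("Bearer \(key)", forHTTPHeaderField: "Authorization")
    }
    return request
}
```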
Since the model ID is used both to select local model names and the remote model, the settings need another option for choosing the backend type; then the model ID can also be used for remote backends.
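The settings shape could look something like this (names are just a sketch, not what the PR uses):

```swift
// Hypothetical settings shape: the backend type is chosen separately, so the
// model ID can mean either a local model file or a remote model name.
enum BackendType: String, CaseIterable, Codable {
    case llamaServer   // local, zero-config default
    case ollama
    case openAICompatible
}

struct BackendSettings: Codable {
    var backendType: BackendType = .llamaServer
    var modelID: String = ""    // local model path or remote model name
    var serverURL: String = ""  // used by remote backends
    var apiKey: String?         // used by remote backends
}
```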
This is cool, great work! Would it make sense to go more general and migrate OllamaBackend -> OpenAIBackend?
Now that there's template support in the llama.cpp server, we could migrate the default LlamaServer logic to the llama.cpp server's OpenAI API and hopefully share all of the code.
They are similar but not the same: Ollama nests the parameters under `options` (see https://github.com/ollama/ollama/blob/main/docs/api.md and https://github.com/ollama/ollama/blob/main/docs/modelfile.md#valid-parameters-and-values). Settings and Agent are where all the backends share the same behavior. A chat-completion interface can take the context, the user messages, and maybe options such as temperature that can be shared across all backends.
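To make that concrete, a shared interface could sit above the wire format differences, roughly like this (all names here are hypothetical, not the PR's code):

```swift
// Backend-agnostic chat-completion interface (all names hypothetical).
struct ChatOptions {
    var temperature: Double?
    var topP: Double?
}

struct ChatMessage {
    let role: String      // "system", "user", or "assistant"
    let content: String
}

protocol ChatBackend {
    // Streams completion tokens for the given conversation.
    func complete(messages: [ChatMessage], options: ChatOptions) -> AsyncThrowingStream<String, Error>
}

// Each conforming backend maps ChatOptions into its own wire format:
//   Ollama native API:      {"options": {"temperature": 0.7, "top_p": 0.9}, ...}
//   OpenAI-compatible APIs: {"temperature": 0.7, "top_p": 0.9, ...}
```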
Would their OpenAI /v1/chat/completions endpoint give what we need?
https://github.com/ollama/ollama/blob/main/docs/openai.md#endpoints
Yes, I don't remember why I used the other endpoint.
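For reference, the OpenAI-style request shape should work against both Ollama's /v1/chat/completions and the llama.cpp server's OpenAI endpoint; a minimal sketch (URL and model name are just examples, 11434 is Ollama's default port):

```swift
import Foundation

// Minimal OpenAI-style chat completion request.
let payload: [String: Any] = [
    "model": "llama3",
    "messages": [["role": "user", "content": "Hello"]],
    "stream": true
]

var request = URLRequest(url: URL(string: "http://localhost:11434/v1/chat/completions")!)
request.httpMethod = "POST"
request.setValue("application/json", forHTTPHeaderField: "Content-Type")
request.httpBody = try? JSONSerialization.data(withJSONObject: payload)
```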
There are a few more changes to make, such as initializing the backend to ensure it is not nil, and resolving the conflicts. Currently the local version of llama.cpp doesn't work, but it could be outdated.
Other notes:
Related: https://github.com/psugihara/FreeChat/issues/51
Related: https://github.com/psugihara/FreeChat/issues/26
Closes #59
Just played with this, very cool. I like the general approach of allowing you to switch backends (and having the 0-config localhost backend by default).
Try merging `main` for a recent version of llama.cpp (I updated it Friday).
A few other thoughts...
Since this is used for multiple backends, switch copy to "Configure your backend based on the model you're using"
Also the prompt is ignored now.
Just had some time to test and found a fatal bug when I send a message after switching to the default backend. Not quite sure what's going on.
Yes, the backend was an implicitly unwrapped optional in order to surface those errors rather than silence them and not respond at all. Now the backend is initialized together with the agent and uses the default local server.
Maybe the agent's initialization parameters and the error handling can still be improved.
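The shape I ended up with is roughly the following (illustrative only, the real types in the PR differ):

```swift
// Illustrative: the agent owns a non-optional backend that defaults to the
// local llama.cpp server, instead of `var backend: Backend!`.
protocol Backend {
    func respond(to prompt: String) async throws -> String
}

struct LocalLlamaServerBackend: Backend {
    func respond(to prompt: String) async throws -> String {
        // ...talk to the bundled llama.cpp server...
        return ""
    }
}

final class Agent {
    private var backend: Backend

    init(backend: Backend = LocalLlamaServerBackend()) {
        self.backend = backend
    }

    // Switching backends can never leave `backend` nil mid-conversation.
    func switchBackend(to newBackend: Backend) {
        backend = newBackend
    }
}
```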
This change extends the previous work on remote models and adds an OpenAI-compatible backend (#59).
Tasks and discussions:
- EventStream from 0.0.5, because it can crash if users misconfigure their server: https://github.com/Recouse/EventSource/blob/8c0af68bf3a4c93819d3fa5f549232f612324de2/Sources/EventSource/ServerMessage.swift#L55-L57 (see the sketch below)

Ideally the change will not affect what's already working right now with Llama, and will make only the minimum necessary changes. Upgrades or refactoring can be added at the end.
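On the EventSource item above: whatever version ends up pinned, the consuming side could also guard against responses from a misconfigured server. A generic defensive-parsing sketch (not EventSource's actual API):

```swift
import Foundation

// Drop malformed SSE lines from a misconfigured server instead of crashing on them.
func parseDataLine(_ line: String) -> String? {
    guard line.hasPrefix("data:") else { return nil }
    return String(line.dropFirst("data:".count)).trimmingCharacters(in: .whitespaces)
}
```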