psugihara / FreeChat

llama.cpp based AI chat app for macOS
https://www.freechat.run
MIT License

Support OpenAI and Ollama #60

Open shavit opened 4 months ago

shavit commented 4 months ago

This change extends the previous work on remote models and adds an OpenAI-compatible backend (#59).

Tasks and discussions:

Ideally this change will not affect what already works with Llama, and will make only the minimum necessary changes. Upgrades or refactoring can be added at the end.

prabirshrestha commented 4 months ago

For Ollama it would be good to support the keep_alive parameter so we can control how long the model stays loaded.
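
For reference, Ollama's native /api/chat endpoint accepts a keep_alive value (e.g. "10m"; 0 unloads immediately, a negative value keeps the model loaded). A minimal sketch of what passing it through could look like; the request/type names here are hypothetical, not FreeChat's actual code:

```swift
import Foundation

// Hypothetical sketch: a request body for Ollama's /api/chat that
// forwards keep_alive so the caller controls how long the model stays loaded.
struct OllamaChatRequest: Encodable {
  struct Message: Encodable {
    let role: String
    let content: String
  }

  let model: String
  let messages: [Message]
  let stream: Bool
  let keepAlive: String   // e.g. "10m"; Ollama also accepts 0 / negative values

  enum CodingKeys: String, CodingKey {
    case model, messages, stream
    case keepAlive = "keep_alive"
  }
}

func makeOllamaRequest(baseURL: URL, body: OllamaChatRequest) throws -> URLRequest {
  var request = URLRequest(url: baseURL.appendingPathComponent("api/chat"))
  request.httpMethod = "POST"
  request.setValue("application/json", forHTTPHeaderField: "Content-Type")
  request.httpBody = try JSONEncoder().encode(body)
  return request
}
```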

shavit commented 4 months ago

The remote server backends will need an API key field, sent as an authorization header, and a model name selected from a separate list.

Since the model ID is used both to select local model names and the remote model, the settings need another option for choosing a backend type. Then the model ID can be reused for remote backends.
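
A rough sketch of how the settings could model this (the names are illustrative, not the PR's actual types): a backend-type enum plus an optional API key that becomes an Authorization header.

```swift
import Foundation

// Hypothetical sketch of the settings shape described above.
enum BackendType: String, CaseIterable, Codable {
  case local      // embedded llama.cpp server ("This Computer")
  case llamaCPP   // remote llama.cpp server
  case ollama
  case openAI
}

struct BackendConfig: Codable {
  var type: BackendType
  var baseURL: URL
  var modelID: String      // reused for remote backends
  var apiKey: String?      // sent as a bearer token when present
}

func authorizedRequest(for config: BackendConfig, path: String) -> URLRequest {
  var request = URLRequest(url: config.baseURL.appendingPathComponent(path))
  if let key = config.apiKey, !key.isEmpty {
    request.setValue("Bearer \(key)", forHTTPHeaderField: "Authorization")
  }
  return request
}
```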

psugihara commented 4 months ago

This is cool, great work! Would it make sense to go more general and migrate OllamaBackend -> OpenAIBackend?

Now that there's template support in llama.cpp server, we could migrate the default LlamaServer logic to llama.cpp server's OpenAI API and hopefully share all of the code.
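
To illustrate the suggestion (a hedged sketch, not existing FreeChat code): a single OpenAI-style backend parameterized by base URL could cover the embedded llama.cpp server, a remote llama.cpp server, Ollama, and OpenAI itself. The ports below are the usual llama.cpp server and Ollama defaults, not necessarily FreeChat's configuration.

```swift
import Foundation

// Sketch: one OpenAI-compatible backend; only the base URL and key differ.
struct OpenAIStyleBackend {
  let baseURL: URL          // embedded server, Ollama, or api.openai.com
  let apiKey: String?

  func chatCompletionsURL() -> URL {
    baseURL.appendingPathComponent("v1/chat/completions")
  }
}

// Hypothetical instances for illustration:
let embedded = OpenAIStyleBackend(baseURL: URL(string: "http://127.0.0.1:8080")!, apiKey: nil)
let ollama   = OpenAIStyleBackend(baseURL: URL(string: "http://127.0.0.1:11434")!, apiKey: nil)
let openAI   = OpenAIStyleBackend(baseURL: URL(string: "https://api.openai.com")!, apiKey: "sk-...")
```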

shavit commented 4 months ago

They are similar but not the same:

Settings and Agent are where all the backends share the same behavior. A chat-completion interface can take the context, user messages, and maybe options such as temperature that can be shared across all backends.
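
One way to express that shared surface (a sketch under the assumptions above, not the PR's actual code): a protocol the Agent can call regardless of which backend is selected.

```swift
// Sketch of a shared chat-completion interface; names are illustrative.
struct ChatMessage {
  enum Role: String { case system, user, assistant }
  let role: Role
  let content: String
}

struct CompletionOptions {
  var temperature: Double = 0.7
  var contextLength: Int? = nil   // only honored by backends that support it
}

protocol ChatCompletionBackend {
  // Streams response tokens for the given conversation.
  func complete(messages: [ChatMessage],
                options: CompletionOptions) -> AsyncThrowingStream<String, Error>
}
```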

psugihara commented 4 months ago

Would their OpenAI-compatible /v1/chat/completions endpoint give what we need?

https://github.com/ollama/ollama/blob/main/docs/openai.md#endpoints
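
For reference, a minimal sketch of calling that endpoint from Swift; the request body follows the OpenAI chat-completions format, and the base URL assumes Ollama's default port.

```swift
import Foundation

// Sketch: a non-streaming call to Ollama's OpenAI-compatible endpoint.
// Ollama listens on http://localhost:11434 by default.
func chatViaOpenAICompatibleAPI(prompt: String, model: String) async throws -> Data {
  let url = URL(string: "http://localhost:11434/v1/chat/completions")!
  var request = URLRequest(url: url)
  request.httpMethod = "POST"
  request.setValue("application/json", forHTTPHeaderField: "Content-Type")

  let body: [String: Any] = [
    "model": model,
    "messages": [["role": "user", "content": prompt]],
    "stream": false
  ]
  request.httpBody = try JSONSerialization.data(withJSONObject: body)

  let (data, _) = try await URLSession.shared.data(for: request)
  return data  // JSON in the OpenAI chat-completions response format
}
```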

shavit commented 4 months ago

Yes, I don't remember why I used the other endpoint.

shavit commented 3 months ago

There are a few more changes to make, such as initializing the backend to ensure it is not nil, and resolving the conflicts. Currently the local version of llama.cpp doesn't work, but it could be outdated.

Other notes:

- Related: https://github.com/psugihara/FreeChat/issues/51
- Related: https://github.com/psugihara/FreeChat/issues/26
- Closes #59

psugihara commented 3 months ago

Just played with this, very cool. I like the general approach of allowing you to switch backends (and having the 0-config localhost backend by default).

Try merging main for a recent version of llama.cpp (I updated it Friday).

A few other thoughts...

1. (Screenshot 2024-03-11 at 9:26:19 AM) Since this is used for multiple backends, switch the copy to "Configure your backend based on the model you're using".

2. (Screenshot 2024-03-11 at 9:26:12 AM)

3. For max simplicity, maybe we just get rid of the llama.cpp backend list option since (I think?) you can use llama.cpp via the OpenAI API. It's slightly confusing to me just because "This Computer" is also llama.cpp.

4. It would be nice to have a sentence or two of copy pointing you to where to start with each backend and/or what it is. This is not necessary to merge though; I can enhance it later if you don't have copy ideas.

shavit commented 3 months ago

1. Notice that the context parameter only applies to Ollama and the embedded server.
2. The copy can move up, below the backend instead of the model.
3. I agree that it can be confusing if users configure it to run against the embedded server, but it is still an option for users who run it remotely.
4. I added short descriptions.

Also, the prompt is ignored now.

psugihara commented 3 months ago

Just had some time to test and found a fatal bug when I send a message after switching to the default backend. Not quite sure what's going on.

(Screenshots 2024-03-24 at 10:05:40 PM and 10:05:20 PM)

shavit commented 3 months ago

Yes, the backend was an implicitly unwrapped optional in order to surface those errors rather than silence them and not respond at all. Now the backend is initialized together with the agent and uses the default local server.

Maybe the initialization parameters of the agent and error handling can be improved.
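
For illustration, a sketch of the non-optional initialization described above, building on the ChatCompletionBackend sketch earlier in the thread; LocalLlamaBackend and Agent are hypothetical names, not the PR's actual types.

```swift
// Sketch: a stub local backend plus an Agent whose backend can never be nil.
struct LocalLlamaBackend: ChatCompletionBackend {
  func complete(messages: [ChatMessage],
                options: CompletionOptions) -> AsyncThrowingStream<String, Error> {
    AsyncThrowingStream { $0.finish() }  // stub; the real backend talks to the embedded server
  }
}

final class Agent {
  private var backend: ChatCompletionBackend

  init(backend: ChatCompletionBackend? = nil) {
    // Fall back to the zero-config local server rather than force-unwrapping.
    self.backend = backend ?? LocalLlamaBackend()
  }

  func switchBackend(to newBackend: ChatCompletionBackend) {
    backend = newBackend
  }
}
```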