psugihara / FreeChat

llama.cpp based AI chat app for macOS
https://www.freechat.run
MIT License

Support OpenAI and Ollama #60

Open shavit opened 4 months ago

shavit commented 4 months ago

This change extends the previous work on remote models and adds an OpenAI-compatible backend (#59).

Tasks and discussions:

Ideally this change will not affect what already works with Llama, and will make only the minimum necessary changes. Upgrades or refactoring can be added at the end.

prabirshrestha commented 4 months ago

For Ollama it would be good to support the keep_alive parameter so we can control how long the model stays loaded.
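
For reference, Ollama's native /api/chat endpoint accepts a keep_alive value (e.g. "10m"; 0 unloads immediately, a negative value keeps the model loaded). A minimal sketch of what passing it through could look like; the request/type names here are hypothetical, not FreeChat's actual code:

```swift
import Foundation

// Hypothetical sketch: a request body for Ollama's /api/chat that
// forwards keep_alive so the caller controls how long the model stays loaded.
struct OllamaChatRequest: Encodable {
  struct Message: Encodable {
    let role: String
    let content: String
  }

  let model: String
  let messages: [Message]
  let stream: Bool
  let keepAlive: String   // e.g. "10m"; Ollama also accepts 0 / negative values

  enum CodingKeys: String, CodingKey {
    case model, messages, stream
    case keepAlive = "keep_alive"
  }
}

func makeOllamaRequest(baseURL: URL, body: OllamaChatRequest) throws -> URLRequest {
  var request = URLRequest(url: baseURL.appendingPathComponent("api/chat"))
  request.httpMethod = "POST"
  request.setValue("application/json", forHTTPHeaderField: "Content-Type")
  request.httpBody = try JSONEncoder().encode(body)
  return request
}
```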

shavit commented 4 months ago

The remote server backends will need an API key field, sent as an authorization header, and a model name selected from a separate list.

Since the model ID is used both to select local model names and the remote model, the settings need another option for choosing a backend type. Then the model ID can be reused for remote backends.
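
A rough sketch of how the settings could model this (the names are illustrative, not the PR's actual types): a backend-type enum plus an optional API key that becomes an Authorization header.

```swift
import Foundation

// Hypothetical sketch of the settings shape described above.
enum BackendType: String, CaseIterable, Codable {
  case local      // embedded llama.cpp server ("This Computer")
  case llamaCPP   // remote llama.cpp server
  case ollama
  case openAI
}

struct BackendConfig: Codable {
  var type: BackendType
  var baseURL: URL
  var modelID: String      // reused for remote backends
  var apiKey: String?      // sent as a bearer token when present
}

func authorizedRequest(for config: BackendConfig, path: String) -> URLRequest {
  var request = URLRequest(url: config.baseURL.appendingPathComponent(path))
  if let key = config.apiKey, !key.isEmpty {
    request.setValue("Bearer \(key)", forHTTPHeaderField: "Authorization")
  }
  return request
}
```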

psugihara commented 4 months ago

This is cool, great work! Would it make sense to go more general and migrate OllamaBackend -> OpenAIBackend?

Now that there's template support in llama.cpp server, we could migrate the default LlamaServer logic to llama.cpp server's OpenAI API and hopefully share all of the code.
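
To illustrate the suggestion (a hedged sketch, not existing FreeChat code): a single OpenAI-style backend parameterized by base URL could cover the embedded llama.cpp server, a remote llama.cpp server, Ollama, and OpenAI itself. The ports below are the usual llama.cpp server and Ollama defaults, not necessarily FreeChat's configuration.

```swift
import Foundation

// Sketch: one OpenAI-compatible backend; only the base URL and key differ.
struct OpenAIStyleBackend {
  let baseURL: URL          // embedded server, Ollama, or api.openai.com
  let apiKey: String?

  func chatCompletionsURL() -> URL {
    baseURL.appendingPathComponent("v1/chat/completions")
  }
}

// Hypothetical instances for illustration:
let embedded = OpenAIStyleBackend(baseURL: URL(string: "http://127.0.0.1:8080")!, apiKey: nil)
let ollama   = OpenAIStyleBackend(baseURL: URL(string: "http://127.0.0.1:11434")!, apiKey: nil)
let openAI   = OpenAIStyleBackend(baseURL: URL(string: "https://api.openai.com")!, apiKey: "sk-...")
```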

shavit commented 4 months ago

They are similar but not the same:

Settings and Agent are where all the backends share the same behavior. A chat-completion interface can take the context, user messages, and maybe options such as temperature that can be shared across all backends.
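
One way to express that shared surface (a sketch under the assumptions above, not the PR's actual code): a protocol the Agent can call regardless of which backend is selected.

```swift
// Sketch of a shared chat-completion interface; names are illustrative.
struct ChatMessage {
  enum Role: String { case system, user, assistant }
  let role: Role
  let content: String
}

struct CompletionOptions {
  var temperature: Double = 0.7
  var contextLength: Int? = nil   // only honored by backends that support it
}

protocol ChatCompletionBackend {
  // Streams response tokens for the given conversation.
  func complete(messages: [ChatMessage],
                options: CompletionOptions) -> AsyncThrowingStream<String, Error>
}
```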

psugihara commented 4 months ago

Would their OpenAI-compatible /v1/chat/completions endpoint give what we need?

https://github.com/ollama/ollama/blob/main/docs/openai.md#endpoints
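
For reference, a minimal sketch of calling that endpoint from Swift; the request body follows the OpenAI chat-completions format, and the base URL assumes Ollama's default port.

```swift
import Foundation

// Sketch: a non-streaming call to Ollama's OpenAI-compatible endpoint.
// Ollama listens on http://localhost:11434 by default.
func chatViaOpenAICompatibleAPI(prompt: String, model: String) async throws -> Data {
  let url = URL(string: "http://localhost:11434/v1/chat/completions")!
  var request = URLRequest(url: url)
  request.httpMethod = "POST"
  request.setValue("application/json", forHTTPHeaderField: "Content-Type")

  let body: [String: Any] = [
    "model": model,
    "messages": [["role": "user", "content": prompt]],
    "stream": false
  ]
  request.httpBody = try JSONSerialization.data(withJSONObject: body)

  let (data, _) = try await URLSession.shared.data(for: request)
  return data  // JSON in the OpenAI chat-completions response format
}
```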

shavit commented 4 months ago

Yes, I don't remember why I used the other endpoint.

shavit commented 3 months ago

There are a few more changes to make, such as initializing the backend to ensure it is not nil, and resolving the conflicts. Currently the local version of llama.cpp doesn't work, but it could be outdated.

Other notes:

- Related: https://github.com/psugihara/FreeChat/issues/51
- Related: https://github.com/psugihara/FreeChat/issues/26
- Closes #59

psugihara commented 3 months ago

Just played with this, very cool. I like the general approach of allowing you to switch backends (and having the 0-config localhost backend by default).

Try merging main for a recent version of llama.cpp (I updated it Friday).

A few other thoughts...

1. (Screenshot 2024-03-11 at 9:26:19 AM) Since this is used for multiple backends, switch the copy to "Configure your backend based on the model you're using".

2. (Screenshot 2024-03-11 at 9:26:12 AM)

3. For max simplicity, maybe we just get rid of the llama.cpp backend list option since (I think?) you can use llama.cpp via the OpenAI API. It's slightly confusing to me just because "This Computer" is also llama.cpp.

4. It would be nice to have a sentence or two of copy pointing you to where to start with each backend and/or what it is. This is not necessary to merge though; I can enhance it later if you don't have copy ideas.

shavit commented 3 months ago

1. Notice that the context parameter only applies to Ollama and the embedded server.
2. The copy can move up, below the backend instead of the model.
3. I agree that it can be confusing if users configure it to run against the embedded server, but it is still an option for users who run it remotely.
4. I added short descriptions.

Also, the prompt is ignored now.

psugihara commented 3 months ago

Just had some time to test and found a fatal bug when I send a message after switching to the default backend. Not quite sure what's going on.

(Screenshots 2024-03-24 at 10:05:40 PM and 10:05:20 PM)

shavit commented 3 months ago

Yes, the backend was an implicitly unwrapped optional in order to surface those errors rather than silence them and not respond at all. Now the backend is initialized together with the agent and uses the default local server.

Maybe the initialization parameters of the agent and error handling can be improved.
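
For illustration, a sketch of the non-optional initialization described above, building on the ChatCompletionBackend sketch earlier in the thread; LocalLlamaBackend and Agent are hypothetical names, not the PR's actual types.

```swift
// Sketch: a stub local backend plus an Agent whose backend can never be nil.
struct LocalLlamaBackend: ChatCompletionBackend {
  func complete(messages: [ChatMessage],
                options: CompletionOptions) -> AsyncThrowingStream<String, Error> {
    AsyncThrowingStream { $0.finish() }  // stub; the real backend talks to the embedded server
  }
}

final class Agent {
  private var backend: ChatCompletionBackend

  init(backend: ChatCompletionBackend? = nil) {
    // Fall back to the zero-config local server rather than force-unwrapping.
    self.backend = backend ?? LocalLlamaBackend()
  }

  func switchBackend(to newBackend: ChatCompletionBackend) {
    backend = newBackend
  }
}
```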