psugihara / FreeChat

llama.cpp based AI chat app for macOS
https://www.freechat.run
MIT License

Add support for OpenAI API #59

Open · prabirshrestha opened this issue 7 months ago

prabirshrestha commented 7 months ago

It would be great if we could configure FreeChat to use an OpenAI-compatible API.

Ollama recently added support for it (https://ollama.com/blog/openai-compatibility), and EdgeAI also supports the OpenAI API (https://edgen.co).

psugihara commented 7 months ago

Thanks for the request and for checking out FreeChat. Can you elaborate a bit on your use case? Do you want to use FreeChat as an interface for Ollama or ChatGPT?

prabirshrestha commented 7 months ago

It wouldn't matter whether Ollama or ChatGPT is supported, since both use the OpenAI API internally; supporting one will support both. https://platform.openai.com/docs/api-reference/chat/create

Currently, using "Add or Remove Models" and clicking "+" opens a file picker to select a local model file. I'm hoping there is a way to "Add OpenAI Models" and configure the appropriate settings.

Here is an example of calling Ollama through its OpenAI-compatible API.

export OPENAI_API_URL=http://localhost:11434/v1
export OPENAI_API_KEY=ollama
curl $OPENAI_API_URL/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "llama2",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {"role": "assistant", "content": "The LA Dodgers won in 2020."},
        {"role": "user", "content": "Where was it played?"}
    ]
  }'

Response:

{
  "id": "chatcmpl-66",
  "object": "chat.completion",
  "created": 1707970399,
  "model": "llama2",
  "system_fingerprint": "fp_ollama",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The 2020 World Series was played at various locations, including the home stadiums of the competing teams. The Los Angeles Dodgers played their home games at Dodger Stadium in Los Angeles, California, and the Tampa Bay Rays played their home games at Tropicana Field in St. Petersburg, Florida."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 69,
    "total_tokens": 69
  }
}

If you want to call the OpenAI API instead:

export OPENAI_API_URL=https://api.openai.com/v1
export OPENAI_API_KEY=....
# change the "model" to "gpt-3.5-turbo"
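
If FreeChat adds a way to configure remote models, it could also discover which models a server offers. A minimal sketch, assuming the configured server implements the standard /v1/models listing endpoint (OpenAI does; support on other servers varies):

# list the models the server offers (e.g. to populate a model picker)
curl $OPENAI_API_URL/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"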

As for the use case: not everyone in my family has a machine powerful enough to run the models locally. I would like to run Ollama or another OpenAI-compatible server on one powerful machine so that any machine on my home network can use it.

It seems FreeChat is already running llama.cpp/examples/server but is using the /completion API. I suggest migrating to the /v1/chat/completions endpoint mentioned in their README instead, which llama.cpp already supports; that way, local models and Ollama would all use the OpenAI API.
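
For example, the same request shape already works against llama.cpp's bundled server. A minimal sketch, assuming the server is running on its default port 8080 (llama.cpp serves whichever model it was launched with, so the "model" field can be omitted):

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
  }'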

psugihara commented 7 months ago

OK got it, thanks for the detailed writeup!

I support adding that functionality, and I think it would make sense to extend @shavit's work on the "remote model" option here: https://github.com/psugihara/FreeChat/pull/50

There is one blocker I see currently. I agree that we only want to program against one API (right now that is /completion), but AFAIK the /v1/chat/completions endpoint only supports 2 prompt templates, compared to the 5 we currently support.
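
To illustrate the tradeoff: with /completion, the client renders the prompt template itself and sends a raw string, while /v1/chat/completions renders the template server-side. A rough sketch against a local llama.cpp server, assuming the default port 8080 and a Llama-2 style template:

# /completion: the client applies the template and sends a raw prompt
curl http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "[INST] <<SYS>>\nYou are a helpful assistant.\n<</SYS>>\n\nHello! [/INST]",
    "n_predict": 128
  }'

# /v1/chat/completions: the client sends structured messages and the
# server decides how to render them into the model's template
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
        {"role": "user", "content": "Hello!"}
    ]
  }'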

There is some discussion of how to support different templates here, but they have not reached consensus.

psugihara commented 7 months ago

Looks like templating is coming along. Let's update this issue as support becomes available from llama.cpp: https://github.com/ggerganov/llama.cpp/pull/5538

shavit commented 7 months ago

I can work on it after that PR is merged.