ollama / ollama

Get up and running with Llama 3.2, Mistral, Gemma 2, and other large language models.
https://ollama.com
MIT License

The response data is not compatible with OpenAI API. #7236

Closed tobegit3hub closed 3 days ago

tobegit3hub commented 3 days ago

Referring to the API docs at https://github.com/ollama/ollama/blob/main/docs/api.md , the response data format is currently not compatible with the OpenAI API.

[screenshot: example response format from the Ollama API docs]

It is important to be compatible with the OpenAI API for not only the request data but also the response data. Is there any plan to change the response data format?

rick-github commented 3 days ago

Use the OpenAI compatibility endpoint.

tobegit3hub commented 3 days ago

Use the OpenAI compatibility endpoint.

Thanks for replying. We are actually already using this compatibility endpoint, but the response is not compatible with the official OpenAI API, which is also supported by vllm. Here is the expected response data; refer to https://platform.openai.com/docs/api-reference/streaming .

[screenshot: expected streaming response format from the OpenAI API reference]
rick-github commented 3 days ago

What is incompatible about the ollama response?

tobegit3hub commented 3 days ago

What is incompatible about the ollama response?

The expected response should look like this:

{
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "created": 1677858242,
    "model": "gpt-4o-mini",
    "usage": {
        "prompt_tokens": 13,
        "completion_tokens": 7,
        "total_tokens": 20,
        "completion_tokens_details": {
            "reasoning_tokens": 0
        }
    },
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "\n\nThis is a test!"
            },
            "logprobs": null,
            "finish_reason": "stop",
            "index": 0
        }
    ]
}

But the actual response from ollama looks like this:

{
  "model": "codellama:code",
  "created_at": "2024-07-22T20:47:51.147561Z",
  "response": "\n  if a == 0:\n    return b\n  else:\n    return compute_gcd(b % a, a)\n\ndef compute_lcm(a, b):\n  result = (a * b) / compute_gcd(a, b)\n",
  "done": true,
  "done_reason": "stop",
  "context": [...],
  "total_duration": 1162761250,
  "load_duration": 6683708,
  "prompt_eval_count": 17,
  "prompt_eval_duration": 201222000,
  "eval_count": 63,
  "eval_duration": 953997000
}

The streaming mode has the same issue.

Our applications will use both public model services, which are fully compatible with the OpenAI API, and ollama model services. The difference in their response formats requires extra work for applications to handle.
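
For context, here is a rough sketch of what an OpenAI-compatible streaming consumer expects (assuming the openai Python package, a local ollama server at the default http://localhost:11434, and a placeholder api_key, which ollama ignores):

# Sketch: consuming an OpenAI-style streaming response.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

stream = client.chat.completions.create(
    model="codellama:code",  # model name taken from this thread
    messages=[{"role": "user", "content": "Say this is a test"}],
    stream=True,
)

# Each chunk follows the OpenAI "chat.completion.chunk" schema:
# choices[0].delta.content carries the incremental text, and the
# final chunk sets finish_reason (e.g. "stop").
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)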

rick-github commented 3 days ago

That response is from the ollama API endpoint, /api/generate. Use the OpenAI compatibility endpoint, /v1/chat/completions.
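
To make the difference concrete, a rough sketch of the two calls (assuming Python's requests package, a local server at the default http://localhost:11434, and the model name from the thread):

import requests

BASE = "http://localhost:11434"

# Ollama-native endpoint: returns the ollama format shown above
# ("response", "done", "eval_count", ...).
native = requests.post(f"{BASE}/api/generate", json={
    "model": "codellama:code",
    "prompt": "def compute_gcd(a, b):",
    "stream": False,
}).json()
print(native["response"])

# OpenAI compatibility endpoint: returns the OpenAI chat.completion
# format ("choices", "usage", "finish_reason", ...).
compat = requests.post(f"{BASE}/v1/chat/completions", json={
    "model": "codellama:code",
    "messages": [{"role": "user", "content": "def compute_gcd(a, b):"}],
    "stream": False,
}).json()
print(compat["choices"][0]["message"]["content"])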

tobegit3hub commented 3 days ago

That response is from the ollama API endpoint, /api/generate. Use the OpenAI compatibility endpoint, /v1/chat/completions.

Thanks, you are right. We were using the incorrect API, and /v1/chat/completions works like a charm. I will close this issue.
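
For anyone who finds this later, a minimal sketch of the working setup (assuming the openai Python client and a local ollama server at the default port; the api_key is required by the client but ignored by ollama):

# Point the standard OpenAI client at ollama's compatibility endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

resp = client.chat.completions.create(
    model="codellama:code",
    messages=[{"role": "user", "content": "Say this is a test"}],
)

# The response follows the OpenAI chat.completion schema shown above.
print(resp.choices[0].message.content)
print(resp.usage.total_tokens)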