mudler / LocalAI

The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed inference
https://localai.io
MIT License

'response_format' field in OpenAI image creation request does not match OpenAI API spec #1910

Closed (Ephex2 closed 2 months ago)

Ephex2 commented 6 months ago

LocalAI version: v2.11.0-aio-cpu

Environment, CPU architecture, OS, and Version:
OS: Linux myBox 5.15.0-101-generic #111-Ubuntu SMP Tue Mar 5 20:16:58 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Docker version: 26.0.0
CPU: 11th Gen Intel(R) Core(TM) i7-1165G7 @2.80 GHz

Describe the bug

response_format is not properly typed in OpenAI image requests in LocalAI.

The OpenAI API spec says that the response_format property should be a string:

response_format - string or null / Optional / Defaults to url

The format in which the generated images are returned. Must be one of url or b64_json. URLs are only valid for 60 minutes after the image has been generated.

However, in the LocalAI repo the type appears to be a struct with a Type property, which is itself a string. This is defined in openai.go at lines 102-106 of commit 801b481.
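
The referenced lines are not reproduced in this thread. As a rough sketch only, the mismatch can be demonstrated in isolation with Go's encoding/json. The two type names below are borrowed from the error message reported under To Reproduce, but their fields are assumptions for illustration and are not copied from openai.go.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ChatCompletionResponseFormat is an assumed shape: a struct with a single
// string property, as described in the report. Not copied from openai.go.
type ChatCompletionResponseFormat struct {
	Type string `json:"type"`
}

// OpenAIRequest is a hypothetical, trimmed-down request type that keeps only
// the fields needed to show the problem.
type OpenAIRequest struct {
	Prompt         string                       `json:"prompt"`
	ResponseFormat ChatCompletionResponseFormat `json:"response_format"`
}

// parse decodes a JSON request body into the assumed OpenAIRequest shape.
func parse(body string) (OpenAIRequest, error) {
	var req OpenAIRequest
	err := json.Unmarshal([]byte(body), &req)
	return req, err
}

func main() {
	// String form, as in the OpenAI image spec: a JSON string cannot be
	// decoded into a Go struct, so an error is returned.
	_, err := parse(`{"prompt": "A cute baby sea otter", "response_format": "url"}`)
	fmt.Println("string form error:", err)

	// Object form: decodes without error.
	req, err := parse(`{"prompt": "A cute baby sea otter", "response_format": {"type": "url"}}`)
	fmt.Println("object form:", req.ResponseFormat.Type, err)
}
```

Under these assumptions, only the object form decodes, which matches the behaviour described in the reproduction steps.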


To Reproduce

Instead of providing a body like:

{
    ...
    "response_format": "url",
    ...
}

which is supported by OpenAI, we must provide response_format as an object of the form:

{
    ...
    "response_format": {"type": "url"},
    ...
}

This breaks image creation calls using client models for OpenAI. Example error:

curl http://localhost:8080/v1/images/generations  -H "Content-Type: application/json" -d '{
    "prompt": "A cute baby sea otter",
    "model": "stablediffusion",
    "n":1,
    "response_format": "url",
    "size": "256x256",
    "user": "go-gpt-cli"
}'
{"error":{"code":500,"message":"failed reading parameters from request:failed parsing request body: json: cannot unmarshal string into Go struct field OpenAIRequest.response_format of type schema.ChatCompletionResponseFormat","type":""}}

As shown below, the modified response_format works locally, but the same request would not work with OpenAI:

curl http://localhost:8080/v1/images/generations  -H "Content-Type: application/json" -d '{
    "prompt": "A cute baby sea otter",
    "model": "stablediffusion",
    "n":1,
    "response_format": {"type": "url"},
    "size": "256x256",
    "user": "go-gpt-cli"
}'
{"created":1711557449,"id":"a6b91e5d-f117-462e-b2b6-d12bba52a2b5","data":[{"embedding":null,"index":0,"url":"http://localhost:8080/generated-images/b64449955181.png"}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}

Here is the above request failing when sent to OpenAI (model changed to dall-e-2). The same request succeeds once response_format is changed to a string, as shown under Expected behavior:

curl https://api.openai.com/v1/images/generations -H "Content-Type: application/json" -H "Authorization: Bearer $SECRET" -d '{
    "prompt": "A cute baby sea otter",
    "model": "dall-e-2",
    "n":1,
    "response_format": {"type": "url"},
    "size": "256x256",
    "user": "test"
}'
{
  "error": {
    "code": null,
    "message": "{'type': 'url'} is not of type 'string' - 'response_format'",
    "param": null,
    "type": "invalid_request_error"
  }
}


Expected behavior

When calling OpenAI's API with response_format set to the string "url", the request works:

curl https://api.openai.com/v1/images/generations -H "Content-Type: application/json" -H "Authorization: Bearer $SECRET" -d '{
    "prompt": "A cute baby sea otter",
    "model": "dall-e-2",
    "n":1,
    "response_format": "url",
    "size": "256x256",
    "user": "test"
}'
{
  "created": 1711557884,
  "data": [
    {
      "url": "https://something.blob.core.windows.net/private/...redacted..."
    }
  ]
}

Ideally, this same behavior would be achieved with LocalAI's API models.


Additional context

I don't know whether this would impact existing users of LocalAI or the backend, but I believe the struct should be modified to match the OpenAI API specification, i.e., the ResponseFormat field in the OpenAIRequest type should be changed to type string.

Ideally, the omitempty tag would be added to the field as well.
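
A minimal sketch of what that suggestion could look like, assuming a trimmed-down request type. ImageRequest and its fields are hypothetical names used for illustration and are not LocalAI's actual definition.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ImageRequest is a hypothetical request type. ResponseFormat is a plain
// string, as in the OpenAI image spec, and omitempty drops it from the
// encoded JSON when it is unset.
type ImageRequest struct {
	Prompt         string `json:"prompt"`
	ResponseFormat string `json:"response_format,omitempty"`
}

// decode parses a JSON request body into an ImageRequest.
func decode(body string) (ImageRequest, error) {
	var req ImageRequest
	err := json.Unmarshal([]byte(body), &req)
	return req, err
}

// encode serializes an ImageRequest to a JSON string.
func encode(req ImageRequest) (string, error) {
	out, err := json.Marshal(req)
	return string(out), err
}

func main() {
	// The string form from the OpenAI spec now decodes.
	req, err := decode(`{"prompt": "A cute baby sea otter", "response_format": "url"}`)
	fmt.Println(req.ResponseFormat, err)

	// With omitempty, an unset response_format is left out when encoding.
	out, err := encode(ImageRequest{Prompt: "A cute baby sea otter"})
	fmt.Println(out, err)
}
```

Note that a plain string field would in turn reject the object form, so this shape only suits a request type that is not shared with an endpoint expecting an object.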

mudler commented 6 months ago

I can confirm that the current implementation of response_format is overlapping with the ChatCompletion one, which differs from the image endpoint - it needs to be extended to support the image endpoint response_format as well and be slightly more generic.
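
One possible reading of "slightly more generic" is a field type that accepts both shapes: the plain string used by the image endpoint and the {"type": ...} object. The sketch below does this with a custom UnmarshalJSON. The type and field names are hypothetical, and this is not taken from the LocalAI codebase.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ResponseFormat is a hypothetical type that accepts either a JSON string
// ("url") or a JSON object ({"type": "url"}).
type ResponseFormat struct {
	Type string `json:"type"`
}

// UnmarshalJSON tries the string form first, then falls back to the object form.
func (r *ResponseFormat) UnmarshalJSON(data []byte) error {
	var s string
	if err := json.Unmarshal(data, &s); err == nil {
		r.Type = s
		return nil
	}
	// The alias type has no UnmarshalJSON method, which avoids infinite recursion.
	type plain ResponseFormat
	var p plain
	if err := json.Unmarshal(data, &p); err != nil {
		return fmt.Errorf("response_format must be a string or an object: %w", err)
	}
	*r = ResponseFormat(p)
	return nil
}

// Request is a hypothetical, trimmed-down request type shared by both endpoints.
type Request struct {
	Prompt         string         `json:"prompt"`
	ResponseFormat ResponseFormat `json:"response_format"`
}

// decode parses a JSON request body into a Request.
func decode(body string) (Request, error) {
	var req Request
	err := json.Unmarshal([]byte(body), &req)
	return req, err
}

func main() {
	bodies := []string{
		`{"prompt": "A cute baby sea otter", "response_format": "url"}`,
		`{"prompt": "A cute baby sea otter", "response_format": {"type": "url"}}`,
	}
	for _, body := range bodies {
		req, err := decode(body)
		fmt.Println(req.ResponseFormat.Type, err)
	}
}
```

The trade-off in this design is that the server accepts a superset of what OpenAI accepts, so a request that works locally may still be rejected upstream.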

localai-bot commented 5 months ago

Thank you for reporting this issue, Ephex2. I can confirm that the current implementation of the response_format field in the OpenAI request does cause compatibility issues with the OpenAI API specification for image generation.

To address this, we will need to update the openai.go file in the LocalAI repository to align with the OpenAI API specification for image creation requests. Specifically, we should update the OpenAIRequest struct to support the response_format field as a string, with the omitempty tag added to the field as you suggested.

Once this update is made, the issue should be resolved, and the LocalAI API models should work as expected with the OpenAI API when making image creation calls.

I'll create a pull request to address this issue and mention this discussion in the PR description for reference.

Thanks again for bringing this to our attention! If you have any further questions or concerns, please don't hesitate to reach out.

mudler commented 2 months ago

Done in https://github.com/mudler/LocalAI/pull/2347