srcnalt / OpenAI-Unity

An unofficial OpenAI Unity package that aims to help you use the OpenAI API directly in the Unity game engine.
MIT License

Issue with Sending Image URL to GPT-4o in Unity #129

Open Zaf01 opened 1 month ago

Zaf01 commented 1 month ago

Hi,

I am trying to use GPT-4o's vision capabilities in Unity with this package. I send an image URL to the model as follows:

    public async void SendImageUrlToGPT4(string imageurl)
    {
        var userMessage = new ChatMessage
        {
            Role = "user",
            Content = "[{\"type\": \"text\", \"text\": \"What do you see in this image? Limit yourself to 15 words.\"}, {\"type\": \"image_url\", \"url\": \"" + imageurl + "\"}]"
        };

        messages.Add(userMessage);

        var request = new CreateChatCompletionRequest
        {
            Messages = messages,
            Model = "gpt-4o",
            MaxTokens = 300
        };

        var response = await openAI.CreateChatCompletion(request);

        if (response.Choices != null && response.Choices.Count > 0)
        {
            var chatResponse = response.Choices[0].Message;

            Debug.Log(chatResponse.Content);

            OnResponse.Invoke(chatResponse.Content);

            Debug.Log("Response Finished");
        }
        else
        {
            Debug.LogError("No response from GPT-4 Vision.");
        }
    }

However, the model always responds with an incorrect description of the image. Could there be a problem with the way the request is sent to the model from Unity?

When I pass the same URL using the Python code snippet provided by OpenAI, the model describes the image accurately. Here is the Python code that I tested:


import openai
import json

# Set your API key
openai.api_key = ""

response = openai.ChatCompletion.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What’s in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://firebasestorage.googleapis.com/v0/b/yoloholofirebase.appspot.com/o/frame2.jpg?alt=media&token=1be46bf7-efa4-4398-b914-c47bd777b129",
                    },
                },
            ],
        }
    ],
    max_tokens=300,
)

print(response['choices'][0]['message']['content'])

Here is the JSON dump of the user message built in Unity:

{"Role":"user","Content":"[{\"type\": \"text\", \"text\": \"What do you see in this image? Limit yourself to 15 words.\"}, {\"type\": \"image_url\", \"url\": \"https://firebasestorage.googleapis.com/v0/b/yoloholofirebase.appspot.com/o/frame2.jpg?alt=media&token=1be46bf7-efa4-4398-b914-c47bd777b129\"}]"}

The JSON dump in Python:

{
      "model": "gpt-4o",
      "messages": [
            {
                  "role": "user",
                  "content": [
                        {
                              "type": "text",
                              "text": "What\u2019s in this image?"
                        },
                        {
                              "type": "image_url",
                              "image_url": {
                                    "url": "https://firebasestorage.googleapis.com/v0/b/yoloholofirebase.appspot.com/o/frame2.jpg?alt=media&token=1be46bf7-efa4-4398-b914-c47bd777b129"
                              }
                        }
                  ]
            }
      ],
      "max_tokens": 300
}
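
Comparing the two dumps, two structural differences stand out: in Unity the content is a single JSON-encoded string rather than a list of parts, and the image part has "url" at the top level rather than inside a nested "image_url" object. The following is a minimal sketch that checks a message's content against the shape used in the Python dump. The helper name `describe_content` and the placeholder URL are hypothetical, and this only compares shapes; it does not confirm what the package actually sends or whether it can send a list as content.

```python
import json

# Placeholder URL for illustration only, not the real image URL.
IMAGE_URL = "https://example.com/frame2.jpg"

# Shape taken from the Unity dump: Content is one JSON-encoded string,
# and the image part carries "url" at the top level.
unity_message = {
    "Role": "user",
    "Content": json.dumps([
        {"type": "text", "text": "What do you see in this image?"},
        {"type": "image_url", "url": IMAGE_URL},
    ]),
}

# Shape taken from the Python dump: content is a list of parts,
# and the URL sits inside a nested "image_url" object.
python_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What do you see in this image?"},
        {"type": "image_url", "image_url": {"url": IMAGE_URL}},
    ],
}


def describe_content(content):
    """Hypothetical helper: list how `content` differs from the Python dump's shape."""
    problems = []
    if isinstance(content, str):
        problems.append("content is a string, not a list of parts")
        try:
            # Look inside the string in case it holds JSON-encoded parts.
            content = json.loads(content)
        except json.JSONDecodeError:
            return problems
    if not isinstance(content, list):
        return problems
    for part in content:
        if isinstance(part, dict) and part.get("type") == "image_url":
            image = part.get("image_url")
            if not isinstance(image, dict) or "url" not in image:
                problems.append('image part lacks a nested "image_url": {"url": ...} object')
    return problems


for name, content in [("Unity", unity_message["Content"]),
                      ("Python", python_message["content"])]:
    print(name, describe_content(content))
```

If the content reaches the API as a plain string, the model would presumably treat the whole thing as text and never fetch the image, which would be consistent with the incorrect descriptions.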

Could you please let me know how to send the image to the model correctly with this package, so that the response describes the image accurately? I am not sure what is causing this issue, and any insights would be greatly appreciated.