jekalmin / extended_openai_conversation

Home Assistant custom component of conversation agent. It uses OpenAI to control your devices.

Image upload for GPT-4V? (Feature request) #43

Open mkammes opened 11 months ago

mkammes commented 11 months ago

GPT Plus members can use the upload function for media files, such as images.

https://platform.openai.com/docs/guides/vision/uploading-base-64-encoded-images

As an example, it would be great to have an automation that takes a picture of your bar (or refrigerator) and provides recipe suggestions based on the ingredients, along with a custom prompt.

Here is an example: https://m.facebook.com/groups/HomeAssistant/permalink/3611503665787644/?mibextid=Nif5oz

I've tried doing this via Python and pyscript with little success.
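For reference, the base64 upload described in the linked guide comes down to embedding the encoded image in a data URL. A minimal sketch, assuming a JPEG image; `to_data_url` is a hypothetical helper name, and the bytes used here are a placeholder rather than a real picture:

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Placeholder bytes; in practice these would come from a camera snapshot file.
data_url = to_data_url(b"abc")
print(data_url)
```

The resulting string can be used wherever the guide expects an image URL.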

Thanks!

jekalmin commented 11 months ago

Thanks for the suggestion.

I just read the post on Facebook, and it is a really interesting feature. There are several things I have to check before I'm certain that it is possible.

Details

#### 1. The `content` type should be changed

Currently, messages are formed like below.

```
[
  {'role': 'system', 'content': "..."},
  {'role': 'user', 'content': 'turn on bedroom light'},
  {'role': 'function', 'name': 'execute_services', 'content': '[True]'}
]
```

In the OpenAI [guide](https://platform.openai.com/docs/guides/vision/uploading-base-64-encoded-images) you referenced, the content of a message changes from `string` to `list`.

```
[
  {
    'role': 'user',
    'content': [
      {'type': 'text', 'text': 'What is in this image?'},
      {'type': 'image_url', 'image_url': 'https://...'}
    ]
  }
]
```

#### 2. How to attach `image_url`

Since it's hard for the component (`extended_openai_conversation`) to attach an "image_url" to a `user` message, the only way I can think of is to provide a function that attaches the "image_url" via a function response. I hope a format like the one below works, but there is no example that uses both a function and an image.

```
[
  {
    'role': 'function',
    'content': [
      {/* function response (don't know how this object will be formatted) */},
      {'type': 'image_url', 'image_url': 'https://...'}
    ]
  }
]
```
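To make the `string` to `list` change concrete, here is a minimal sketch of a converter. `with_image` is a hypothetical helper, not code from the component. It nests the URL under a `url` key, which is the shape the current OpenAI guide uses, whereas the examples in this thread show a plain string.

```python
from typing import Any


def with_image(message: dict[str, Any], image_url: str) -> dict[str, Any]:
    """Return a copy of `message` whose content is a list of typed parts."""
    content = message["content"]
    # A plain string becomes a single text part; an existing list is copied.
    if isinstance(content, str):
        parts = [{"type": "text", "text": content}]
    else:
        parts = list(content)
    parts.append({"type": "image_url", "image_url": {"url": image_url}})
    return {**message, "content": parts}


message = {"role": "user", "content": "What is in this image?"}
# example.com is a placeholder URL used only for illustration.
converted = with_image(message, "https://example.com/fridge.jpg")
print(converted)
```

The original message is left untouched, so existing string-based handling can keep working for requests without images.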

I will look into this when I have time and GPT Plus is available again.

mkammes commented 11 months ago

Outstanding! I can confirm it works: I've replicated it in my own Python script, based on the OpenAI example.

I'm happy to create an API key for you to test with.

Thanks!

jekalmin commented 11 months ago

Thank you! I will try it myself first, and then ask for help if needed!

jekalmin commented 10 months ago

I just upgraded and tried the gpt-4-vision-preview model. Unfortunately, it seems that this model doesn't support functions. I got the error below.

Code

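import openai  # legacy SDK (openai<1.0), matching openai.ChatCompletion below

# Function definition offered to the model.
functions = [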
    {
        "name": "get_image",
        "description": "Get image",
        "parameters": {
            "type": "object",
            "properties": {
                "url": {
                    "type": "string",
                    "description": "The image url",
                }
            },
            "required": ["url"],
        },
    }
]

# Passing functions to the vision model is what triggers the error below.
response = openai.ChatCompletion.create(
    model="gpt-4-vision-preview",
    functions=functions,
    function_call="auto",
    messages=[
        {
            "role": "user",
            "content": "What’s in an image?"
        }
    ],
    max_tokens=300,
)

print(response)

Request

DEBUG:openai:api_version=None data='{"model": "gpt-4-vision-preview", "functions": [{"name": "get_image", "description": "Get image", "parameters": {"type": "object", "properties": {"url": {"type": "string", "description": "The image url"}}, "required": ["url"]}}], "function_call": "auto", "messages": [{"role": "user", "content": "What\\u2019s in an image?"}], "max_tokens": 300}' message='Post details'

Response

openai.error.InvalidRequestError: 2 validation errors for Request
body -> function_call
  extra fields not permitted (type=value_error.extra)
body -> functions
  extra fields not permitted (type=value_error.extra)

Maybe I will add a service instead, so that you can call it from a function.
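The idea of routing the image query through a function can be sketched roughly as follows. This is an illustration under assumptions, not the component's code: `answer_function_call` and `fake_describe_image` are hypothetical names, and the stub stands in for a separate request to the vision model that carries no `functions` field.

```python
import json
from typing import Callable


def answer_function_call(function_call: dict, describe_image: Callable[[str], str]) -> dict:
    """Turn a model's function_call into a function-role message."""
    # The model returns its arguments as a JSON-encoded string.
    arguments = json.loads(function_call["arguments"])
    description = describe_image(arguments["url"])
    return {"role": "function", "name": function_call["name"], "content": description}


def fake_describe_image(url: str) -> str:
    """Stub for the separate vision request (hypothetical)."""
    return f"(description of {url})"


# A function_call as a function-capable model might return it.
function_call = {"name": "get_image", "arguments": '{"url": "https://example.com/bar.jpg"}'}
reply = answer_function_call(function_call, fake_describe_image)
print(reply)
```

The function-capable model never sees the image itself, only the text description returned in the function message.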

jekalmin commented 10 months ago

I have added a "query_image" service in https://github.com/jekalmin/extended_openai_conversation/pull/60.

You can try adding a function like the one below.

Function

- spec:
    name: get_refrigerator_items
    description: Get description of items in refrigerator
    parameters:
      type: object
      properties:
        url:
          type: string
          description: image url of refrigerator
          enum:
            - https://i.pinimg.com/originals/8b/cc/f1/8bccf14daf77ce887fc162934335cb21.jpg # needs to change
      required:
      - url
  function:
    type: composite
    sequence:
      - type: script
        sequence:
          - service: extended_openai_conversation.query_image
            data:
              prompt: What alcohol and brands do you see in this picture?
              images:
                - url: "{{url}}"
              max_tokens: 300
              config_entry: YOUR_CONFIG_ENTRY_KEY # needs to change
            response_variable: _function_result
        response_variable: image_result
      - type: template
        value_template: "{{image_result.choices[0].message.content}}"
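The final `value_template` step picks the answer text out of the service response. In plain Python terms, and assuming the response has the shape that the template path implies, the lookup amounts to the following. The sample text is made up for illustration.

```python
# Hypothetical response, shaped as the template path suggests.
image_result = {
    "choices": [
        {"message": {"role": "assistant", "content": "Two bottles of gin and a lime."}}
    ]
}

# Same path as {{image_result.choices[0].message.content}} in the template.
answer = image_result["choices"][0]["message"]["content"]
print(answer)
```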

Then ask "What's in the refrigerator?"

mkammes commented 10 months ago

Outstanding! Great work. I look forward to testing this out!

jekalmin commented 9 months ago

Released this in 1.0.1-beta2