Open mkammes opened 11 months ago
Thanks for the suggestion.
I just read the post on Facebook, and it is a really interesting feature. There are several things I would have to check before I'm certain that it is possible.
#### 1. The `content` type should be changed

Currently, messages are formed like below.

```
[
  {'role': 'system', 'content': "..."},
  {'role': 'user', 'content': 'turn on bedroom light'},
  {'role': 'function', 'name': 'execute_services', 'content': '[True]'}
]
```

In the OpenAI [guide](https://platform.openai.com/docs/guides/vision/uploading-base-64-encoded-images) you referenced, the content of a message should be changed from `string` to `list`.

```
[
  {
    'role': 'user',
    'content': [
      {'type': 'text', 'text': 'What is in this image?'},
      {'type': 'image_url', 'image_url': 'https://...'}
    ]
  }
]
```

#### 2. How to attach `image_url`

Since it's hard for the component (`extended_openai_conversation`) to attach "image_url" in the `user` role, the only way I can think of is to provide a function that attaches "image_url" via a function response. I hope a format like the one below works, but there is no example that uses both a function and an image.

```
[
  {
    'role': 'function',
    'content': [
      {/* function response (don't know how this object will be formatted) */},
      {'type': 'image_url', 'image_url': 'https://...'}
    ]
  }
]
```
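To make the string-to-list change in point 1 concrete, here is a minimal sketch of a helper that builds a list-shaped `content`. The helper name `to_vision_message` and the example URL are made up for illustration. It also assumes the nested `{"url": ...}` form for `image_url` that the OpenAI vision guide uses, which is slightly more verbose than the abbreviated form above.

```python
def to_vision_message(role, text, image_urls):
    """Build a chat message whose content is a list of typed parts."""
    # The text part comes first, followed by one part per image.
    parts = [{"type": "text", "text": text}]
    for url in image_urls:
        # Assumption: image_url is an object with a "url" key.
        parts.append({"type": "image_url", "image_url": {"url": url}})
    return {"role": role, "content": parts}


# Hypothetical example URL, only for illustration.
message = to_vision_message(
    "user",
    "What is in this image?",
    ["https://example.com/fridge.jpg"],
)
print(message)
```

Whether the same list shape is accepted for the `function` role (point 2) is not something this sketch can confirm.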
I will look into this when I have time and GPT Plus is resumed.
Outstanding! I can confirm it works (I've replicated it with my own Python script based on the OpenAI example).
I'm happy to create an API key for you to test with.
Thanks!
Thank you! I will try by myself first, and then ask for help if needed!
I just upgraded and tried the `gpt-4-vision-preview` model.
Unfortunately, it seems that this model doesn't support `functions`.
I got an error like the one below.
Code

```python
import openai

# Function spec passed to the model (legacy `functions` parameter).
functions = [
    {
        "name": "get_image",
        "description": "Get image",
        "parameters": {
            "type": "object",
            "properties": {
                "url": {
                    "type": "string",
                    "description": "The image url",
                }
            },
            "required": ["url"],
        },
    }
]

# This request fails: the vision model rejects `functions` and `function_call`.
response = openai.ChatCompletion.create(
    model="gpt-4-vision-preview",
    functions=functions,
    function_call="auto",
    messages=[
        {
            "role": "user",
            "content": "What’s in an image?"
        }
    ],
    max_tokens=300,
)
print(response)
```
Request

```
DEBUG:openai:api_version=None data='{"model": "gpt-4-vision-preview", "functions": [{"name": "get_image", "description": "Get image", "parameters": {"type": "object", "properties": {"url": {"type": "string", "description": "The image url"}}, "required": ["url"]}}], "function_call": "auto", "messages": [{"role": "user", "content": "What\\u2019s in an image?"}], "max_tokens": 300}' message='Post details'
```

Response

```
openai.error.InvalidRequestError: 2 validation errors for Request
body -> function_call
  extra fields not permitted (type=value_error.extra)
body -> functions
  extra fields not permitted (type=value_error.extra)
```
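One way to avoid this validation error is to leave the function-calling fields out of the request whenever the target model rejects them. The sketch below only builds the request dictionary and makes no API call. The names `MODELS_WITHOUT_FUNCTIONS` and `build_request` are hypothetical, and the set of affected models is an assumption based solely on the error shown here.

```python
# Assumption: models in this set reject `functions` / `function_call`.
MODELS_WITHOUT_FUNCTIONS = {"gpt-4-vision-preview"}


def build_request(model, messages, functions=None, **extra):
    """Return request kwargs, dropping function fields for unsupported models."""
    request = {"model": model, "messages": messages, **extra}
    if functions and model not in MODELS_WITHOUT_FUNCTIONS:
        request["functions"] = functions
        request["function_call"] = "auto"
    return request


request = build_request(
    "gpt-4-vision-preview",
    [{"role": "user", "content": "What’s in an image?"}],
    functions=[{"name": "get_image"}],
    max_tokens=300,
)
print(sorted(request))  # no "functions" or "function_call" keys
```

This only sidesteps the error. It does not give the vision model access to functions, so a separate step is still needed to route an image query to it.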
Maybe I will add a service, so that you can hook it in via functions.
I have added a "query_image" service in https://github.com/jekalmin/extended_openai_conversation/pull/60.
You can try adding a function like the one below.
```yaml
- spec:
    name: get_refrigerator_items
    description: Get description of items in refrigerator
    parameters:
      type: object
      properties:
        url:
          type: string
          description: image url of refrigerator
          enum:
          - https://i.pinimg.com/originals/8b/cc/f1/8bccf14daf77ce887fc162934335cb21.jpg # needs to change
      required:
      - url
  function:
    type: composite
    sequence:
    - type: script
      sequence:
      - service: extended_openai_conversation.query_image
        data:
          prompt: What alcohol and brands do you see in this picture?
          images:
          - url: "{{url}}"
          max_tokens: 300
          config_entry: YOUR_CONFIG_ENTRY_KEY # needs to change
        response_variable: _function_result
      response_variable: image_result
    - type: template
      value_template: "{{image_result.choices[0].message.content}}"
```
Then ask "what's in the refrigerator".
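The two steps of the composite function can be mimicked in plain Python to show how the data flows. This is a rough sketch, not the component's implementation: `run_composite` and `fake_query_image` are hypothetical names, and the stub response only imitates the `choices[0].message.content` shape that the template step reads.

```python
def fake_query_image(prompt, images, max_tokens):
    """Stand-in for the query_image service; returns a response-shaped dict."""
    return {
        "choices": [
            {
                "message": {
                    "role": "assistant",
                    "content": f"Stub answer for {images[0]['url']}",
                }
            }
        ]
    }


def run_composite(url, query_image):
    # Step 1 (type: script): call the service and keep its result,
    # like `response_variable: image_result` in the YAML.
    image_result = query_image(
        prompt="What alcohol and brands do you see in this picture?",
        images=[{"url": url}],
        max_tokens=300,
    )
    # Step 2 (type: template): return only the answer text,
    # like "{{image_result.choices[0].message.content}}".
    return image_result["choices"][0]["message"]["content"]


print(run_composite("https://example.com/fridge.jpg", fake_query_image))
```

The point of the second step is that the conversation model receives only the answer text rather than the whole response object.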
Outstanding! Great work. I look forward to testing this out!
Released this in 1.0.1-beta2
GPT Plus members can use the upload function for media files, such as images.
https://platform.openai.com/docs/guides/vision/uploading-base-64-encoded-images
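For a local picture, the linked guide embeds the image as a base64 data URL instead of a public link. A minimal sketch of that encoding step follows; the helper name `image_bytes_to_data_url` is hypothetical, and the default MIME type is an assumption that should match the actual file.

```python
import base64


def image_bytes_to_data_url(data: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data URL usable in an image_url field."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Tiny placeholder payload; in practice, read the bytes from an image file.
print(image_bytes_to_data_url(b"abc"))
```

The resulting string would go wherever an `https://...` image URL is otherwise expected.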
As an example, it would be great to have an automation that takes a picture of your bar (or refrigerator) and provides recipe suggestions based on the ingredients, along with a custom prompt.
Here is an example: https://m.facebook.com/groups/HomeAssistant/permalink/3611503665787644/?mibextid=Nif5oz .
I've tried doing this via Python and pyscript with little success.
Thanks!