microsoft / prompty

Prompty makes it easy to create, manage, debug, and evaluate LLM prompts for your AI applications. Prompty is an asset class and format for LLM prompts designed to enhance observability, understandability, and portability for developers.
https://prompty.ai
MIT License

Using Prompty with gpt-4o model #16

Closed. sjuratov closed this issue 3 months ago.

sjuratov commented 3 months ago

I am trying to use Prompty with the gpt-4o model. Here is my Prompty file:

---
name: DocumentAnalysis
description: A prompt that uses context to ground an incoming question
authors:
  - SJ
model:
  api: chat
  configuration:
    type: azure_openai
    azure_deployment: ${env:AZURE_OPENAI_DEPLOYMENT}
    api_key: ${env:AZURE_OPENAI_API_KEY}
    api_version: ${env:AZURE_OPENAI_API_VERSION}
    azure_endpoint: ${env:AZURE_OPENAI_ENDPOINT}
  parameters:
    max_tokens: 800
    temperature: 0.2
sample:
  image: funny_image_1.jpg
  question: Analyze image
---

system:
You are an AI assistant who helps people find information.
You are an expert in document analysis.

user:
<img src="{{image}}">
{{question}}

When I test from VSCode, I get the following message:

> I'm unable to view images directly. However, if you describe the image to me, I can help analyze it or provide information based on your description.

I assume Prompty supports this scenario, so it's probably something to do with my Prompty file.

Any ideas?

wayliums commented 3 months ago

@sethjuarez we need more samples. @sjuratov Prompty is mostly based on markdown syntax, so for images it would be

![image]({{image}})

I haven't tried with GPT4o, but I tried with GPTV and it works.
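For context on why the markdown syntax matters: vision-capable chat models expect images as separate `image_url` content parts alongside the text, so a markdown image in the user section has to be split out of the prose. The sketch below is NOT Prompty's actual parser, just a rough stdlib-only illustration of that mapping.

```python
import re

def to_content_parts(user_text: str) -> list[dict]:
    """Split a rendered user section into text and image parts,
    roughly mirroring how ![image](phone.jpg) must become a
    separate image_url entry in a vision chat payload.
    Illustrative only; not Prompty's implementation."""
    parts: list[dict] = []
    pattern = re.compile(r"!\[[^\]]*\]\(([^)]+)\)")
    pos = 0
    for m in pattern.finditer(user_text):
        text = user_text[pos:m.start()].strip()
        if text:
            parts.append({"type": "text", "text": text})
        parts.append({"type": "image_url", "image_url": {"url": m.group(1)}})
        pos = m.end()
    tail = user_text[pos:].strip()
    if tail:
        parts.append({"type": "text", "text": tail})
    return parts

parts = to_content_parts("![image](phone.jpg)\nAnalyze image")
# parts[0] is the image part, parts[1] the question text
```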

sjuratov commented 3 months ago

Thanks @wayliums. Unfortunately it still doesn't work for me. I've tried gpt-4o, gpt-4-vision-preview, and gpt-4-turbo-2024-04-09.

I've changed my Prompty per your suggestion; the rest of the file is unchanged from above.

user:
![image]({{image}})
{{question}}

Can you maybe share your Prompty file?

wayliums commented 3 months ago

@sjuratov here's an example; the images are in the same folder as the prompty file. In the prompty preview mode, did you see the images showing up?

---
name: Contoso Sales Writer
description: A prompt that uses context to ground an incoming question
authors:
  - Seth Juarez
model:
  api: chat
sample:
  question: What should I do with this?
  image: phone.jpg
---
system:
You are a Contoso support inspector who looks at images to figure out what products are in the image and 
what might be wrong with them. Write a list of items that are in the image and what might be wrong with them.

# Example Output
{
    "item": "TV Set",
    "issues": [
        "This tv is old and may not work properly.",
        "The screen is cracked.",
        "The plug may be broken."
    ]
}

Only return json as formatted above and make sure the item descriptions use phrases that can easily be used 
for a search query. Find the most prominent electronic device in the image and list the issues with it.

# User Supplied image
Use the following image for your assessment:

![image]({{image}})

# Instructions
Return only the most prominent items in the image. Do not return items that are not prominent.
Be as verbose as possible with the issues. The more information you provide, the better the support 
team can help the customer. Formulate each issue in the form of a really good search query.

user:
{{question}}
![image](phone_embed.jpg)

sjuratov commented 3 months ago

Thanks @wayliums, this was very helpful. Your example did not work "out of the box" but gave me what I needed to move forward.

When I ran your example, it gave me an error that configuration and parameters were missing, so I changed it as follows:

model:
  api: chat
  configuration:
    type: azure_openai
    azure_deployment: ${env:AZURE_OPENAI_DEPLOYMENT}
    api_key: ${env:AZURE_OPENAI_API_KEY}
    api_version: ${env:AZURE_OPENAI_API_VERSION}
    azure_endpoint: ${env:AZURE_OPENAI_ENDPOINT}
  parameters:
    max_tokens: 800
    temperature: 0.2

After this change, it worked like a charm.
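The missing-sections error above can be caught before running anything. This is NOT Prompty's actual validation, just a crude stdlib-only sketch that checks whether the required keys appear anywhere in the YAML front matter (the block between the `---` markers):

```python
import textwrap

SAMPLE = textwrap.dedent("""\
    ---
    name: Contoso Sales Writer
    model:
      api: chat
      configuration:
        type: azure_openai
      parameters:
        max_tokens: 800
    ---
    system:
    You are a Contoso support inspector.
    """)

def missing_model_keys(prompty_text, required=("api", "configuration", "parameters")):
    """Crude check: confirm each required key shows up in the front
    matter. Not a real YAML parser and not Prompty's own validation."""
    front = prompty_text.split("---")[1]  # text between the first two '---' lines
    present = {line.strip().split(":")[0] for line in front.splitlines() if ":" in line}
    return [k for k in required if k not in present]

missing_model_keys(SAMPLE)  # -> [] when all required sections are present
```

Running it on a file that omits `configuration` and `parameters` (like the earlier example) returns those two names, which matches the error I saw.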

I then also rewrote the prompt in my own example to align it more closely with yours. After that, my example worked as well.

So in the end my problem wasn't really with passing the image to the model endpoint, but with the prompt itself.