googleapis / nodejs-vertexai

Function Call doesn't seem to work with images as part of the prompts #349

Open thegrandpoobah opened 1 month ago

thegrandpoobah commented 1 month ago

When Gemini on Vertex AI is configured with a function declaration and the prompt includes an image, the API returns a 500 error with no description of what the underlying problem is.

Environment details

Steps to reproduce

  1. Send a request that declares a function (tool) and includes an image as part of the prompt
  2. Gemini/Vertex AI responds with a 500 error

In the documentation, all the function calling examples use text prompts, so I may just be doing something that is not supported. However, I couldn't find anything in the docs saying that images are NOT supported as part of prompts for function calls.

Additionally, I have tested my prompt without the image as part of the context, and it does the function call as you would expect.
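
For anyone who wants to repeat that comparison, here is a minimal sketch of one way to derive the text-only variant from the failing request, so that the image part is the only difference between the two. `stripInlineData` and the type names are hypothetical helpers made up for illustration; they are not part of the Vertex AI library.

```typescript
// Sketch only: stripInlineData is a hypothetical helper, not a library function.
type Part = { text?: string; inlineData?: { mimeType: string; data: string } };
type Content = { role: string; parts: Part[] };
type GenerateContentBody = { contents: Content[]; [key: string]: unknown };

// Returns a copy of the request with every inlineData part removed.
// tools, generationConfig and safetySettings are carried over unchanged.
function stripInlineData(body: GenerateContentBody): GenerateContentBody {
  return {
    ...body,
    contents: body.contents.map((content) => ({
      ...content,
      parts: content.parts.filter((part) => part.inlineData === undefined),
    })),
  };
}

// Trimmed-down version of the failing request from this report.
const withImage: GenerateContentBody = {
  contents: [
    {
      role: "user",
      parts: [
        { inlineData: { mimeType: "image/jpeg", data: "base64ofimage" } },
        { text: "categorize the image" },
      ],
    },
  ],
  tools: [{ functionDeclarations: [{ name: "categorize" }] }],
};

// Same request minus the image part.
const textOnly = stripInlineData(withImage);
```

Sending `withImage` and `textOnly` to the same endpoint should show whether the image part alone triggers the error.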

Please NOTE: I can reproduce this with both the Vertex AI library and a plain curl call, so this is most likely a Gemini issue rather than a library issue. Since I don't have a Google support contract, I can't open a support ticket, so I'm filing it here to bring visibility to the issue. Apologies if there is a better venue available.

POST API endpoint:

https://us-central1-aiplatform.googleapis.com/v1/projects/<projectcode>/locations/us-central1/publishers/google/models/gemini-1.5-pro-preview-0514:generateContent
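
In case it helps with reproducing in another project or region, here is a small sketch of how that URL is put together. The function name `generateContentUrl` and the project ID `my-project` are placeholders I made up; the URL pattern itself is copied from the endpoint above.

```typescript
// Sketch only: generateContentUrl is a hypothetical helper.
// The URL pattern is taken from the endpoint in this report.
function generateContentUrl(projectId: string, location: string, model: string): string {
  const host = `${location}-aiplatform.googleapis.com`;
  return (
    `https://${host}/v1/projects/${projectId}/locations/${location}` +
    `/publishers/google/models/${model}:generateContent`
  );
}

// "my-project" is a placeholder project ID.
const url = generateContentUrl("my-project", "us-central1", "gemini-1.5-pro-preview-0514");
```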

What is being passed to Vertex AI:

{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "inlineData": {
            "mimeType": "image/jpeg",
            "data": "base64ofimage"
          }
        },
        {
          "text": "categorize the image"
        }
      ]
    }
  ],
  "tools": [
    {
      "functionDeclarations": [
        {
          "name": "categorize",
          "description": "accepts the categorized guess from model and stores it in API cache",
          "parameters": {
            "type": "object",
            "properties": {
              "category": { "type": "string", "description": "the category of the image" }
            }
          }
        }
      ]
    }
  ],
  "generationConfig": {
    "temperature": 1,
    "topP": 0.95
  },
  "safetySettings": [
    {
      "category": "HARM_CATEGORY_HATE_SPEECH",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_HARASSMENT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    }
  ]
}

Response:

{
  "error": {
    "code": 500,
    "message": "Internal error encountered.",
    "status": "INTERNAL"
  }
}

A working request body with only a text prompt:

{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "categorize an old chandelier"
        }
      ]
    }
  ],
  "tools": [
    {
      "functionDeclarations": [
        {
          "name": "categorize",
          "description": "accepts the categorized guess from model and stores it in API cache",
          "parameters": {
            "type": "object",
            "properties": {
              "category": { "type": "string", "description": "the category of the object, example boats" }
            }
          }
        }
      ]
    }
  ],
  "generationConfig": {
    "temperature": 1,
    "topP": 0.95
  },
  "safetySettings": [
    {
      "category": "HARM_CATEGORY_HATE_SPEECH",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_HARASSMENT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    }
  ]
}
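
For completeness, here is a rough sketch of how the function call could be read out of a successful response to the text-only request. It assumes the response carries the call as a `functionCall` part under `candidates[].content.parts[]`; no response body is shown in this report, so the `sample` object below is hand-written for illustration and is not captured API output. `firstFunctionCall` is a hypothetical helper name.

```typescript
// Assumed response shape: candidates[].content.parts[], where a part may hold a functionCall.
type FunctionCall = { name: string; args: Record<string, unknown> };
type ResponsePart = { text?: string; functionCall?: FunctionCall };
type GenerateContentResponse = {
  candidates?: { content?: { parts?: ResponsePart[] } }[];
};

// Returns the first functionCall found in any candidate, or undefined if there is none.
function firstFunctionCall(response: GenerateContentResponse): FunctionCall | undefined {
  for (const candidate of response.candidates ?? []) {
    for (const part of candidate.content?.parts ?? []) {
      if (part.functionCall) {
        return part.functionCall;
      }
    }
  }
  return undefined;
}

// Hand-written sample for illustration only, not real API output.
const sample: GenerateContentResponse = {
  candidates: [
    {
      content: {
        parts: [{ functionCall: { name: "categorize", args: { category: "lighting" } } }],
      },
    },
  ],
};

const call = firstFunctionCall(sample);
```

An error body like the 500 response above has no `candidates` field, so the helper returns `undefined` for it.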

thegrandpoobah commented 1 month ago

For reference, here is a Reddit post from about a month ago about the same problem: https://www.reddit.com/r/Bard/comments/1cg1nci/error_500_api/

mwohlan commented 1 month ago

I just had the same issue. Very annoying and limits the use cases for me. This is what I found in the docs:

Vertex AI Doc Reference

Is there any kind of timetable for enabling function calling with multimodal prompts?

As a workaround, I found the JSON mode functionality (documented only for 1.5 Pro, though):

JSON Mode

It did work in my short tests with 1.5 Pro and 1.5 Flash, but I don't know if this will be reliable.

Issues with JSON mode: according to the docs, it's only supported by 1.5 Pro, and the response_mime_type parameter gives me a TypeScript error, but I have gotten consistent JSON output so far (about 20 API calls).
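
Roughly, the workaround looks like the sketch below: drop `tools`, ask for JSON in the prompt, set the response MIME type, and validate whatever comes back instead of trusting it. This assumes `responseMimeType` is the camelCase REST spelling of `response_mime_type`, matching the style of the request bodies earlier in this issue. `buildJsonModeBody`, `parseCategory` and the prompt wording are hypothetical and only for illustration.

```typescript
// Sketch only: helper names and prompt wording are made up for illustration.
// Assumption: responseMimeType is the REST (camelCase) form of response_mime_type.
type Category = { category: string };

// Builds a request body without tools that asks for JSON output instead.
function buildJsonModeBody(imageBase64: string) {
  return {
    contents: [
      {
        role: "user",
        parts: [
          { inlineData: { mimeType: "image/jpeg", data: imageBase64 } },
          { text: 'categorize the image; reply as JSON like {"category": "..."}' },
        ],
      },
    ],
    generationConfig: {
      temperature: 1,
      topP: 0.95,
      responseMimeType: "application/json",
    },
  };
}

// Parses the model's text output and checks that it has a string "category" field.
// Returns undefined for invalid JSON or an unexpected shape.
function parseCategory(modelText: string): Category | undefined {
  try {
    const parsed: unknown = JSON.parse(modelText);
    if (
      typeof parsed === "object" &&
      parsed !== null &&
      typeof (parsed as { category?: unknown }).category === "string"
    ) {
      return { category: (parsed as { category: string }).category };
    }
  } catch {
    // Not valid JSON: fall through and report failure.
  }
  return undefined;
}
```

Since JSON mode is not a schema guarantee as far as I can tell, the explicit shape check in `parseCategory` is there to catch the cases where the output is valid JSON but not the object that was asked for.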