langchain-ai / langchainjs

πŸ¦œπŸ”— Build context-aware reasoning applications πŸ¦œπŸ”—
https://js.langchain.com/docs/
MIT License

GoogleGenerativeAIError: Add an image to use models/gemini-pro-vision, or switch your model to a text model. #5744

Open asimplify opened 3 weeks ago

asimplify commented 3 weeks ago


Example Code

    // Use the agent prompt when configured, otherwise a default system prompt.
    const messages = [
      SystemMessagePromptTemplate.fromTemplate(
        this.agent.prompt
          ? `${this.agent.prompt}
        If you do not get the answer, politely apologize to the user without giving any reason.
        Try to answer the user's query using your tools as much as possible.`
          : "You are an AI assistant to answer user queries."
      ),
      new MessagesPlaceholder("chat_history"),
      // Attach the image part only when the selected model is multimodal.
      ModelConfig[this.agent.modelName].isMultiModel && imageData
        ? HumanMessagePromptTemplate.fromTemplate([
            {
              text: "Question: {query}",
            },
            {
              image_url: {
                url: `data:image/jpeg;base64,${imageData}`,
              },
            },
          ])
        : HumanMessagePromptTemplate.fromTemplate("{query}"),
    ];
    const prompt = ChatPromptTemplate.fromMessages(messages);
    const formattedPrompt = await prompt.formatMessages({
      query: userQuery,
      context,
      chat_history: chatHistory,
    });
    const llm = new ChatGoogleGenerativeAI({
      modelName,
      temperature,
    });
    const streamResponse = await llm.stream(formattedPrompt);

    let answer = "";
    for await (const chunk of streamResponse) {
      answer += chunk.content;
      stream?.(chunk.content);
    }

Error Message and Stack Trace (if applicable)

GoogleGenerativeAIError: [400 Bad Request] Add an image to use models/gemini-pro-vision, or switch your model to a text model. { status: 400, attemptNumber: 1, retriesLeft: 6 }

This is the error produced when using the gemini-pro-vision model.

Description

I am using the ChatGoogleGenerativeAI library for multimodal generation. It works fine with gemini-1.5-pro and gemini-1.5-flash, but gemini-pro-vision produces an error even when I provide an image with the query.

I am following the steps described in the documentation: https://js.langchain.com/v0.2/docs/integrations/chat/google_generativeai

System Info

Platform: Windows 10
Node version: 18.13.0
NPM version: 9.4.1

dosubot[bot] commented 3 weeks ago

Hey there, @asimplify! I'm a bot here to help you with any bugs, questions, or contributions. Let's tackle this issue together while we wait for a human maintainer.

To resolve the GoogleGenerativeAIError: [400 Bad Request] when using the gemini-pro-vision model for multimodal generation, ensure that the modelName field is set to gemini-pro-vision and that the required parameters are provided correctly. Here is an example of how to instantiate and use the ChatGoogleGenerativeAI class with the gemini-pro-vision model:

import fs from "fs";
import { ChatGoogleGenerativeAI } from "@langchain/google-genai";
import { HumanMessage } from "@langchain/core/messages";

// Multi-modal
const vision = new ChatGoogleGenerativeAI({
  modelName: "gemini-pro-vision",
  maxOutputTokens: 2048,
});
const image = fs.readFileSync("./hotdog.jpg").toString("base64");
const input2 = [
  new HumanMessage({
    content: [
      {
        type: "text",
        text: "Describe the following image.",
      },
      {
        type: "image_url",
        image_url: `data:image/jpeg;base64,${image}`,
      },
    ],
  }),
];

const res2 = await vision.invoke(input2);

console.log(res2);

/*
  AIMessage {
    content: ' The image shows a hot dog in a bun. The hot dog is grilled and has a dark brown color. The bun is toasted and has a light brown color. The hot dog is in the center of the bun.',
    name: 'model',
    additional_kwargs: {}
  }
*/

// Multi-modal streaming
const res3 = await vision.stream(input2);

for await (const chunk of res3) {
  console.log(chunk);
}

/*
  AIMessageChunk {
    content: ' The image shows a hot dog in a bun. The hot dog is grilled and has grill marks on it. The bun is toasted and has a light golden',
    name: 'model',
    additional_kwargs: {}
  }
  AIMessageChunk {
    content: ' brown color. The hot dog is in the center of the bun.',
    name: 'model',
    additional_kwargs: {}
  }
*/

Ensure that the image is correctly encoded in base64 format and included in the query. If the error persists, double-check the API key and other parameters to ensure they are correctly set. Additionally, make sure you are using the correct field name modelName instead of model when initializing the ChatGoogleGenerativeAI class [1][2][3].

To continue talking to Dosu, mention @dosu.

afirstenberg commented 3 weeks ago

Correct, this is a result of limitations of the gemini-pro-vision model. It does not support chat mode, so only the first human message is processed; subsequent turns generate the error shown.

While gemini-pro-vision is still a valid model on the AI Studio / Google Generative AI platform, it is not listed as a production model; the Gemini 1.5 models (which do not have this limitation) are in production.

afirstenberg commented 3 weeks ago

Since I commented yesterday, Google has now officially deprecated Gemini 1.0 Pro Vision (see https://ai.google.dev/gemini-api/docs/models/gemini#gemini-1.0-pro-vision) in favor of Gemini 1.5. It is scheduled to be removed from the AI Studio API on July 12th 2024.

If you feel you still need Gemini 1.0 Pro Vision, you can switch to Vertex AI, although it is only scheduled to be supported until Feb 2025.

@jacoblee93 - Suggest closing this issue as not being a LangChain.js bug.