mhawksey / GeminiApp

GeminiApp is a library that allows integration with Google's Gemini API in your Google Apps Script projects. It allows for multi-modal prompts, structured conversations and function calling.
Apache License 2.0

Change from region / projectID to API KEY throws an exception running in Apps Script #4

Open fwermus opened 7 months ago

fwermus commented 7 months ago

I am running this in Apps Script. If I run it using location and project ID it runs fine, but if I change to an API key I get:

Request to Gemini failed with response code 400 - { "error": { "code": 400, "message": "Add an image to use models/gemini-pro-vision, or switch your model to a text model.", "status": "INVALID_ARGUMENT" } }

I am using:

inlineData: {
  data: base64EncodedImage,
  mimeType: file.getMimeType()
},
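
(For reference, a minimal sketch of the two initialisation calls being compared; the placeholder values and constructor forms match the snippets further down this thread.)

// Works: Vertex AI mode, initialised with region and project ID (placeholders)
const genAIVertex = new GeminiApp({
  region: 'us-central1',
  project_id: 'YOUR_PROJECT_ID',
});

// Fails with the 400 error above: API key mode, initialised with a key (placeholder)
const genAIKey = new GeminiApp('YOUR_API_KEY');
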
mhawksey commented 7 months ago

If you run runTextAndImages() using the script example below (updating YOUR_PROJECT_ID), do you get the same error?

const genAI = new GeminiApp({
  region: 'us-central1',
  project_id: 'YOUR_PROJECT_ID',
});

function fileToGenerativePart(id) {
  const file = DriveApp.getFileById(id);
  const imageBlob = file.getBlob();
  const base64EncodedImage = Utilities.base64Encode(imageBlob.getBytes());

  return {
    inlineData: {
      data: base64EncodedImage,
      mimeType: file.getMimeType()
    },
  };
}

async function runTextAndImages() {
  // For text-and-images input (multimodal), use the gemini-pro-vision model
  const model = genAI.getGenerativeModel({ model: "gemini-pro-vision" });

  const prompt = "What's different between these pictures?";

  const imageParts = [
    fileToGenerativePart("1LXeJgNhlpnpS0RBfil6Ybx7QRvfqwvEh"),
    fileToGenerativePart("1OFV88Zf5esi-Mtuap4iQyoCVeYlvIeqU"),
  ];

  const result = await model.generateContent([prompt, ...imageParts]);
  const response = await result.response;
  const text = response.text();
  console.log(text);
}
fwermus commented 7 months ago

It fails with a PDF file and an API key, but it runs with a PDF file and project ID and region, and it does work when using the Vertex AI IDE.

const genAI = new GeminiApp("API_KEY");

It is a requirement to use an API KEY instead of PROJECT ID and REGION.

function fileToGenerativePart(id) {
  const file = DriveApp.getFileById(id);
  const imageBlob = file.getBlob();
  const base64EncodedImage = Utilities.base64Encode(imageBlob.getBytes());

  return {
    inlineData: {
      data: base64EncodedImage,
      mimeType: file.getMimeType()
    },
  };
}

async function runTextAndImages() {
  // For text-and-images input (multimodal), use the gemini-pro-vision model
  const model = genAI.getGenerativeModel({ model: "gemini-pro-vision" });

  const prompt = "What does the pdf file says?";

  const imageParts = [
    fileToGenerativePart("1WtZquqcIFinWl_xQ9ZQFe3Cn7S1ihDj-"),
  ];

  const result = await model.generateContent([prompt, ...imageParts]);
  const response = await result.response;
  const text = response.text();
  console.log(text);
}

It throws this with an API key and a PDF file:

Request to Gemini failed with response code 400 - {
  "error": {
    "code": 400,
    "message": "Add an image to use models/gemini-pro-vision, or switch your model to a text model.",
    "status": "INVALID_ARGUMENT"
  }
}

To my understanding, it accepts media files.

mhawksey commented 7 months ago

It gets confusing with the different Google AI Studio and Vertex AI capabilities, and the different model versions add some complexity. The ability to inline PDFs is a preview feature in Gemini 1.5 Pro, so try changing:

const model = genAI.getGenerativeModel({ model: "gemini-pro-vision" });

to:

const model = genAI.getGenerativeModel({ model: "gemini-1.5-pro-preview-0409" });

You can read more in the Overview of multimodal models documentation.