langchain-ai / langchain-google


Unable to supply PDF files to VertexAI object despite the feature being available in the native Vertex AI library #215

Closed: aaronepinto-bell closed this issue 1 month ago

aaronepinto-bell commented 1 month ago

Checked other resources

Example Code

Taken from the following discussion

Description

I am trying to pass a PDF document to gemini-1.5-pro in multimodal mode, following a process similar to the one explained [here](https://python.langchain.com/docs/integrations/llms/google_vertex_ai_palm/#multimodality). The documentation illustrates how to pass an image and query Gemini Pro Vision, but I want to pass a PDF directly instead.

Here is my attempt:

from langchain_core.messages import HumanMessage
from langchain_google_vertexai import ChatVertexAI
import base64

file_location = "/path/to/my/document.pdf"

# Initialize the LangChain LLM
llm = ChatVertexAI(model_name="gemini-1.5-pro-preview-0409")

# Open and read the PDF file
with open(file_location, "rb") as pdf_file:
    pdf_bytes = pdf_file.read()

# Create a message containing the Base64-encoded PDF
pdf_message = {
    "type": "image_url",  # Assuming the LLM accepts PDF under this key, you might need to verify this
    "image_url": {
        "url": f"data:application/pdf;base64,{base64.b64encode(pdf_bytes).decode('utf-8')}"
    },
}

# Create a text message asking what the PDF contains
text_message = {
    "type": "text",
    "text": "What does this PDF contain?",
}

# Combine the messages into a HumanMessage object
message = HumanMessage(content=[text_message, pdf_message])

# Send the message to the LLM and print the response
output = llm.invoke([message])
print(output.content)
Unfortunately this code fails.
However, if I use the official Vertex AI library, I am able to do it. Here is part of my code:

from vertexai.generative_models import GenerativeModel, Part

model = GenerativeModel("gemini-1.5-pro-preview-0409")

prompt = "please summarise the provided pdf document"

pdf_file_uri = "gs://my_bucket/my_document.pdf"
pdf_file = Part.from_uri(pdf_file_uri, mime_type="application/pdf")
contents = [pdf_file, prompt]

response = model.generate_content(contents)
print(response.text)

This approach works, but I was hoping to make the LangChain method function similarly.

System Info
System Information
OS: Darwin
OS Version: Darwin Kernel Version 23.4.0: Fri Mar 15 00:11:05 PDT 2024; root:xnu-10063.101.17~1/RELEASE_X86_64
Python Version: 3.11.5 (main, Sep 11 2023, 08:19:27) [Clang 14.0.6 ]

Package Information

langchain_core: 0.1.40
langchain: 0.1.14
langchain_community: 0.0.31
langsmith: 0.1.40
langchain_google_genai: 0.0.5
langchain_google_vertexai: 0.1.2
langchain_openai: 0.1.1
langchain_text_splitters: 0.0.1

Packages not installed (Not Necessarily a Problem)
The following packages were not found:


langgraph
langserve

### Error Message and Stack Trace (if applicable)

_No response_

### Description

This native Vertex AI library feature (passing a PDF file directly to the model) is not implemented within LangChain. I am happy to contribute and implement it; I would appreciate some pointers on where to start as a new contributor.

lkuligin commented 1 month ago

@aaronepinto-bell You can just pass a GCS URI with LangChain; it works well for me.

Just construct the message as:

pdf_message = {
    "type": "image_url",  # Assuming the LLM accepts PDF under this key, you might need to verify this
    "image_url": {
        "url": "gs://my_bucket/my_document.pdf"
    },
}
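
For reference, a complete call built on that snippet might look like the sketch below (it reuses the ChatVertexAI setup from the example at the top of this issue; the bucket path is a placeholder):

from langchain_core.messages import HumanMessage
from langchain_google_vertexai import ChatVertexAI

llm = ChatVertexAI(model_name="gemini-1.5-pro-preview-0409")

# Point the image_url block at a GCS URI instead of a base64 data URI
pdf_message = {
    "type": "image_url",
    "image_url": {"url": "gs://my_bucket/my_document.pdf"},
}
text_message = {"type": "text", "text": "What does this PDF contain?"}

output = llm.invoke([HumanMessage(content=[text_message, pdf_message])])
print(output.content)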
lkuligin commented 1 month ago

@Adi8885 Please add it to our LC documentation.

jamesev15 commented 1 month ago

@lkuligin I've tried that using the recent langchain-core version == 0.1.52, but it doesn't work:

from langchain_google_vertexai import VertexAI
from langchain_google_vertexai import HarmBlockThreshold, HarmCategory
from langchain_core.messages import HumanMessage

safety_settings = {
    HarmCategory.HARM_CATEGORY_UNSPECIFIED: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
}

model = VertexAI(
    model_name="gemini-1.5-pro-preview-0514",
    project="project_id",
    safety_settings=safety_settings,
)

pdf_message = {
    "type": "image_url",
    "image_url": {
        "url": "gs://....pdf"
    },
}
text_message = {
    "type": "text",
    "text": "Summarize the provided document.",
}
message = HumanMessage(content=[text_message, pdf_message])

output = model.invoke([message])

Answer

Please provide me with the content of the PDF file located at the URL you provided: 'gs:...pdf'. I need the text content of the document to summarize it for you. 

Once I have the content, I can analyze it and provide you with a concise and informative summary. 
aaronepinto-bell commented 1 month ago

@lkuligin @Adi8885 Hey folks, confirming the same as above: it is unable to register the document.

Kashi-Datum commented 1 month ago

Any updates on this?

lkuligin commented 1 month ago

Please update the version of langchain-google-vertexai.

jamesev15 commented 1 month ago

@lkuligin the version is still 1.0.4 on PyPI. I've seen the fix: it is the addition of the "media" option, but that change is not available yet. I've seen the commit from @wafle in the langchain-google-genai release. Please correct me if I'm wrong.

New way to use it:

with open("file.pdf", "rb") as f:
    pdf = base64.b64encode(f.read()).decode("utf-8")

content = [{
    "type": "text",
    "text": "prompt here",
},
{
    "type": "media",
    "mime_type": "application/pdf",
    "data": pdf
}]
message = [HumanMessage(content=content)]
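
A full call might then look like the sketch below (assuming ChatVertexAI accepts the "media" block once the updated release is out; the model name is only an example):

from langchain_google_vertexai import ChatVertexAI

llm = ChatVertexAI(model_name="gemini-1.5-pro-preview-0514")
output = llm.invoke(message)
print(output.content)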
Kashi-Datum commented 1 month ago

If it's updated in the Python version, will it also work in the JS version? I am using the JS version of LangChain and wanted to make sure that PDF inputs would work there as well.

jamesev15 commented 1 month ago

@Kashi-Datum I'm sure you can use JS; the solution for Python comes from the JS one.

lkuligin commented 1 month ago

The Python and JS integrations are different.

Kashi-Datum commented 1 month ago

Confirmed PDF reading working with Gemini 1.5 pro using langchain JS v0.2.


const input = [
  new HumanMessage({
    content: [
      {
        type: "text",
        text: "Describe the following PDF.",
      },
      {
        type: "image_url",
        image_url: `data:application/pdf;base64,${pdfFileInBase64}`,
      },
    ],
  }),
];

const pdfModel = new ChatVertexAI({ model: 'gemini-1.5-pro'});
const response = await pdfModel.invoke(input);
console.log(response);
matheusft commented 1 month ago

Can anyone suggest a change in the example below to replace pdf_file_uri with a local .pdf file?

with open("/Users/matheustorquato/Desktop/Q15-1016.pdf", "rb") as f:
    pdf = base64.b64encode(f.read()).decode("utf-8")

content = [{
    "type": "text",
    "text": prompt,
},
{
    "type": "media",
    "mime_type": "application/pdf",
    "data": pdf
}]

response = model.generate_content(content)

When trying the code above I get:

ValueError: Unknown field for Content: type

Reference code from Google's "Process a PDF file with Gemini 1.5 Pro":

import vertexai

from vertexai.generative_models import GenerativeModel, Part

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"

vertexai.init(project=project_id, location="us-central1")

model = GenerativeModel(model_name="gemini-1.5-flash-001")

prompt = """
You are a very professional document summarization specialist.
Please summarize the given document.
"""

pdf_file_uri = "gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf"
pdf_file = Part.from_uri(pdf_file_uri, mime_type="application/pdf")
contents = [pdf_file, prompt]

response = model.generate_content(contents)
print(response.text)
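
For a local file, one possible adaptation of the reference code is sketched below; it assumes the native SDK's Part.from_data helper, which wraps raw bytes plus a MIME type (worth verifying against your installed vertexai version; the file path is a placeholder):

import vertexai
from vertexai.generative_models import GenerativeModel, Part

vertexai.init(project="PROJECT_ID", location="us-central1")
model = GenerativeModel(model_name="gemini-1.5-flash-001")

# Read the local PDF as raw bytes and wrap it in a Part
with open("/path/to/document.pdf", "rb") as f:
    pdf_part = Part.from_data(data=f.read(), mime_type="application/pdf")

prompt = "Please summarize the given document."
response = model.generate_content([pdf_part, prompt])
print(response.text)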
Maxch3306 commented 2 weeks ago

> Confirmed PDF reading working with Gemini 1.5 pro using langchain JS v0.2. [JS example quoted above]

The Python version is not working; it fails with: "ValueError: Image string must be one of: Google Cloud Storage URI, b64 encoded image string (data:image/...), valid image url, or existing local image file. Instead got 'data:application/pdf;base64,'."