Closed aaronepinto-bell closed 1 month ago
@aaronepinto-bell You can just pass a GCS URI with LangChain; it works well for me.
Just construct it as:
pdf_message = {
    "type": "image_url",  # Assuming the LLM accepts PDF under this key; you might need to verify this
    "image_url": {
        "url": "gs://my_bucket/my_document.pdf"
    },
}
@Adi8885 please add this to our LC documentation
@lkuligin I've tried that with the recent langchain-core version (0.1.52), but it doesn't work:
from langchain_google_vertexai import VertexAI
from langchain_google_vertexai import HarmBlockThreshold, HarmCategory
from langchain_core.messages import HumanMessage
safety_settings = {
    HarmCategory.HARM_CATEGORY_UNSPECIFIED: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
}

model = VertexAI(model_name="gemini-1.5-pro-preview-0514", project="project_id")

pdf_message = {
    "type": "image_url",
    "image_url": {
        "url": "gs://....pdf"
    },
}
text_message = {
    "type": "text",
    "text": "Summarize the provided document.",
}
message = HumanMessage(content=[text_message, pdf_message])
output = model.invoke([message])
Answer:
Please provide me with the content of the PDF file located at the URL you provided: 'gs:...pdf'. I need the text content of the document to summarize it for you.
Once I have the content, I can analyze it and provide you with a concise and informative summary.
@lkuligin @Adi8885 Hey folks, confirming the same as above; it is unable to read the document.
Any updates on this?
Please update the version of langchain-google-vertexai.
@lkuligin the version is still 1.0.4 on PyPI. I've seen the fix; it is the addition of the "media" option, but that change is not available yet. I've seen the commit from @wafle in the langchain-google-genai release. Please correct me if I'm wrong.
New way to use it:
import base64

with open("file.pdf", "rb") as f:
    pdf = base64.b64encode(f.read()).decode("utf-8")

content = [
    {
        "type": "text",
        "text": "prompt here",
    },
    {
        "type": "media",
        "mime_type": "application/pdf",
        "data": pdf,
    },
]
message = [HumanMessage(content=content)]
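As a minimal, runnable sketch of the payload shape above (plain dicts only, so it runs without LangChain installed; the helper name and file path are illustrative, not part of any library):

```python
import base64

def build_pdf_content(pdf_path: str, prompt: str) -> list:
    """Build the LangChain-style multimodal content list for a local PDF.

    Hypothetical helper for illustration; it mirrors the "media" payload
    from the snippet above. Wrap the result in HumanMessage(content=...)
    before invoking the model.
    """
    with open(pdf_path, "rb") as f:
        pdf_b64 = base64.b64encode(f.read()).decode("utf-8")
    return [
        {"type": "text", "text": prompt},
        {"type": "media", "mime_type": "application/pdf", "data": pdf_b64},
    ]
```
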
If it's updated in the Python version, will it also work in the JS version? I am using the JS version of LangChain and wanted to make sure that PDF inputs would work there as well.
@Kashi-Datum I'm sure you can use JS; the Python solution comes from JS.
The Python and JS integrations are different.
Confirmed PDF reading working with Gemini 1.5 pro using langchain JS v0.2.
const input = [
  new HumanMessage({
    content: [
      {
        type: "text",
        text: "Describe the following PDF.",
      },
      {
        type: "image_url",
        image_url: `data:application/pdf;base64,${pdfFileInBase64}`,
      },
    ],
  }),
];
const pdfModel = new ChatVertexAI({ model: 'gemini-1.5-pro'});
const response = await pdfModel.invoke(input);
console.log(response);
Can anyone suggest a change to the example below to replace pdf_file_uri with a local .pdf file?
with open("/Users/matheustorquato/Desktop/Q15-1016.pdf", "rb") as f:
    pdf = base64.b64encode(f.read()).decode("utf-8")

content = [
    {
        "type": "text",
        "text": prompt,
    },
    {
        "type": "media",
        "mime_type": "application/pdf",
        "data": pdf,
    },
]
response = model.generate_content(content)
When trying the code above I get:
ValueError: Unknown field for Content: type
(The "type"/"media" dict format is LangChain's message format; the Vertex AI SDK's generate_content expects Part objects instead, as in the reference code below.)
Reference code from Google's "Process a PDF file with Gemini 1.5 Pro" sample:
import vertexai
from vertexai.generative_models import GenerativeModel, Part
# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
vertexai.init(project=project_id, location="us-central1")
model = GenerativeModel(model_name="gemini-1.5-flash-001")
prompt = """
You are a very professional document summarization specialist.
Please summarize the given document.
"""
pdf_file_uri = "gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf"
pdf_file = Part.from_uri(pdf_file_uri, mime_type="application/pdf")
contents = [pdf_file, prompt]
response = model.generate_content(contents)
print(response.text)
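To answer the local-file question with the same SDK as the reference code: the Vertex AI SDK also exposes Part.from_data, which takes raw bytes instead of a GCS URI. A sketch under the same setup as above (PROJECT_ID and the file path are placeholders; this needs GCP credentials, so it is not runnable offline):

```python
import vertexai
from vertexai.generative_models import GenerativeModel, Part

# Placeholder project; update before running.
vertexai.init(project="PROJECT_ID", location="us-central1")
model = GenerativeModel(model_name="gemini-1.5-flash-001")

# Read the local PDF as raw bytes and wrap it in a Part,
# instead of referencing a gs:// URI with Part.from_uri.
with open("/Users/matheustorquato/Desktop/Q15-1016.pdf", "rb") as f:
    pdf_part = Part.from_data(data=f.read(), mime_type="application/pdf")

response = model.generate_content([pdf_part, "Please summarize the given document."])
print(response.text)
```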
The Python version is not working:
ValueError: Image string must be one of: Google Cloud Storage URI, b64 encoded image string (data:image/...), valid image url, or existing local image file. Instead got 'data:application/pdf;base64,'.
Example Code
Taken from the discussion above.
Description
I am trying to pass a PDF document to gemini-1.5-pro in multimodal mode, following a process similar to the one explained here. The documentation illustrates how to pass an image and query Gemini Pro Vision, but I want to pass a PDF directly instead.
Here is my attempt:
This approach works, but I was hoping to make the LangChain method function similarly.