andreibondarev opened this issue 11 months ago
Hi @andreibondarev, I noticed that the current version already supports sending images to LLMs. You just need to include the image within the `messages` parameter. For example, when using OpenAI models, you can include images using the `image_url` content type. Here's how:
```ruby
llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])

llm.chat(
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        {
          type: "image_url",
          image_url: {
            url: "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
          }
        }
      ]
    }
  ],
  model: "gpt-4o"
).completion
```
Other LLMs only support sending the image in Base64 format, but this must still be done within the `messages` parameter.
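For illustration, here is a rough sketch of what that could look like with Anthropic. This assumes `Langchain::LLM::Anthropic#chat` passes the `messages` array through to Anthropic's Messages API unchanged; the file name and model name are just examples:

```ruby
require "base64"

llm = Langchain::LLM::Anthropic.new(api_key: ENV["ANTHROPIC_API_KEY"])

# Anthropic's Messages API expects the image inline as a Base64-encoded
# "source" block rather than a URL.
encoded_image = Base64.strict_encode64(File.binread("nature-boardwalk.jpg"))

llm.chat(
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        {
          type: "image",
          source: {
            type: "base64",
            media_type: "image/jpeg",
            data: encoded_image
          }
        }
      ]
    }
  ],
  model: "claude-3-5-sonnet-20240620" # illustrative model name
)
```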
Support for OpenAI was added with https://github.com/patterns-ai-core/langchainrb/pull/799.
You have probably thought about this already, but it seems there are many cases to support. One solution is to have all these different parameters on the Assistant:

- `image_url_data`, where the image is fetched from the URL into Base64 first and sent to the LLM that way (or an `image_urls_data` array); a rough sketch of this case follows below
- `image_url`, where just the URL is sent to the LLM (or an `image_urls` array)
- `image`, where a single Base64 payload is sent (or an `images` array)
- `image_filename`, where the file is read into memory and sent (or an `image_filenames` array)

Sorry if I'm making this too confusing/complicated. There's also `image_uri`, I suppose.
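To make the `image_url_data` case concrete, here's a rough sketch of the fetch-and-encode step. The helper name is hypothetical and not part of langchainrb:

```ruby
require "base64"
require "net/http"
require "uri"

# Hypothetical helper: download an image over HTTP and return it as a
# Base64 string, which an adapter could then wrap in the provider-specific
# message payload.
def image_url_to_base64(url)
  response = Net::HTTP.get_response(URI(url))
  raise "Failed to fetch image (HTTP #{response.code})" unless response.is_a?(Net::HTTPSuccess)

  Base64.strict_encode64(response.body)
end
```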
You should be able to provide an `image_url` to the Assistant for the supported multi-modal LLMs.

> **Note**
> Some of the LLMs do not accept an `image_url`, but rather a Base64-encoded payload (Anthropic) or a file URI uploaded to the cloud (Google Gemini). We need to figure out how to handle that.
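For reference, a minimal sketch of the `image_url` flow on the Assistant. This assumes `add_message` accepts an `image_url` keyword for OpenAI (per the PR above); the Anthropic and Gemini cases would still need the Base64/file-URI handling discussed here:

```ruby
llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])

assistant = Langchain::Assistant.new(
  llm: llm,
  instructions: "You are a helpful visual assistant."
)

# add_message with an image_url keyword is assumed here; the OpenAI adapter
# is expected to wrap it in an image_url content part within messages.
assistant.add_message(
  content: "What's in this image?",
  image_url: "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
)
assistant.run
```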