patterns-ai-core / langchainrb

Build LLM-powered applications in Ruby
https://rubydoc.info/gems/langchainrb
MIT License
1.45k stars 195 forks

Add ability to send images to the Assistant when using Ollama #847

Closed andreibondarev closed 2 weeks ago

andreibondarev commented 1 month ago

Description

When using the Langchain::Assistant with Ollama, we'd like to be able to send image URLs to the LLM. Ollama docs: https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-chat-completion.

We should be able to do the following:

llm = Langchain::LLM::Ollama.new

assistant = Langchain::Assistant.new(llm: llm)

assistant.add_message_and_run(
  image_url: "https://gist.githubusercontent.com/andreibondarev/b6f444194d0ee7ab7302a4d83184e53e/raw/099e10af2d84638211e25866f71afa7308226365/sf-cable-car.jpg",
  content: "Please describe this image"
)
#=> LLM successfully responds.

Since Ollama only accepts base64-encoded images, we should download the image into memory and then convert it to Base64.

Sample code that I've prototyped:

require 'open-uri'
require 'base64'

def image_url_to_base64(url)
  # Download the image into memory
  image_data = URI.open(url).read

  # Encode it as Base64 (strict: no line breaks)
  Base64.strict_encode64(image_data)
end

url = "https://gist.githubusercontent.com/andreibondarev/b6f444194d0ee7ab7302a4d83184e53e/raw/099e10af2d84638211e25866f71afa7308226365/sf-cable-car.jpg"
base64_string = image_url_to_base64(url)
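Once encoded, the string would go into the `images` array on a chat message, per the Ollama API docs linked above. A minimal sketch of that request body follows; the payload shape matches the Ollama chat endpoint, but the helper name and the `llava` model choice are illustrative assumptions, not the gem's internals:

```ruby
require 'json'

# Build an Ollama-style chat payload. Per the Ollama API docs, images are
# passed as an array of base64 strings on the message itself, not as URLs.
def build_ollama_chat_payload(model:, content:, base64_images:)
  {
    model: model,
    messages: [
      {
        role: "user",
        content: content,
        images: base64_images # base64-encoded image data, no data: URI prefix
      }
    ]
  }
end

payload = build_ollama_chat_payload(
  model: "llava",
  content: "Please describe this image",
  base64_images: ["xxx-base64-data-zzz"]
)
puts JSON.pretty_generate(payload)
```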

Tasks:

mattlindsey commented 3 weeks ago

Should we also support sending image data directly? This doesn't seem to work currently:

3.2.1 :004 > assistant.add_message_and_run(
3.2.1 :005 >   images: ["xxx-base64-data-zzz"],
3.2.1 :006 >   content: "Please describe this image"
3.2.1 :007 > )
/Users/mattlindsey/github/langchainrb/lib/langchain/assistant.rb:159:in `add_message_and_run': unknown keyword: :images (ArgumentError)
    from (irb):4:in `<main>'
    from bin/console:51:in `<main>'

andreibondarev commented 2 weeks ago

@mattlindsey: Should we also support sending image data directly? This doesn't seem to work currently:

No, I don't think so, not yet.

mattlindsey commented 2 weeks ago

@mattlindsey: Should we also support sending image data directly? This doesn't seem to work currently:

No, I don't think so, not yet.

When Ollama supports sending just the URL to the LLM (which OpenAI supports now), how will we indicate that we want to do that instead of fetching the image and base64-encoding it in memory first? Maybe there's a nice way to indicate which method to use regardless of the LLM.
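One purely illustrative way to handle both cases: inspect each entry and only fetch-and-encode when it looks like a URL, passing anything else through as already-encoded data. The helper name and dispatch rule here are assumptions for discussion, not a proposed final API:

```ruby
require 'open-uri'
require 'base64'

# Hypothetical helper: URLs get downloaded and base64-encoded (what Ollama
# needs today); any other string is assumed to already be base64 data and is
# passed through unchanged. If Ollama later accepts raw URLs, the first
# branch could instead forward the URL as-is.
def normalize_image(value)
  if value.match?(%r{\Ahttps?://})
    Base64.strict_encode64(URI.open(value).read)
  else
    value
  end
end

normalize_image("xxx-base64-data-zzz") # passed through unchanged
```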