patterns-ai-core / langchainrb

Build LLM-powered applications in Ruby
https://rubydoc.info/gems/langchainrb
MIT License

Analyzing Images #794

Closed ausangshukla closed 1 month ago

ausangshukla commented 2 months ago

Is your feature request related to a problem? Please describe.

I have a bunch of images, such as passports, licenses, and tax documents. I need to extract and validate the data they contain by asking the LLM questions such as "Is the passport expired?" or "Is the tax document for 2024?" These questions are ad hoc and entered by users, so I can't use off-the-shelf OCR for it.

Describe the solution you'd like

  1. Upload the image (e.g. a tax document).
  2. Ask a question such as "Is it valid for 2024?"
  3. Ask a follow-up such as "What is the total tax paid?"

Describe alternatives you've considered

I know this can be done from the ChatGPT UI with GPT-4, but I don't have any other options at the moment.

Additional context

The questions are ad hoc but generally center on validating and extracting facts from the image, and the documents are all images. This may already be doable with the assistants API, but a working example is needed, as I'm not able to make it work.

andreibondarev commented 1 month ago

@ausangshukla Yep, you'll be able to do that after this PR is merged.

andreibondarev commented 1 month ago

@ausangshukla Right now the Langchain::Assistant, when using OpenAI or MistralAI, supports sending image_url. Take a look at this example: https://gist.github.com/andreibondarev/b6f444194d0ee7ab7302a4d83184e53e. I imagine that if you're uploading the same types of documents, you could define your own tool, such as a PassportDataExtractor, that extracts certain values, like { full_name:, expiration_date:, issue_date: }. What do you think?
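For readers following along, here is a minimal sketch of both ideas in plain Ruby. It is not langchainrb's own API and does not reproduce the linked gist. `build_image_question` is a hypothetical helper that builds a user message in the OpenAI Chat Completions image-input shape (a text part plus an `image_url` part). The `PassportDataExtractor` class, its `parse` and `expired?` methods, and the sample JSON are also assumptions made for illustration: they treat the model's reply as JSON with the three keys mentioned above and ISO 8601 dates.

```ruby
require "json"
require "date"

# Hypothetical helper (not part of langchainrb): builds a user message that
# pairs a question with an image URL, in the OpenAI Chat Completions format.
def build_image_question(question, image_url)
  {
    role: "user",
    content: [
      { type: "text", text: question },
      { type: "image_url", image_url: { url: image_url } }
    ]
  }
end

# Hypothetical extractor: turns the JSON a model might return for a passport
# into typed values, so validity checks run locally in Ruby.
class PassportDataExtractor
  REQUIRED_KEYS = %i[full_name expiration_date issue_date].freeze

  # Parses the reply and raises ArgumentError if a required field is missing.
  # Assumes dates are ISO 8601 strings such as "2025-03-01".
  def self.parse(json_text)
    data = JSON.parse(json_text, symbolize_names: true)
    missing = REQUIRED_KEYS - data.keys
    raise ArgumentError, "missing fields: #{missing.join(', ')}" unless missing.empty?

    {
      full_name: data[:full_name],
      expiration_date: Date.iso8601(data[:expiration_date]),
      issue_date: Date.iso8601(data[:issue_date])
    }
  end

  # True when the passport's expiration date is before the reference date.
  def self.expired?(passport, on: Date.today)
    passport[:expiration_date] < on
  end
end

# Usage with made-up sample data (no network call is made here).
message = build_image_question("Is this passport expired?", "https://example.com/passport.png")
sample_reply = '{"full_name":"Jane Doe","issue_date":"2015-03-01","expiration_date":"2025-03-01"}'
passport = PassportDataExtractor.parse(sample_reply)
expired = PassportDataExtractor.expired?(passport, on: Date.new(2026, 1, 1))
```

Doing the date comparison in Ruby rather than asking the model "is it expired?" keeps the answer deterministic. The model only has to read the fields off the image.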

andreibondarev commented 1 month ago

Closing this issue as it's a duplicate of https://github.com/patterns-ai-core/langchainrb/issues/416.