Closed ausangshukla closed 1 month ago
@ausangshukla Yep, you'll be able to do that after this PR is merged.
@ausangshukla Right now the Langchain::Assistant, when using OpenAI or MistralAI, supports sending image_url. Take a look at this example: https://gist.github.com/andreibondarev/b6f444194d0ee7ab7302a4d83184e53e. I'm imagining that if you're uploading the same types of documents, you could define your own tool, like a PassportDataExtractor, that would extract certain values, like { full_name:, expiration_date:, issue_date: }. What do you think?
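To make the PassportDataExtractor idea a bit more concrete, here is a rough plain-Ruby sketch of what such a tool's body might look like. It assumes the LLM has already read the image and passes the three values in as ISO 8601 strings. The class layout, the `extract` method name, the `today:` parameter and the `expired` flag are all hypothetical, and registering the class as a tool with Langchain::Assistant is deliberately left out.

```ruby
require "date"

# Hypothetical tool body: takes the values the LLM read from a passport image
# and returns them normalized, plus a simple expiry check.
# Assumption: dates arrive as ISO 8601 strings (e.g. "2030-05-17").
class PassportDataExtractor
  def extract(full_name:, expiration_date:, issue_date:, today: Date.today)
    expires = Date.iso8601(expiration_date)
    issued = Date.iso8601(issue_date)

    {
      full_name: full_name.strip,
      expiration_date: expires,
      issue_date: issued,
      # The passport counts as expired once the expiration date is in the past.
      expired: expires < today
    }
  end
end
```

With something like this in place, an ad hoc question such as "Is the passport expired?" could be answered from the returned `expired` value instead of relying on the model's own date arithmetic.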
Closing this issue as it's a duplicate of https://github.com/patterns-ai-core/langchainrb/issues/416.
Is your feature request related to a problem? Please describe. I have a bunch of images such as passports, licenses, tax docs, etc. I need to extract and validate the data they contain by asking the LLM questions such as "Is the passport expired?" or "Is the tax doc for the year 2024?" These questions will be ad hoc and entered by the users, so I can't use off-the-shelf OCR for it.
Describe the solution you'd like
Describe alternatives you've considered I know this can be done from the ChatGPT UI with GPT-4, but I don't have any other options at the moment.
Additional context The questions are ad hoc, but generally centered around validating and extracting facts from the image. The documents are all images. It may already be doable with the Assistants API, but a working example is required, as I'm not able to make it work.