thmsmlr / instructor_ex

Structured outputs for LLMs in Elixir
https://hexdocs.pm/instructor

Integration with local LLaVA, do we want it? What is the way? #40

Open thbar opened 3 months ago

thbar commented 3 months ago

I've been looking for local-only solutions to reliably extract structured data from invoices/receipts. No API/cloud solutions for obvious privacy reasons (those receipts can sometimes include credentials or account identifiers, and I don't want that data to leave the server in that case).

Thanks to a tweet, I came across this apparently very nice solution:

https://github.com/haotian-liu/LLaVA

A first test via their demo page with a real restaurant receipt worked very nicely (just as nicely as GPT4 currently), see https://twitter.com/thibaut_barrere/status/1773031570259001720 for the input and output.

I see (in #36) that other people are interested in extracting data from images.

Bumblebee is also a possibility with a proper choice of model, of course.

Would this have its place in instructor_ex, in your opinion?

If yes, is there a recommended path to integrate a new model? (I'm not sure I'll tackle this myself, but I'm at least interested in discussing it.)

thbar commented 3 months ago

And I just saw that there is work going on in Bumblebee for that:

thbar commented 3 months ago

llama.cpp support here:

TwistingTwists commented 3 months ago

@thbar Here is one approach.

  1. Use Ollama to run LLaVA.
  2. Ollama exposes an OpenAI-compatible API.
  3. Use Instructor.Adapters.OpenAI with api_url pointed at the local Ollama server (http://localhost:11434) => voila.
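The steps above can be sketched roughly as follows. This is a minimal, untested sketch: it assumes Ollama is running locally on its default port (11434) with the llava model pulled, and that instructor's OpenAI adapter accepts an api_url override in config. The Receipt schema and prompt are illustrative, not part of any library.

```elixir
# In config/config.exs: point instructor's OpenAI adapter at the local
# Ollama server instead of api.openai.com. The api_key value is a
# placeholder; Ollama does not check it.
config :instructor,
  adapter: Instructor.Adapters.OpenAI,
  openai: [api_url: "http://localhost:11434", api_key: "ollama"]

# An illustrative Ecto schema describing the structured output we want
# extracted from a receipt.
defmodule Receipt do
  use Ecto.Schema
  use Instructor.Validator

  @primary_key false
  embedded_schema do
    field :merchant, :string
    field :total, :decimal
    field :date, :date
  end
end

# Ask the locally served llava model for a Receipt struct.
{:ok, receipt} =
  Instructor.chat_completion(
    model: "llava",
    response_model: Receipt,
    messages: [
      %{role: "user", content: "Extract the structured data from this receipt: ..."}
    ]
  )
```

Since Ollama speaks the OpenAI wire format, no new adapter should be needed; only the api_url (and a dummy api_key) change.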