jcheng5 / py-sidebot


Local model support #2

Open · rpodcast opened this issue 3 weeks ago

rpodcast commented 3 weeks ago

I may have misunderstood when this was first presented to me, but are there plans to allow developers to bring their own local (self-hosted) LLM instead of relying on OpenAI? That will be important for what I have in mind for trying this out.

iainwallacebms commented 3 weeks ago

Building on this, it would be great if it were easy to switch to other providers (specifically Azure endpoints). Thanks!

iainwallacebms commented 3 weeks ago

Turns out my use case is really straightforward using the litellm docs: https://docs.litellm.ai/docs/providers/azure :)
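
For anyone else landing here, this is roughly all it took (the deployment name, endpoint, and API version below are placeholders for your own values):

```python
import litellm

response = litellm.completion(
    model="azure/my-gpt-4o-deployment",  # "azure/<your_deployment_name>"
    api_base="https://my-endpoint.openai.azure.com",
    api_version="2024-02-15-preview",
    api_key="...",  # or set the AZURE_API_KEY environment variable
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```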

iainwallacebms commented 3 weeks ago

And I think this is how it would work for local models - https://docs.litellm.ai/docs/providers/ollama#using-ollama-apichat
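
Something like this, assuming an Ollama server running locally on its default port and a model you've already pulled:

```python
import litellm

response = litellm.completion(
    model="ollama_chat/llama3.1",  # the ollama_chat/ prefix routes through Ollama's /api/chat endpoint
    api_base="http://localhost:11434",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```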

jcheng5 commented 3 weeks ago

@rpodcast in theory that link from Iain should do it. In practice, a couple of problems have emerged for me in the past when trying different providers.

  1. My code uses streaming and tool calling simultaneously. This should in theory always work, but apparently it's not that common with litellm, because I've hit bugs in the litellm implementation when doing so (a rough sketch of the combination I mean follows this list). I hit this with Anthropic and had to file PRs with litellm; fortunately the litellm maintainer is extremely responsive and releases seemingly daily. It's very possible you'll hit such problems with litellm's other providers--please let me know if so, and I can try to help get them fixed.
  2. In my testing, llama 3.1 did not do as good a job as GPT-4o or Claude-3.5-Sonnet, and the mistakes it made were big enough that I removed support for it. That said, it's possible more prompt work or fine-tuning could make the difference, or using one of the fine-tuned variants other people have made (or will make soon).
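
For anyone who wants to poke at the problem area in point 1, here's a rough sketch of the streaming + tool-calling combination (the tool schema and model string are made up for illustration, not what this repo actually registers):

```python
import litellm

# Hypothetical tool definition, just to exercise streaming and tool calling together.
tools = [{
    "type": "function",
    "function": {
        "name": "query_db",
        "description": "Run a SQL query against the loaded dataset",
        "parameters": {
            "type": "object",
            "properties": {"sql": {"type": "string"}},
            "required": ["sql"],
        },
    },
}]

stream = litellm.completion(
    model="ollama_chat/llama3.1",  # swap in any provider string litellm supports
    messages=[{"role": "user", "content": "How many rows are in the table?"}],
    tools=tools,
    stream=True,
)

# Tool-call arguments arrive as incremental deltas that have to be accumulated
# across chunks; this is where provider-specific bugs have tended to surface.
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="")
    if delta.tool_calls:
        print(delta.tool_calls)
```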
cboettig commented 2 weeks ago

@jcheng5 very cool stuff here! I've had a similar experience with llama3.1:8b and similar smaller models doing less well, even just on SQL query generation, though some of the smaller SQL-focused models like https://ollama.com/library/duckdb-nsql seem to do pretty well.

Maybe a separate issue, but in addition to local model support I'd be curious to hear if you've tried any of the other providers of interfaces? litellm is new to me but looks very nice. I've been playing around with langchain and more recently with pandas-ai, both of which I think have more specific tooling for handling text->SQL->code execution pattern you are also using here? Would be super curious to compare notes!