timescale / pgai

Bring AI models closer to your PostgreSQL data
PostgreSQL License

Scaling and Throttling Questions #40

Open alnutile opened 1 week ago

alnutile commented 1 week ago

Question: one of the challenges so far with https://github.com/LlmLaraHub/larallama has been throttling requests to APIs like Ollama, OpenAI, Claude, etc.

For example, with pgai, if someone uploads 20-50 documents, the concern is that it will make 20+ concurrent requests to the backing API. For Ollama on my machine, that would most likely cause a lot of timeouts or outright failures. With the OpenAI API, it can result in hitting the requests-per-minute limit.

Any thoughts on this as I consider replacing my existing code and moving some of it into pgai?

Thanks!

jgpruitt commented 2 days ago

At a high level, instead of using a trigger to embed documents on insert, I would use a trigger to put an entry in a work queue table. Then, have a background job (use pg_cron or TimescaleDB jobs) work the queue and implement rate limiting there.
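
The queue-and-worker idea above can be sketched roughly as follows. This is only an illustration in Python, not pgai code: an in-memory `deque` stands in for the work queue table, and one call to `work_queue` stands in for one run of the scheduled background job. The names `work_queue`, `embed`, `max_per_run`, and `min_interval_s` are all hypothetical.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, List


def work_queue(
    queue: Deque[int],
    embed: Callable[[int], List[float]],
    max_per_run: int,
    min_interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[int, List[float]]:
    """Process at most max_per_run queued document ids, one at a time.

    `queue` is a stand-in for the work queue table (assumption), and
    `embed` is a stand-in for the call to the backing embedding API.
    """
    results: Dict[int, List[float]] = {}
    # Cap the batch so a burst of uploads is spread over several runs.
    for i in range(min(max_per_run, len(queue))):
        if i > 0:
            # Space out calls instead of issuing them concurrently.
            sleep(min_interval_s)
        doc_id = queue.popleft()
        try:
            results[doc_id] = embed(doc_id)
        except Exception:
            # Leave failed work queued so a later run can retry it.
            queue.append(doc_id)
    return results


if __name__ == "__main__":
    pending = deque(range(1, 51))  # e.g. 50 uploaded documents
    done = work_queue(
        pending,
        embed=lambda doc_id: [float(doc_id)],  # fake embedding call
        max_per_run=10,
        min_interval_s=0.1,
    )
    print(len(done), "embedded;", len(pending), "still queued")
```

With these settings, each run makes at most 10 sequential calls, so 50 uploads are drained over five runs rather than 50 concurrent requests. In the database version, the trigger would only insert into the queue table, and the scheduled job would apply the same cap and spacing.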

alnutile commented 2 days ago

Interesting, pg_cron.