Open · alnutile opened 1 week ago
At a high level, instead of using a trigger to embed documents on insert, I would use a trigger to put an entry in a work queue table. Then have a background job (using pg_cron or TimescaleDB jobs) work the queue, and implement rate limiting there.
Interesting, pg_cron.
Question: one of the challenges so far with https://github.com/LlmLaraHub/larallama has been throttling requests to APIs like Ollama, OpenAI, Claude, etc.
For example, with pgai, if someone uploads 20-50 documents, my concern is that it will make 20+ concurrent requests to the backing API. With Ollama on my machine, that would most likely cause a lot of timeouts or an outright failure. With the OpenAI API, it can run into the requests-per-minute rate limit.
Any thoughts on this as I consider moving some of my existing code over to pgai?
Thanks!