Rate limiting - Githubissues

minimaxir / simpleaichat

Python package for easily interfacing with chat apps, with robust features and minimal code complexity.

MIT License

3.43k stars, 224 forks

I asked about this on HN:

"Since this is async, can it automatically handle provided rate limits and batch queries appropriately? Seems like everyone has to roll their own on this and it’s much nicer to have smooth tooling for it."

You wrote:

"The underlying library for both sync and async is httpx (https://www.python-httpx.org/) which may be limited from the HTTP Client perspective but it may be possible to add rate limiting at a Session level."

I would consider this a high-priority feature for anything that can handle multiple prompts asynchronously. Particularly because I have many tasks like: Retrieve embeddings for these 3M items

Perhaps consider integrating optionally with GPTCache which does natively handle rate limits.

"I would consider this a high-priority feature"

Given that OpenAI's ChatGPT rate limit is a generous 3,500 RPM (about 58 RPS) out of the box, adding rate limiting to simpleaichat would help less than 1% of its potential users, so it is not a high priority. Anyone hitting that limit in a production app would be better served by requesting a rate limit increase.
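For anyone who does need to stay under a cap like the 3,500 RPM mentioned above, here is a minimal client-side sketch using only asyncio. It is not part of simpleaichat or httpx; `AsyncRateLimiter` and `limited_call` are hypothetical names, and the approach (spacing calls at a fixed interval of 60 / RPM seconds) is just one simple way to throttle.

```python
import asyncio
import time


class AsyncRateLimiter:
    """Hypothetical helper: spaces calls at least 60 / max_per_minute seconds apart."""

    def __init__(self, max_per_minute: float) -> None:
        self.interval = 60.0 / max_per_minute  # e.g. 60 / 3500 is roughly 0.017 s
        self._next_slot = 0.0  # monotonic time of the next free slot

    async def wait(self) -> None:
        # No await happens between reading and updating _next_slot, so the
        # reservation is atomic within a single event loop.
        now = time.monotonic()
        slot = max(now, self._next_slot)
        self._next_slot = slot + self.interval
        if slot > now:
            await asyncio.sleep(slot - now)


async def limited_call(limiter, coro_fn, *args):
    """Wait for a free slot, then await the wrapped coroutine function."""
    await limiter.wait()
    return await coro_fn(*args)
```

Each request coroutine would be wrapped as `limited_call(limiter, send_prompt, prompt)`, where `send_prompt` stands in for whatever async call is actually being made. This only smooths the request rate; it does not react to 429 responses or token-based limits.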

"Particularly because I have many tasks like: Retrieve embeddings for these 3M items"

At that magnitude of data, that example is more a batching problem than an async problem. Batching may not be a bad idea for simpleaichat, but it is not a priority. (To be clear, nothing in simpleaichat works with embeddings.)
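To illustrate the batching point, a rough sketch of splitting a large collection into fixed-size chunks so that each request carries many items. This assumes the target endpoint accepts several inputs per request; `batched` is a hypothetical helper, the batch size of 1,000 is arbitrary, and none of this exists in simpleaichat.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of up to `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# With 3M items and an assumed batch size of 1,000, the number of
# requests drops from 3,000,000 to 3,000.
num_requests = sum(1 for _ in batched(range(3_000_000), 1000))
```

At 3,000 requests the job fits well within a per-minute request cap, which is why chunking matters more here than concurrency.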

"Perhaps consider integrating optionally with GPTCache which does natively handle rate limits."

GPTCache is not zero-tech-debt and has many technical considerations, all of which run against the "simple" part of this package.

As I said, rate limits might be a neat idea, but there are other areas of improvement which have a higher impact.

Rate limiting #25