Thanks for the `asyncio` support! Unfortunately, when requests contain many tokens, I had to tune the number-of-requests parameter so that the tokens sent per minute stay below the rate limit (90,000 tokens per minute by default); otherwise the calls time out and fail.
I think it might make sense to add a second `aiolimiter` for the token count. I'm happy to look into this, but I was curious whether there is already some effort or thinking on this, or whether I'm doing something wrong!
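Something like the following is what I have in mind. This is only a rough sketch: the limit values are illustrative, and `call_openai` is a placeholder for the actual async request function.

```python
import tiktoken
from aiolimiter import AsyncLimiter

# Two independent limiters, one per rate-limit dimension.
# The numbers are illustrative; 90_000 mirrors the default
# tokens-per-minute limit mentioned above, and both values
# should really be configurable.
request_limiter = AsyncLimiter(3_000, 60)   # requests per minute
token_limiter = AsyncLimiter(90_000, 60)    # tokens per minute

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

async def rate_limited_call(prompt: str, max_tokens: int = 256) -> str:
    # Reserve the prompt tokens plus the response budget up front;
    # AsyncLimiter.acquire() accepts amounts greater than one.
    await token_limiter.acquire(len(encoding.encode(prompt)) + max_tokens)
    async with request_limiter:
        # `call_openai` is a hypothetical stand-in for the actual
        # async API request made by the library.
        return await call_openai(prompt, max_tokens=max_tokens)
```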
Hi @Naman-ntc, thanks for the offer! We'd definitely welcome a PR to add this functionality; it would be useful, and I don't think anyone is working on it right now.
https://github.com/zeno-ml/zeno-build/blob/c59fb35baed113de66441b9f0be476f475d92b39/zeno_build/models/providers/openai_utils.py#L107