zeno-ml / zeno-build

Build, evaluate, understand, and fix LLM-based apps
MIT License

Weight requests with `token_counts` as well #132

Closed Naman-ntc closed 10 months ago

Naman-ntc commented 1 year ago

https://github.com/zeno-ml/zeno-build/blob/c59fb35baed113de66441b9f0be476f475d92b39/zeno_build/models/providers/openai_utils.py#L107

Thanks for the asyncio support! Unfortunately, when requests contain many tokens, I had to tune the requests-per-minute parameter by hand so that the resulting number of tokens per minute stays below the rate limit (90000 tokens per minute by default). Otherwise the requests time out and fail.

I think it might make sense to add a second aiolimiter for the token count. I am happy to look into this, but I was curious whether there is already some effort or thinking on this, or whether I am doing something wrong!
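
To make the idea concrete, here is a rough sketch of what a token-weighted limiter could look like. It is stdlib-only and stands in for the second aiolimiter mentioned above; it is not the zeno-build code. `TokenRateLimiter`, `estimate_tokens`, and `send_request` are hypothetical names made up for illustration, and the 4-characters-per-token estimate is a crude assumption (a real tokenizer would be more accurate).

```python
import asyncio
import time


class TokenRateLimiter:
    """Leaky bucket whose capacity is measured in tokens instead of requests."""

    def __init__(self, max_tokens: float, time_period: float = 60.0) -> None:
        self.max_tokens = max_tokens
        self._rate = max_tokens / time_period  # tokens drained per second
        self._level = 0.0  # tokens currently counted against the budget
        self._last = time.monotonic()

    def _leak(self) -> None:
        # Drain the bucket according to the time elapsed since the last check.
        now = time.monotonic()
        self._level = max(0.0, self._level - (now - self._last) * self._rate)
        self._last = now

    async def acquire(self, amount: float) -> None:
        """Wait until `amount` tokens fit into the budget, then reserve them."""
        if amount > self.max_tokens:
            raise ValueError("request is larger than the whole token budget")
        while True:
            self._leak()
            if self._level + amount <= self.max_tokens:
                self._level += amount
                return
            # Sleep roughly until enough capacity has drained, then re-check.
            await asyncio.sleep((self._level + amount - self.max_tokens) / self._rate)


def estimate_tokens(text: str) -> int:
    # Hypothetical placeholder: assumes about 4 characters per token.
    return max(1, len(text) // 4)


async def send_request(prompt: str, limiter: TokenRateLimiter) -> str:
    # Weight the request by its estimated token count before sending it.
    await limiter.acquire(estimate_tokens(prompt))
    return f"sent {len(prompt)} characters"  # stand-in for the real API call


async def main() -> None:
    limiter = TokenRateLimiter(max_tokens=90000, time_period=60.0)
    prompts = ["short prompt", "a much longer prompt " * 50]
    print(await asyncio.gather(*(send_request(p, limiter) for p in prompts)))


if __name__ == "__main__":
    asyncio.run(main())
```

The existing requests-per-minute limiter would stay in place; each request would simply pass through both limiters before being sent.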

neubig commented 1 year ago

Hi @Naman-ntc, thanks for the offer! We'd definitely welcome a PR to add this functionality. It would be useful, and I don't think anyone is working on it right now.