Thanks for the `asyncio` support! Unfortunately, when requests contain many tokens, I had to tune the number-of-requests parameter so that the tokens sent per minute stay below the rate limit (90,000 tokens per minute by default); otherwise the calls time out and fail.
I think it might make sense to add a second `aiolimiter` for the token count. I'm happy to look into this, but I was curious whether there is already some effort or thinking on this, or whether I'm doing something wrong!
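Something like the following is what I have in mind. This is only a rough sketch: the limit values are illustrative, and `call_openai` is a placeholder for the actual async request function.

```python
import tiktoken
from aiolimiter import AsyncLimiter

# Two independent limiters, one per rate-limit dimension.
# The numbers are illustrative; 90_000 mirrors the default
# tokens-per-minute limit mentioned above, and both values
# should really be configurable.
request_limiter = AsyncLimiter(3_000, 60)   # requests per minute
token_limiter = AsyncLimiter(90_000, 60)    # tokens per minute

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

async def rate_limited_call(prompt: str, max_tokens: int = 256) -> str:
    # Reserve the prompt tokens plus the response budget up front;
    # AsyncLimiter.acquire() accepts amounts greater than one.
    await token_limiter.acquire(len(encoding.encode(prompt)) + max_tokens)
    async with request_limiter:
        # `call_openai` is a hypothetical stand-in for the actual
        # async API request made by the library.
        return await call_openai(prompt, max_tokens=max_tokens)
```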
Hi @Naman-ntc, thanks for the offer! We'd definitely welcome a PR to add this functionality; it would be useful, and I don't think anyone is working on it right now.
https://github.com/zeno-ml/zeno-build/blob/c59fb35baed113de66441b9f0be476f475d92b39/zeno_build/models/providers/openai_utils.py#L107