Agreed, here are some other improvements that could lead to better token cost estimations:

`TokenCountingHandler` requires us to pass a callable tokenizer, which means we need to find tokenizers that llamaindex already supports as part of their callbacks. Relying on their callbacks also forces us to instantiate the `TokenCounter` as the very first object, which pollutes our lavague-tests configs.
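For context, the setup we currently need looks roughly like this (a sketch of the llamaindex wiring as I understand it, not our exact code; the tokenizer choice is just an example):

```python
import tiktoken
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler

# The handler needs a callable tokenizer up front...
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-4o").encode  # example model
)

# ...and the callback manager must be registered before any LLM object is
# created, which is why the TokenCounter ends up first in our configs.
Settings.callback_manager = CallbackManager([token_counter])

# later, after queries have run:
# token_counter.total_llm_token_count
```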
We might have an easier time counting tokens if we didn't rely on llamaindex's token counting at all 😊
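For example, a minimal sketch of counting directly with tiktoken (`count_tokens` is a hypothetical helper, and this assumes we can intercept prompts/completions ourselves rather than going through llamaindex callbacks):

```python
import tiktoken

def count_tokens(text: str, model: str = "gpt-4o") -> int:
    """Count tokens without llamaindex, falling back to a generic encoding."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        # tiktoken doesn't know every model name
        encoding = tiktoken.get_encoding("cl100k_base")
    return len(encoding.encode(text))
```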
I think as we add more models to our built-in pricing info, we will run into ambiguity when a model offered by two different providers has different pricing but exactly the same model name/tag. For example, we may need to know whether to use OpenAI or Azure pricing for the model name 'gpt-4o'.
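One way to resolve this would be to key our pricing table by (provider, model) instead of model name alone, roughly like this (the prices below are placeholders, not real quotes):

```python
# Hypothetical pricing table keyed by (provider, model);
# values are placeholder $ per 1M tokens, not real quotes.
PRICING: dict[tuple[str, str], dict[str, float]] = {
    ("openai", "gpt-4o"): {"input": 2.50, "output": 10.00},
    ("azure", "gpt-4o"): {"input": 2.75, "output": 11.00},
}

def get_pricing(provider: str, model: str) -> dict[str, float]:
    return PRICING[(provider, model)]
```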
It might also be better if, instead of hardcoding our own pricing list, we integrated the LiteLLM dict more directly: https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json. Fetching pricing from there could also help resolve the ambiguity issue, since LiteLLM keeps provider-prefixed entries (e.g. 'azure/gpt-4o' alongside 'gpt-4o').
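A sketch of fetching it at runtime (field names taken from the LiteLLM JSON schema; the raw URL just points at the file linked above):

```python
import requests

# Raw view of the pricing file linked above
LITELLM_PRICING_URL = (
    "https://raw.githubusercontent.com/BerriAI/litellm/main/"
    "model_prices_and_context_window.json"
)

def fetch_litellm_pricing() -> dict:
    resp = requests.get(LITELLM_PRICING_URL, timeout=10)
    resp.raise_for_status()
    return resp.json()

# Entries are keyed by model name, with provider-prefixed variants
# ("azure/gpt-4o" alongside "gpt-4o"), which would also address the
# provider-ambiguity issue above.
prices = fetch_litellm_pricing()
entry = prices["gpt-4o"]
cost = (
    1_000 * entry["input_cost_per_token"]   # e.g. 1k prompt tokens
    + 500 * entry["output_cost_per_token"]  # e.g. 500 completion tokens
)
```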
@paulpalmieri @adeprez