Open paulpalmieri opened 1 month ago
Hey there, @paulpalmieri! I'm here to help you with any questions or issues you have while waiting for a human maintainer. Let's tackle this together!
To use `TokenCountingHandler` to listen for calls from each model and count tokens with the proper tokenizer each time, you should use a single `CallbackManager` that manages multiple `TokenCountingHandler` instances, each configured with the appropriate tokenizer for its model. Here is an example of how you can achieve this:
```python
import tiktoken
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler

# Create token counters for different models
token_counter_gpt4o = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-4o").encode,
    verbose=False,
)
# tiktoken has no Gemini encoding (encoding_for_model("gemini") raises
# KeyError), so cl100k_base is used here only as a rough approximation.
# For exact counts, pass Gemini's own tokenizer instead.
token_counter_gemini = TokenCountingHandler(
    tokenizer=tiktoken.get_encoding("cl100k_base").encode,
    verbose=False,
)

# Create a single CallbackManager with both token counters
callback_manager = CallbackManager([token_counter_gpt4o, token_counter_gemini])
Settings.callback_manager = callback_manager

documents = SimpleDirectoryReader("./data").load_data()

# Use the appropriate token counter for each model
index_gpt4o = VectorStoreIndex.from_documents(
    documents,
    callback_manager=CallbackManager([token_counter_gpt4o]),
)
index_gemini = VectorStoreIndex.from_documents(
    documents,
    callback_manager=CallbackManager([token_counter_gemini]),
)

# Access the counts directly
print("GPT-4o Embedding Tokens: ", token_counter_gpt4o.total_embedding_token_count)
print("Gemini Embedding Tokens: ", token_counter_gemini.total_embedding_token_count)

# Reset the counts at your discretion
token_counter_gpt4o.reset_counts()
token_counter_gemini.reset_counts()

# Track prompt, completion, and total LLM tokens for GPT-4o
response_gpt4o = index_gpt4o.as_query_engine().query("What did the author do growing up?")
print(
    "GPT-4o Embedding Tokens: ",
    token_counter_gpt4o.total_embedding_token_count,
    "\n",
    "GPT-4o LLM Prompt Tokens: ",
    token_counter_gpt4o.prompt_llm_token_count,
    "\n",
    "GPT-4o LLM Completion Tokens: ",
    token_counter_gpt4o.completion_llm_token_count,
    "\n",
    "GPT-4o Total LLM Token Count: ",
    token_counter_gpt4o.total_llm_token_count,
)

# Track prompt, completion, and total LLM tokens for Gemini
response_gemini = index_gemini.as_query_engine().query("What did the author do growing up?")
print(
    "Gemini Embedding Tokens: ",
    token_counter_gemini.total_embedding_token_count,
    "\n",
    "Gemini LLM Prompt Tokens: ",
    token_counter_gemini.prompt_llm_token_count,
    "\n",
    "Gemini LLM Completion Tokens: ",
    token_counter_gemini.completion_llm_token_count,
    "\n",
    "Gemini Total LLM Token Count: ",
    token_counter_gemini.total_llm_token_count,
)
```
This setup ensures that each `TokenCountingHandler` properly records the calls for its respective model [1][2][3][4].
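The key idea is that a handler only records events dispatched through the `CallbackManager` it is attached to. A minimal pure-Python stand-in illustrates this (no llama_index needed; `SimpleCounter` and `fake_model_call` are names invented for this sketch, not real library API):

```python
# SimpleCounter mimics TokenCountingHandler's counting role;
# fake_model_call mimics a model invocation that notifies only
# the handlers attached to that model.

class SimpleCounter:
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer
        self.total = 0

    def on_event(self, text):
        # Count tokens with this counter's own tokenizer.
        self.total += len(self.tokenizer(text))

def fake_model_call(handlers, prompt):
    # Each "model" notifies only its own handlers, so counts stay separate.
    for h in handlers:
        h.on_event(prompt)

# Two counters with different (toy) tokenizers
counter_a = SimpleCounter(tokenizer=str.split)  # word-level stand-in
counter_b = SimpleCounter(tokenizer=list)       # character-level stand-in

fake_model_call([counter_a], "hello world")
fake_model_call([counter_b], "hello world")

print(counter_a.total)  # 2  (two words)
print(counter_b.total)  # 11 (eleven characters)
```

Because each call routes through exactly one handler list, neither counter ever sees the other model's traffic, which is the same separation the per-index `CallbackManager` instances provide above.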
To continue talking to Dosu, mention @dosu.
You'd have to attach each handler to the LLM:

```python
OpenAI(..., callback_manager=CallbackManager([mm_llm_token_counter]))
Gemini(..., callback_manager=CallbackManager([llm_token_counter]))
```
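On the "proper tokenizer each time" part of the question, one way to keep the mapping explicit is a small registry of tokenizer callables keyed by model name. This is a hypothetical sketch: `TOKENIZERS` and `pick_tokenizer` are invented names, and the toy callables stand in for real tokenizers (in practice, tiktoken's `encode` for OpenAI models and Google's own tokenizer for Gemini, since tiktoken does not ship a Gemini encoding):

```python
# Registry mapping model names to tokenizer callables. Toy callables
# stand in here so the sketch runs without external dependencies.
TOKENIZERS = {
    "gpt-4o": lambda text: text.split(),  # stand-in for tiktoken's gpt-4o encoding
    "gemini": lambda text: list(text),    # stand-in for a Gemini tokenizer
}

def pick_tokenizer(model_name):
    """Return the tokenizer callable registered for model_name.

    Raising for unknown names is deliberate: silently falling back
    to the wrong tokenizer would give misleading counts.
    """
    try:
        return TOKENIZERS[model_name]
    except KeyError:
        raise ValueError(f"no tokenizer registered for {model_name!r}")

text = "What did the author do growing up?"
print(len(pick_tokenizer("gpt-4o")(text)))  # 7  with the word-level stand-in
print(len(pick_tokenizer("gemini")(text)))  # 34 with the character-level stand-in
```

Each `TokenCountingHandler` would then be constructed with `pick_tokenizer(model_name)` for its model, so the handler and tokenizer stay paired.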
Question Validation
Question
Hello!
Let's say I use a multimodal model like `gpt-4o` and a text model like `gemini`. How can I best use `TokenCountingHandler` to listen for calls from each model and count tokens with the proper tokenizer each time? It seems that declaring two handlers like this:

results in only one of the handlers recording the calls.
Thanks a lot!