run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai

[Question]: Can we use `TokenCountingHandler` with different models? #15020

Open paulpalmieri opened 1 month ago

paulpalmieri commented 1 month ago

Question Validation

Question

Hello!

Let's say I use a multimodal model like gpt-4o and a text model like Gemini.

How can I best use TokenCountingHandler to listen for calls from each model and count tokens with the proper tokenizer each time?

It seems that declaring two handlers like this:

mm_llm_token_counter = TokenCountingHandler(
    tokenizer=gpt_tokenizer,
    event_starts_to_ignore=[CBEventType.EMBEDDING],
    event_ends_to_ignore=[CBEventType.EMBEDDING],
)
llm_token_counter = TokenCountingHandler(
    tokenizer=gemini_tokenizer,
    event_starts_to_ignore=[CBEventType.EMBEDDING],
    event_ends_to_ignore=[CBEventType.EMBEDDING],
)

results in only one of the handlers recording the calls.

Thanks a lot!

dosubot[bot] commented 1 month ago

Hey there, @paulpalmieri! I'm here to help you with any questions or issues you have while waiting for a human maintainer. Let's tackle this together!

To count tokens with the proper tokenizer for each model, create one TokenCountingHandler per model, each configured with that model's tokenizer, and scope each handler with its own CallbackManager. Here is an example of how you can achieve this:

import tiktoken
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler
from llama_index.core import Settings

# Create token counters for different models
token_counter_gpt4o = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-4o").encode,
    verbose=False,
)

token_counter_gemini = TokenCountingHandler(
    # tiktoken has no Gemini encoding; cl100k_base is only a rough
    # approximation, so substitute a real Gemini tokenizer callable if available
    tokenizer=tiktoken.get_encoding("cl100k_base").encode,
    verbose=False,
)

# A single global CallbackManager holding both counters would make each
# counter record every call; the per-index managers below are what scope
# each counter to one model
callback_manager = CallbackManager([token_counter_gpt4o, token_counter_gemini])

Settings.callback_manager = callback_manager

documents = SimpleDirectoryReader("./data").load_data()

# Use the appropriate token counter for each model
index_gpt4o = VectorStoreIndex.from_documents(
    documents,
    callback_manager=CallbackManager([token_counter_gpt4o])
)

index_gemini = VectorStoreIndex.from_documents(
    documents,
    callback_manager=CallbackManager([token_counter_gemini])
)

# Access the counts directly
print("GPT-4o Embedding Tokens: ", token_counter_gpt4o.total_embedding_token_count)
print("Gemini Embedding Tokens: ", token_counter_gemini.total_embedding_token_count)

# Reset the counts at your discretion
token_counter_gpt4o.reset_counts()
token_counter_gemini.reset_counts()

# Track prompt, completion, and total LLM tokens for GPT-4o
# (this assumes the query engine's underlying LLM is gpt-4o)
response_gpt4o = index_gpt4o.as_query_engine().query("What did the author do growing up?")
print(
    "GPT-4o Embedding Tokens: ",
    token_counter_gpt4o.total_embedding_token_count,
    "\n",
    "GPT-4o LLM Prompt Tokens: ",
    token_counter_gpt4o.prompt_llm_token_count,
    "\n",
    "GPT-4o LLM Completion Tokens: ",
    token_counter_gpt4o.completion_llm_token_count,
    "\n",
    "GPT-4o Total LLM Token Count: ",
    token_counter_gpt4o.total_llm_token_count,
)

# Track prompt, completion, and total LLM tokens for Gemini
# (this assumes the query engine's underlying LLM is Gemini)
response_gemini = index_gemini.as_query_engine().query("What did the author do growing up?")
print(
    "Gemini Embedding Tokens: ",
    token_counter_gemini.total_embedding_token_count,
    "\n",
    "Gemini LLM Prompt Tokens: ",
    token_counter_gemini.prompt_llm_token_count,
    "\n",
    "Gemini LLM Completion Tokens: ",
    token_counter_gemini.completion_llm_token_count,
    "\n",
    "Gemini Total LLM Token Count: ",
    token_counter_gemini.total_llm_token_count,
)

This setup ensures that each TokenCountingHandler records only the calls for its respective model [1][2][3][4].

To continue talking to Dosu, mention @dosu.

logan-markewich commented 1 month ago

You'd have to attach each handler to the LLM

OpenAI(..., callback_manager=CallbackManager([mm_llm_token_counter]))

Gemini(..., callback_manager=CallbackManager([llm_token_counter]))
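
For example, a minimal runnable sketch of that approach (assuming the llama-index-llms-openai and llama-index-llms-gemini integrations are installed; cl100k_base stands in for a Gemini tokenizer, which tiktoken does not provide, and the prompts are just placeholders):

import tiktoken
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler
from llama_index.llms.gemini import Gemini
from llama_index.llms.openai import OpenAI

# One counter per model, each wrapped in its own CallbackManager
gpt_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-4o").encode,
)
gemini_counter = TokenCountingHandler(
    # approximation: tiktoken has no Gemini encoding
    tokenizer=tiktoken.get_encoding("cl100k_base").encode,
)

gpt_llm = OpenAI(model="gpt-4o", callback_manager=CallbackManager([gpt_counter]))
gemini_llm = Gemini(callback_manager=CallbackManager([gemini_counter]))

gpt_llm.complete("Say hello.")
gemini_llm.complete("Say hello.")

print("gpt-4o tokens:", gpt_counter.total_llm_token_count)
print("gemini tokens:", gemini_counter.total_llm_token_count)

Because each counter is attached only to its own LLM's callback manager, each one sees only that model's calls.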