run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: Token counter not working for RaptorPack #11828

Open mw19930312 opened 3 months ago

mw19930312 commented 3 months ago

Bug Description

I recently conducted a few experiments using RaptorPack and everything looks fine. The only drawback is that the token counter is not working for RaptorPack, so I cannot get an accurate metric of how many tokens are consumed. Neither the embedding token counter nor the LLM token counter works; the counter always remains 0.

I tried the following with a single LLM and it worked: the token counter came back as 10. However, the counter remains 0 once I switch to the RaptorPack setup shown in the steps to reproduce below.

```python
import tiktoken
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler
from llama_index.llms.openai import OpenAI

token_counter = TokenCountingHandler(tokenizer=tiktoken.encoding_for_model("gpt-4-1106-preview").encode)
callback_manager = CallbackManager([token_counter])
llm = OpenAI(model="gpt-4-1106-preview", temperature=0, callback_manager=callback_manager)
response = llm.complete("hello")
print(token_counter.total_llm_token_count)  # prints 10 for this call
```

Did I use the callback manager and token counter in an incorrect way?

Version

0.10.6

Steps to Reproduce

```python
import tiktoken
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler

embedding_token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("text-embedding-ada-002").encode
)
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode
)
callback_manager = CallbackManager([token_counter])

custom_pack_google_doc_tree = RaptorPack(
    documents=llama_index_documents,
    embed_model=OpenAIEmbedding(
        model="text-embedding-ada-002",
        callback_manager=CallbackManager([embedding_token_counter]),
    ),  # used for embedding clusters
    llm=OpenAI(
        model="gpt-3.5-turbo", temperature=0, callback_manager=callback_manager
    ),  # used for generating summaries
    vector_store=vector_store,  # used for storage
    similarity_top_k=5,  # top k for each layer, or overall top-k for collapsed
    mode="tree_traversal",  # sets default mode
    transformations=[
        SentenceSplitter(chunk_size=400, chunk_overlap=50)
    ],  # transformations applied for ingestion
    summary_prompt=RAPTOR_SUMMARY_PROMPT,
)
```

Relevant Logs/Tracebacks

No response

dosubot[bot] commented 3 months ago

Hey there, @mw19930312! Great to see you back with us. Diving into another challenge, I see? Let's get to the bottom of this together.

Thank you for providing the detailed information. From the code snippet you've shared, it seems like you're using the TokenCountingHandler and CallbackManager correctly. However, the token counter always remaining at 0 could have a few causes. Here are some areas you could investigate further:

  1. Event Handling: Ensure that the events of interest (CBEventType.LLM and CBEventType.EMBEDDING) are correctly being passed to the on_event_end method. If these events are not being triggered or captured correctly, the token counting logic within on_event_end will not execute.

  2. Payload Content: Verify that the payloads passed to the on_event_end method contain the expected keys and values. If the payload structure has changed or if the expected keys (EventPayload.PROMPT, EventPayload.COMPLETION, EventPayload.MESSAGES, EventPayload.RESPONSE) are missing, the token counting logic may fail to execute correctly.

  3. Verbose Logging: The TokenCountingHandler constructor accepts a verbose option. If you haven't already, enable it (verbose=True) to see whether the token counting methods are being called and what values they are computing (see the sketch after this list).

  4. Reset Counts: The reset_counts method resets the token counts to empty lists. Ensure that this method is not being called inadvertently before the token counts are accessed.

  5. Token Counter Implementation: The actual token counting is delegated to the TokenCounter class (self._token_counter). Ensure that the TokenCounter class and its methods (get_string_tokens, estimate_tokens_in_messages) are correctly implemented and returning the expected token counts.
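
To make points 1 through 4 concrete, here is a minimal sketch that fires a CBEventType.LLM event through a CallbackManager by hand and then inspects the handler. It is an illustration under the assumptions above, not RaptorPack-specific code; if the counter increments here but stays at 0 inside the pack, the events are most likely never reaching your callback manager:

```python
import tiktoken
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler
from llama_index.core.callbacks.schema import CBEventType, EventPayload

# Point 3: verbose=True prints each event the handler counts.
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode,
    verbose=True,
)
callback_manager = CallbackManager([token_counter])

# Points 1 and 2: fire an LLM event by hand. on_event_end only counts
# tokens when the payload carries the expected keys.
with callback_manager.event(
    CBEventType.LLM, payload={EventPayload.PROMPT: "hello"}
) as event:
    event.on_end(
        payload={EventPayload.PROMPT: "hello", EventPayload.COMPLETION: "world"}
    )

print(token_counter.total_llm_token_count)  # > 0 if the event was captured
print(token_counter.llm_token_counts)  # per-event breakdown

# Point 4: reset_counts() clears these lists; make sure nothing calls it
# before you read the totals.
token_counter.reset_counts()
```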

Without more specific details on how the TokenCountingHandler is being used within the RaptorPack or additional logs/output showing where the process might be failing, it's challenging to pinpoint the exact cause of the issue. Reviewing the areas mentioned above and adding more detailed logging around the token counting process could help identify the root cause.
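
One way to add that logging is to register a small diagnostic handler next to the token counter. The class below is a hypothetical sketch built on the BaseCallbackHandler interface; it simply prints every event type and its payload keys so you can see whether CBEventType.LLM and CBEventType.EMBEDDING events ever fire during the RaptorPack run:

```python
from typing import Any, Dict, List, Optional

from llama_index.core.callbacks.base_handler import BaseCallbackHandler
from llama_index.core.callbacks.schema import CBEventType


class EventLogger(BaseCallbackHandler):
    """Hypothetical debugging handler: prints every callback event."""

    def __init__(self) -> None:
        super().__init__(event_starts_to_ignore=[], event_ends_to_ignore=[])

    def on_event_start(
        self,
        event_type: CBEventType,
        payload: Optional[Dict[str, Any]] = None,
        event_id: str = "",
        parent_id: str = "",
        **kwargs: Any,
    ) -> str:
        print(f"event start: {event_type}")
        return event_id

    def on_event_end(
        self,
        event_type: CBEventType,
        payload: Optional[Dict[str, Any]] = None,
        event_id: str = "",
        **kwargs: Any,
    ) -> None:
        keys = list(payload.keys()) if payload else []
        print(f"event end: {event_type}, payload keys: {keys}")

    def start_trace(self, trace_id: Optional[str] = None) -> None:
        pass

    def end_trace(
        self,
        trace_id: Optional[str] = None,
        trace_map: Optional[Dict[str, List[str]]] = None,
    ) -> None:
        pass


# Register it alongside the token counter, e.g.:
# callback_manager = CallbackManager([token_counter, EventLogger()])
```

If no LLM or EMBEDDING events appear during the RaptorPack run, the pack's internal components are likely not sharing your callback manager, which would explain the counts staying at 0.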

For more information, you can refer to the source code of the TokenCountingHandler class in the LlamaIndex codebase here.

Please let me know if you need further assistance or if you have any other questions.


#### About Dosu

This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
mw19930312 commented 3 months ago

Does anyone have an idea of what might be wrong?