run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: I want to check the number of tokens used in my agent worker. #13714

Closed: bhuvanweb3 closed this issue 4 months ago

bhuvanweb3 commented 4 months ago

Documentation Issue Description

I want to get the full token usage details for my agent workers.

Documentation Link

https://docs.llamaindex.ai/en/stable/examples/callbacks/TokenCountingHandler/

dosubot[bot] commented 4 months ago

Hello @bhuvanweb3! I'm here to help you with any questions or issues you have. If you need assistance, feel free to ask!

To check the number of tokens used in your agent worker for its full flow using the Token Counting Handler, you can use the properties provided by the TokenCountingHandler class. Specifically, you can access the total_llm_token_count, prompt_llm_token_count, and completion_llm_token_count properties to get the respective token counts.

Here is an example of how you can do this:

from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler

# Initialize the TokenCountingHandler and register it on the global callback
# manager so that LLM calls made by the agent are actually counted
token_counting_handler = TokenCountingHandler(verbose=True)
Settings.callback_manager = CallbackManager([token_counting_handler])

# After running your agent worker's full flow, you can access the token counts
total_tokens = token_counting_handler.total_llm_token_count
prompt_tokens = token_counting_handler.prompt_llm_token_count
completion_tokens = token_counting_handler.completion_llm_token_count

print(f"Total LLM Token Count: {total_tokens}")
print(f"Prompt LLM Token Count: {prompt_tokens}")
print(f"Completion LLM Token Count: {completion_tokens}")

This code will print the total number of tokens, the number of tokens used in prompts, and the number of tokens used in completions.

Additionally, you can track the token counts for different event types such as LLM and Embedding events. Here is an example of how you can handle these events:

# Legacy module paths; on current releases, import CBEventType and
# TokenCountingHandler from llama_index.core.callbacks instead
from llama_index.legacy.callbacks.schema import CBEventType
from llama_index.legacy.callbacks.token_counting import TokenCountingHandler

# Initialize the TokenCountingHandler
handler = TokenCountingHandler()

# Example payload
TEST_PAYLOAD = {"chunks": ["one"], "formatted_prompt": "two", "response": "three"}
TEST_ID = "my id"

# Start and end an LLM event
handler.on_event_start(CBEventType.LLM, payload=TEST_PAYLOAD, event_id=TEST_ID)
handler.on_event_end(CBEventType.LLM, payload=TEST_PAYLOAD, event_id=TEST_ID)

# Start and end an Embedding event
handler.on_event_start(CBEventType.EMBEDDING, payload=TEST_PAYLOAD, event_id=TEST_ID)
handler.on_event_end(CBEventType.EMBEDDING, payload=TEST_PAYLOAD, event_id=TEST_ID)

# Access the token counts
print(f"Total LLM Token Count: {handler.total_llm_token_count}")
print(f"Total Embedding Token Count: {handler.total_embedding_token_count}")

This code demonstrates how to handle the start and end of events and access the token counts for both LLM and Embedding events. The tests confirm that the LLM token count should be the sum of the prompt and response tokens, while the Embedding token count should reflect the number of tokens in the chunks.
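
As a quick sanity check of that relationship, you can assert it directly on the handler from the snippet above:

# The total LLM count is the sum of the prompt and completion counts
assert handler.total_llm_token_count == (
    handler.prompt_llm_token_count + handler.completion_llm_token_count
)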

For more advanced usage, you can track each token usage event in an object called a TokenCountingEvent. This object has attributes such as prompt, prompt_token_count, completion, completion_token_count, total_token_count, and event_id. These events are tracked on the token counter in two lists: llm_token_counts and embedding_token_counts.

Here is an example of how to explore these events (token_counter below is the TokenCountingHandler instance attached to your callback manager):

print("Num LLM token count events: ", len(token_counter.llm_token_counts))
print("Num Embedding token count events: ", len(token_counter.embedding_token_counts))

# Example of accessing details of a single event
print("prompt: ", token_counter.llm_token_counts[0].prompt[:100], "...\n")
print("prompt token count: ", token_counter.llm_token_counts[0].prompt_token_count, "\n")
print("completion: ", token_counter.llm_token_counts[0].completion[:100], "...\n")
print("completion token count: ", token_counter.llm_token_counts[0].completion_token_count, "\n")
print("total token count", token_counter.llm_token_counts[0].total_token_count)

This will give you detailed insights into each token usage event, including the prompt and completion token counts.
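
If you want to measure usage per run rather than cumulatively, the handler's reset_counts() method clears the accumulated events between runs:

# Clear all accumulated counts so the next run starts from zero
token_counter.reset_counts()
print("Total after reset: ", token_counter.total_llm_token_count)  # 0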

For more information, you can refer to the LlamaIndex API documentation [1][2][3][4][5].


bhuvanweb3 commented 4 months ago

[screenshot] No matter what I do, I can't check tokens for my agent:

"worker = FunctionCallingAgentWorker( tools=[ get_booking_state_tool, update_booking_tool, create_booking_tool, confirm_booking_tool, ], llm=llm, prefix_messages=prefix_messages, max_function_calls=10, allow_parallel_tool_calls=False, callback_manager=CallbackManager([token_counter]), verbose=True)

agent = AgentRunner(worker)"

dosubot[bot] commented 4 months ago

To ensure that the TokenCountingHandler is correctly tracking and displaying token usage for your FunctionCallingAgentWorker, follow these steps:

  1. Initialize the TokenCountingHandler and CallbackManager: Make sure you have initialized the TokenCountingHandler correctly and passed it to the CallbackManager.

    import tiktoken
    from llama_index.core.callbacks import CallbackManager, TokenCountingHandler
    from llama_index.llms.openai import OpenAI
    from llama_index.core import Settings
    
    # Initialize the token counter
    token_counter = TokenCountingHandler(
        tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode,
        verbose=True,  # Enable verbose mode for detailed logging
    )
    
    # Set global settings
    Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0.2)
    Settings.callback_manager = CallbackManager([token_counter])
  2. Ensure Proper Event Handling: Verify that the FunctionCallingAgentWorker is correctly handling events. The CallbackManager should be used to wrap function calls and capture relevant payloads.

    from llama_index.core.agent.function_calling.step import FunctionCallingAgentWorker
    from llama_index.core.agent.runner.base import AgentRunner
    
    worker = FunctionCallingAgentWorker(
        tools=[
            get_booking_state_tool,
            update_booking_tool,
            create_booking_tool,
            confirm_booking_tool,
        ],
        llm=llm,
        prefix_messages=prefix_messages,
        max_function_calls=10,
        allow_parallel_tool_calls=False,
        callback_manager=CallbackManager([token_counter]),
        verbose=True,
    )
    
    agent = AgentRunner(worker)
  3. Run Your Agent and Track Token Usage: After running your agent's full flow (see the concrete sketch after this list), you can access the token counts using the properties provided by the TokenCountingHandler.

    # Run your agent's full flow here
    # ...
    
    # Access the token counts
    total_tokens = token_counter.total_llm_token_count
    prompt_tokens = token_counter.prompt_llm_token_count
    completion_tokens = token_counter.completion_llm_token_count
    
    print(f"Total LLM Token Count: {total_tokens}")
    print(f"Prompt LLM Token Count: {prompt_tokens}")
    print(f"Completion LLM Token Count: {completion_tokens}")
  4. Debugging and Logging: If the token counts are still zero, ensure that the events are being correctly started and ended. You can add additional logging to verify that the events are being captured.

    from llama_index.core.callbacks.schema import CBEventType
    
    # Example payload for an LLM event
    payload = {
        "chunks": ["one"],
        "formatted_prompt": "What is the capital of France?",
        "response": "The capital of France is Paris.",
    }
    event_id = "event_1"
    
    # Simulate the start of an LLM event
    token_counter.on_event_start(CBEventType.LLM, payload=payload, event_id=event_id)
    
    # Simulate the end of an LLM event
    token_counter.on_event_end(CBEventType.LLM, payload=payload, event_id=event_id)
    
    # Check token counts
    print(f"Total LLM Token Count: {token_counter.total_llm_token_count}")
    print(f"Prompt LLM Token Count: {token_counter.prompt_llm_token_count}")
    print(f"Completion LLM Token Count: {token_counter.completion_llm_token_count}")

By following these steps, you should be able to correctly track and display the token usage for your agent worker using the TokenCountingHandler class [1][2][3].
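
If the counts stay at zero even after these steps, one possible cause is that the agent's LLM instance never received the callback manager (for example, because it was created before Settings.callback_manager was set). A minimal sketch, using the callback_manager argument that LlamaIndex LLM constructors accept, attaches the counter to the LLM directly:

from llama_index.core.callbacks import CallbackManager, TokenCountingHandler
from llama_index.llms.openai import OpenAI

token_counter = TokenCountingHandler(verbose=True)
callback_manager = CallbackManager([token_counter])

# Attach the callback manager to the LLM itself as well as to the worker,
# so every LLM call the agent makes emits countable events
llm = OpenAI(model="gpt-3.5-turbo", callback_manager=callback_manager)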
