run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

How do I measure tokens for the Chain of Table code snippet? #12588

Closed. mraguth closed this issue 1 month ago.

mraguth commented 6 months ago

Question Validation

Question

I am using the code snippet below, taken from this notebook: https://github.com/run-llama/llama-hub/blob/main/llama_hub/llama_packs/tables/chain_of_table/chain_of_table.ipynb

from llama_index.prompts import PromptTemplate
from llama_index.query_pipeline import QueryPipeline
# `serialize_table`, `df`, and `llm` are defined earlier in the linked notebook

prompt_str = """\
Here's a serialized table.

{serialized_table}

Given this table please answer the question: {question}
Answer: """
prompt = PromptTemplate(prompt_str)
prompt_c = prompt.as_query_component(partial={"serialized_table": serialize_table(df)})

qp = QueryPipeline(chain=[prompt_c, llm])
response = qp.run("What was the precipitation in inches during June?")
print(str(response))

To measure tokens, I am using the code below, from this page: https://docs.llamaindex.ai/en/stable/examples/callbacks/TokenCountingHandler/?h=token

However, I am getting zero tokens all the time. What am I doing wrong? Please help me.

print(
    "Embedding Tokens: ",
    token_counter.total_embedding_token_count,
    "\n",
    "LLM Prompt Tokens: ",
    token_counter.prompt_llm_token_count,
    "\n",
    "LLM Completion Tokens: ",
    token_counter.completion_llm_token_count,
    "\n",
    "Total LLM Token Count: ",
    token_counter.total_llm_token_count,
    "\n",
)
dosubot[bot] commented 6 months ago

Hey @mraguth, good to see you back with another intriguing puzzle for us to solve! Hope you're doing well.

Based on the information you've provided and the context from similar issues, it seems like the problem might be related to the initialization or usage of the TokenCounter class or the structure of the payloads passed to the get_llm_token_counts function.

In a similar issue, [Bug]: Token count results for prompts are always zero, the problem was that the TokenCountingHandler was being overwritten, which led to incorrect results. The user resolved the issue by saving the TokenCountingHandler instance in a session state object.

In your case, it's important to ensure that the payloads passed to get_llm_token_counts contain the correct structure and data expected by the function. Also, the TokenCounter class and its methods (get_string_tokens, estimate_tokens_in_messages) should be correctly implemented and return accurate token counts for given strings or messages. The tokenizer used by TokenCounter should be correctly initialized and capable of counting tokens accurately for the inputs it receives.
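
As a quick sanity check (a sketch based on the TokenCountingHandler docs linked above, not something from this thread), you can inspect whether the handler recorded any LLM events at all; an empty list means the handler is not attached to the callback manager that the LLM actually uses:

# Sketch: list the LLM events the handler has seen so far.
# If this prints 0, the counter never observed an LLM call.
print("LLM events recorded:", len(token_counter.llm_token_counts))
for event in token_counter.llm_token_counts:
    print(event.prompt_token_count, event.completion_token_count, event.total_token_count)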

Here's a brief overview of how the TokenCounter class is implemented in the LlamaIndex codebase:

  1. Initialization (__init__): Initializes the TokenCounter instance with a tokenizer. If no tokenizer is provided, a default one is obtained using get_tokenizer().

  2. get_string_tokens: Counts the number of tokens in a given string using the stored tokenizer.

  3. estimate_tokens_in_messages: Estimates the token count for a list of chat messages. It iterates over each message, counting tokens for the message's role, content, and any additional function calls specified in additional_kwargs.

  4. estimate_tokens_in_functions: Estimates the token count for a list of function definitions. It converts the list of function definitions to a string and counts the tokens in this string representation.

You can find more details about the TokenCounter class in the llama-index-core/llama_index/core/utilities/token_counting.py file.
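
For illustration, here is a minimal sketch of calling TokenCounter directly, assuming the import path of the file above and a tiktoken tokenizer:

import tiktoken
from llama_index.core.utilities.token_counting import TokenCounter

# Sketch: count tokens in a plain string with an explicit tokenizer.
counter = TokenCounter(tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode)
print(counter.get_string_tokens("Given this table please answer the question"))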

If you're still encountering the issue, could you provide more details about how you're initializing and using the TokenCounter class and the structure and content of the payloads you're passing to get_llm_token_counts? This information will help in diagnosing the issue more accurately.

mraguth commented 6 months ago

No luck. It works for other queries, but when I use it with 'qp = QueryPipeline(chain=[prompt_c, llm])' it does not work. Need help.

logan-markewich commented 6 months ago

Attach the token counter as a callback manager to your LLM?

callback_manager = CallbackManager([token_counter])

llm = OpenAI(..., callback_manager=callback_manager)
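
Expanding on that, a self-contained sketch of wiring the counter through the LLM used by the pipeline might look like this (the OpenAI model name and prompt are placeholders, not the exact Azure setup from this thread):

import tiktoken
from llama_index.core import PromptTemplate
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler
from llama_index.core.query_pipeline import QueryPipeline
from llama_index.llms.openai import OpenAI

# The handler must sit on the callback manager of the LLM that the
# pipeline actually calls; otherwise the counts stay at zero.
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode
)
callback_manager = CallbackManager([token_counter])

llm = OpenAI(model="gpt-3.5-turbo", callback_manager=callback_manager)

prompt = PromptTemplate("Given this table please answer the question: {question}\nAnswer: ")
qp = QueryPipeline(chain=[prompt, llm])
qp.run("What was the precipitation in inches during June?")

print("Total LLM Token Count:", token_counter.total_llm_token_count)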
mraguth commented 6 months ago

I tried. It is still showing completion tokens = 0. This is my code:

token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-35-turbo-16k").encode)
Settings.callback_manager = CallbackManager([token_counter])

embed_model = AzureOpenAIEmbedding(
    model="text-embedding-ada-002",
    deployment_name="text-embedding-ada-002",
    api_key=api_key,
    azure_endpoint=azure_endpoint,
    api_version=api_version,
    CallbackManager=Settings.callback_manager,
)

llm = AzureOpenAI(
    model="gpt-35-turbo-16k",
    deployment_name="gpt-35-turbo-16k",
    api_key=api_key,
    azure_endpoint=azure_endpoint,
    api_version=api_version,
    temperature=0,
    CallbackManager=Settings.callback_manager,
)
vignesh-arivazhagan commented 4 months ago

@mraguth you made a typo: the keyword argument is lowercase callback_manager, not CallbackManager. It is working fine for me. (Ignore the decorator in my screenshot; that is a custom implementation.)

(screenshot of working code omitted)
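
For reference, a corrected sketch of the setup above, with the manager passed through the lowercase callback_manager keyword (api_key, azure_endpoint, and api_version as defined in the earlier snippet):

import tiktoken
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding

token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-35-turbo-16k").encode
)
Settings.callback_manager = CallbackManager([token_counter])

# Lowercase `callback_manager` is the expected keyword; the capitalized
# `CallbackManager=` in the earlier snippet never attaches the handler.
llm = AzureOpenAI(
    model="gpt-35-turbo-16k",
    deployment_name="gpt-35-turbo-16k",
    api_key=api_key,
    azure_endpoint=azure_endpoint,
    api_version=api_version,
    temperature=0,
    callback_manager=Settings.callback_manager,
)

embed_model = AzureOpenAIEmbedding(
    model="text-embedding-ada-002",
    deployment_name="text-embedding-ada-002",
    api_key=api_key,
    azure_endpoint=azure_endpoint,
    api_version=api_version,
    callback_manager=Settings.callback_manager,
)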