run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: Token Counter is not working when transformations are applied with an embedding model. #13146

Closed: dsanr closed this issue 3 months ago

dsanr commented 6 months ago

Bug Description

The token counter does not work when transformations are applied together with an embedding model. As a result, even the Ingestion Pipeline is not usable if we want to use the token counter.

Version

0.10.33

Steps to Reproduce

```python
import tiktoken

from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding

documents = SimpleDirectoryReader(input_dir="./data", filename_as_id=True).load_data(show_progress=True)
token_counter = TokenCountingHandler(tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode, verbose=True)
Settings.callback_manager = CallbackManager([token_counter])
transformations = [SentenceSplitter(chunk_size=512, chunk_overlap=128), OpenAIEmbedding()]
index = VectorStoreIndex.from_documents(documents, transformations=transformations)
```

`token_counter.total_embedding_token_count` returns zero. If I remove `OpenAIEmbedding()` from `transformations` and instead pass it to `VectorStoreIndex.from_documents()`, the token counter works.

Relevant Logs/Tracebacks

No response

dosubot[bot] commented 6 months ago

It seems the issue with `total_embedding_token_count` returning zero when using `transformations` alongside an `OpenAIEmbedding` model might stem from how embedding events and their tokens are handled. Specifically, if the embedding transformation doesn't generate or populate `EventPayload.CHUNKS` as expected, or if the `TokenCountingHandler` isn't processing these chunks correctly, it could lead to the observed behavior.

To address this, ensure the payload for embedding events includes `EventPayload.CHUNKS` with the correct data. Confirm that embedding events are properly identified and processed in the `on_event_end` method, particularly that chunks are being processed and token counts are accurately calculated for each chunk. Additionally, verify that any transformations applied before token counting do not alter the data in a way that affects the token counting logic.
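
The mechanism described above can be illustrated with a minimal, self-contained sketch. This is not the library's actual source: `SimpleTokenCounter` and the string event names are hypothetical stand-ins for `TokenCountingHandler`, `CBEventType.EMBEDDING`, and `EventPayload.CHUNKS`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SimpleTokenCounter:
    # The tokenizer maps a string to a list of tokens (tiktoken-style).
    tokenizer: Callable[[str], List[str]]
    embedding_token_counts: List[int] = field(default_factory=list)

    def on_event_end(self, event_type: str, payload: Dict) -> None:
        # Tokens are counted only when an embedding-end event arrives with
        # its chunks attached. If the embedding model never fires the event
        # (e.g. because no callback manager is attached to it), nothing is
        # recorded and the total stays zero, the behavior reported here.
        if event_type == "embedding" and "chunks" in payload:
            for chunk in payload["chunks"]:
                self.embedding_token_counts.append(len(self.tokenizer(chunk)))

    @property
    def total_embedding_token_count(self) -> int:
        return sum(self.embedding_token_counts)


# Toy usage: a whitespace "tokenizer" stands in for tiktoken.
counter = SimpleTokenCounter(tokenizer=str.split)
counter.on_event_end("embedding", {"chunks": ["hello world", "foo bar baz"]})
print(counter.total_embedding_token_count)  # 5

# A handler that never receives the event reports zero.
silent = SimpleTokenCounter(tokenizer=str.split)
print(silent.total_embedding_token_count)  # 0
```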

Reviewing the integration of the embedding model and transformations, to make sure the expected data structure and flow are maintained, is crucial. Adding logging or debug statements within the handling of embedding events can help identify where the process is failing.
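
One concrete way to do that debugging (a sketch, reusing the repro setup from this issue; it assumes llama-index-core 0.10.x, where `LlamaDebugHandler` and `CBEventType` are exported from `llama_index.core.callbacks`):

```python
from llama_index.core.callbacks import CallbackManager, CBEventType, LlamaDebugHandler

# Pair the token counter with a debug handler so every callback event is traced.
debug_handler = LlamaDebugHandler(print_trace_on_end=True)
Settings.callback_manager = CallbackManager([token_counter, debug_handler])

index = VectorStoreIndex.from_documents(documents, transformations=transformations)

# If no EMBEDDING events show up here, the embedding model never saw the
# callback manager, which is consistent with a zero token count.
print(debug_handler.get_event_pairs(CBEventType.EMBEDDING))
```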

By closely examining these areas, you should be able to pinpoint and resolve the issue causing the token counter to report zero in the presence of embedding transformations.

#### Details

**Sources**

- [llama-index-legacy/llama_index/legacy/callbacks/token_counting.py](https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/callbacks/token_counting.py)
- [docs/docs/examples/callbacks/TokenCountingHandler.ipynb](https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/callbacks/TokenCountingHandler.ipynb)
- [llama-index-legacy/llama_index/legacy/embeddings/openai.py](https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/embeddings/openai.py)
- [llama-index-integrations/embeddings/llama-index-embeddings-openai/llama_index/embeddings/openai/base.py](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/embeddings/llama-index-embeddings-openai/llama_index/embeddings/openai/base.py)
- [llama-index-core/llama_index/core/callbacks/schema.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/callbacks/schema.py)
- [llama-index-core/llama_index/core/instrumentation/events/embedding.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/instrumentation/events/embedding.py)
- [llama-index-legacy/llama_index/legacy/utilities/token_counting.py](https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/utilities/token_counting.py)
- [llama-index-core/llama_index/core/utilities/token_counting.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/utilities/token_counting.py)
- [llama-index-core/llama_index/core/callbacks/token_counting.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/callbacks/token_counting.py)
- [docs/docs/module_guides/observability/callbacks/token_counting_migration.md](https://github.com/run-llama/llama_index/blob/main/docs/docs/module_guides/observability/callbacks/token_counting_migration.md)
- [llama-index-core/tests/callbacks/test_token_counter.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/tests/callbacks/test_token_counter.py)
- [llama-index-legacy/tests/callbacks/test_token_counter.py](https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/tests/callbacks/test_token_counter.py)

**About Dosu** This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.

logan-markewich commented 6 months ago

@dsanr you should attach the callback manager to the embedding model (and also the LLM tbh)

```python
OpenAIEmbedding(..., callback_manager=callback_manager)
```
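
Spelled out against the repro above (a sketch; it reuses `documents`, `token_counter`, and the imports from the original snippet):

```python
# Attach the callback manager to the embedding model itself so that its
# embedding events reach the token counter.
embed_model = OpenAIEmbedding(callback_manager=CallbackManager([token_counter]))

transformations = [SentenceSplitter(chunk_size=512, chunk_overlap=128), embed_model]
index = VectorStoreIndex.from_documents(documents, transformations=transformations)

print(token_counter.total_embedding_token_count)  # should now be non-zero
```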
dsanr commented 6 months ago

@logan-markewich Shouldn't using `Settings.callback_manager` work? The `TokenCountingHandler` documentation also sets it this way: https://docs.llamaindex.ai/en/stable/examples/callbacks/TokenCountingHandler/