run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai

[Question]: Further explanation of total embedding token usage #14090

Open ihgumilar opened 3 weeks ago

ihgumilar commented 3 weeks ago

Question

Hi,

I have used this code https://docs.llamaindex.ai/en/stable/examples/callbacks/TokenCountingHandler/ to count embedding token usage. However, I would like further clarification: is there a specific breakdown of the total embedding token usage in LlamaIndex?

Does it refer to the tokens used for the query and the answer? Further explanation of embedding token usage in LlamaIndex would be much appreciated.

Thanks

logan-markewich commented 3 weeks ago

I'm not sure what you're looking for.

Indexing uses embedding tokens to embed all of your documents.

Querying uses far fewer, since only the query text itself needs to be embedded.
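
To make that split concrete, here is a minimal sketch based on the TokenCountingHandler docs example linked above. It resets the counter between indexing and querying so you can read the two embedding token counts separately (the document text, query string, and tokenizer choice are just placeholders, and the query step will call your configured LLM and embedding model):

```python
import tiktoken

from llama_index.core import Document, Settings, VectorStoreIndex
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler

# Use a tokenizer that matches your embedding model family
# (cl100k_base is an assumption; swap in the one you actually use).
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.get_encoding("cl100k_base").encode
)
Settings.callback_manager = CallbackManager([token_counter])

# Index-time embeddings: every document chunk is sent to the embedding model.
index = VectorStoreIndex.from_documents(
    [Document(text="Some example document text. " * 50)]
)
print("Indexing embedding tokens:", token_counter.total_embedding_token_count)

# Reset, then query: only the query string itself is embedded.
token_counter.reset_counts()
response = index.as_query_engine().query("What is this document about?")
print("Query embedding tokens:", token_counter.total_embedding_token_count)
```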

ihgumilar commented 3 weeks ago

Thanks for the explanation. Basically, I am trying to understand this image:

What does the 8 (embedding tokens) represent?

logan-markewich commented 3 weeks ago

It means that to run the query, 8 tokens were sent to the embedding model (i.e., the query text itself was embedded).
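
In other words, the query-time embedding count is just the number of tokens in the query string under the handler's tokenizer. A quick way to sanity-check it (the query string and cl100k_base tokenizer below are just example assumptions):

```python
import tiktoken

# Hypothetical query; the count depends entirely on your actual query string
# and on the tokenizer passed to TokenCountingHandler.
query = "What did the author do growing up?"
encoding = tiktoken.get_encoding("cl100k_base")

# This should match token_counter.total_embedding_token_count after a single
# query (assuming no other embedding calls happened since the last reset).
print(len(encoding.encode(query)))
```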