Closed · paulpalmieri closed this issue 1 month ago
Few issues here:

1. Using several `TokenCountingHandler` instances to count calls from different models doesn't seem to function. Only one of them catches events.
2. When passing Google's tokenizer from the `vertexai.preview` lib, we get this error inside llama-index's token counting module: `TypeError: 'Tokenizer' object is not callable`

So it seems that our current approach using the llama-index counter has a lot of limitations.

For 1, the whole point of creating several `TokenCountingHandler` instances was to pass an appropriate tokenizer to each. However, since Google's tokenizer doesn't seem to work with llama-index's token counting, we could:

- fall back to a single `TokenCounter` (only one registers all LLM calls anyway)
- use a single tokenizer for every model (`cl100k_base`?)

@adeprez @dhuynh95 what do you think?
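For reference, here's a minimal sketch of the failing setup (the model names and callback wiring are illustrative): llama-index expects `tokenizer` to be a plain callable `str -> list`, while the `vertexai.preview` tokenizer is an object exposing a `count_tokens()` method, which is what triggers the `TypeError`.

```python
import tiktoken
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler
from vertexai.preview import tokenization

# One handler per model family, each with its own tokenizer.
openai_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-4o").encode  # callable: str -> list[int]
)

gemini_tok = tokenization.get_tokenizer_for_model("gemini-1.5-flash-001")
# This is where it breaks: the Vertex AI tokenizer is an object with a
# .count_tokens() method, not a callable, so llama-index raises
# "TypeError: 'Tokenizer' object is not callable" when counting.
gemini_counter = TokenCountingHandler(tokenizer=gemini_tok)

callback_manager = CallbackManager([openai_counter, gemini_counter])
```

And even with valid tokenizers, only one of the two handlers ends up catching events (issue 1 above).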
Here's a tokenizer comparison (the prompt length is in characters; the rest are token counts):

```
prompt: 14517
--------------------
gpt: 4522
cl100k: 4680
o200k: 4522
p50k: 5925
r50k: 6082
gpt2: 6082
-> gemini flash: 5201
-> gemini pro 1.5: 5201
```
Since we can't support the real Gemini tokenizer, what do you think about using the gpt tokenizer and adding 15% to the Gemini cost calculation? (From the numbers above: 5201 / 4522 ≈ 1.15.)
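As a concrete sketch of that proposal (the 1.15 factor comes from the comparison above; the pricing parameter is a placeholder, not real Gemini pricing):

```python
import tiktoken

# Assumption: the gpt-4o tokenizer undercounts Gemini tokens by ~15%
# (5201 / 4522 ≈ 1.15 on our prompt template).
GEMINI_TOKEN_OVERHEAD = 1.15

enc = tiktoken.encoding_for_model("gpt-4o")

def estimate_gemini_tokens(text: str) -> int:
    """Approximate Gemini token count via the gpt-4o tokenizer plus a 15% margin."""
    return round(len(enc.encode(text)) * GEMINI_TOKEN_OVERHEAD)

def estimate_gemini_cost(text: str, usd_per_1k_tokens: float) -> float:
    """Cost estimate from the approximate count; the caller supplies the pricing."""
    return estimate_gemini_tokens(text) / 1000 * usd_per_1k_tokens
```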
Code to run the comparison:
```python
from lavague.drivers.selenium.base import SELENIUM_PROMPT_TEMPLATE
import tiktoken
from vertexai.preview import tokenization

enc_gpt_4o = tiktoken.encoding_for_model("gpt-4o")
enc_cl100k = tiktoken.get_encoding("cl100k_base")
enc_o200k = tiktoken.get_encoding("o200k_base")
enc_p50k = tiktoken.get_encoding("p50k_base")
enc_r50k = tiktoken.get_encoding("r50k_base")
enc_gpt2 = tiktoken.get_encoding("gpt2")

# Gemini tokenizers from the Vertex AI SDK
enc_gemini_flash = tokenization.get_tokenizer_for_model("gemini-1.5-flash-001")
enc_gemini_pro = tokenization.get_tokenizer_for_model("gemini-1.5-pro-001")

print("prompt:", len(SELENIUM_PROMPT_TEMPLATE))  # character count, not tokens
print("--------------------")
print("gpt:", len(enc_gpt_4o.encode(SELENIUM_PROMPT_TEMPLATE)))
print("cl100k:", len(enc_cl100k.encode(SELENIUM_PROMPT_TEMPLATE)))
print("o200k:", len(enc_o200k.encode(SELENIUM_PROMPT_TEMPLATE)))
print("p50k:", len(enc_p50k.encode(SELENIUM_PROMPT_TEMPLATE)))
print("r50k:", len(enc_r50k.encode(SELENIUM_PROMPT_TEMPLATE)))
print("gpt2:", len(enc_gpt2.encode(SELENIUM_PROMPT_TEMPLATE)))
print("-> gemini flash:", enc_gemini_flash.count_tokens(SELENIUM_PROMPT_TEMPLATE).total_tokens)
print("-> gemini pro 1.5:", enc_gemini_pro.count_tokens(SELENIUM_PROMPT_TEMPLATE).total_tokens)
```
Temporary fix by PR: #467

Can we close this now then, @paulpalmieri?
**Context**

- We use the `TokenCountingHandler` from `llama-index` to count tokens.
- The `TokenCountingHandler` class requires a `tokenizer`.
- We currently use `tiktoken` as our only tokenizer.
- `tiktoken` only supports OpenAI models.

**Impact**

- Token counts for non-OpenAI models are inaccurate, since they are produced with `tiktoken` and `llama-index`.

**Notes**

- Google provides `vertexai.preview.tokenization` for Gemini.

**Potential solutions**

- Pass Google's Gemini `tokenizer` to the token counter (sketch below). Supported models: gemini-1.0-pro-001, gemini-1.0-pro-002, gemini-1.5-pro-001, gemini-1.5-flash-001.
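One possible shape for that (an untested sketch; it assumes llama-index only ever calls `len()` on whatever the tokenizer returns, so a sized sequence of the right length is enough):

```python
from llama_index.core.callbacks import TokenCountingHandler
from vertexai.preview import tokenization

gemini_tok = tokenization.get_tokenizer_for_model("gemini-1.5-flash-001")

# Wrap the Vertex AI tokenizer in a callable: llama-index invokes
# tokenizer(text) and takes len() of the result, so a range of the
# right length stands in for a real token list.
gemini_counter = TokenCountingHandler(
    tokenizer=lambda text: range(gemini_tok.count_tokens(text).total_tokens)
)
```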