Closed · paulpalmieri closed this issue 1 month ago
Few issues here:

1. Using several `TokenCountingHandler` instances to count calls from different models doesn't seem to function. Only one of them catches events.
2. When passing Google's tokenizer from the `vertexai.preview` lib, we get this error inside llama-index's token counting module: `TypeError: 'Tokenizer' object is not callable`

So it seems that our current approach using the llama-index counter has a lot of limitations.

For 1, the whole point of creating several `TokenCountingHandler` instances was to pass an appropriate tokenizer to each. However, since Google's tokenizer doesn't seem to work with llama-index's token counting, we could:

- fall back to a single `TokenCounter` (only one registers all LLM calls anyway)
- use a single tokenizer for every model (`cl100k_base`?)

@adeprez @dhuynh95 what do you think?
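For reference, here's a minimal sketch of the failing setup (the model names and callback wiring are illustrative): llama-index expects `tokenizer` to be a plain callable `str -> list`, while the `vertexai.preview` tokenizer is an object exposing a `count_tokens()` method, which is what triggers the `TypeError`.

```python
import tiktoken
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler
from vertexai.preview import tokenization

# One handler per model family, each with its own tokenizer.
openai_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-4o").encode  # callable: str -> list[int]
)

gemini_tok = tokenization.get_tokenizer_for_model("gemini-1.5-flash-001")
# This is where it breaks: the Vertex AI tokenizer is an object with a
# .count_tokens() method, not a callable, so llama-index raises
# "TypeError: 'Tokenizer' object is not callable" when counting.
gemini_counter = TokenCountingHandler(tokenizer=gemini_tok)

callback_manager = CallbackManager([openai_counter, gemini_counter])
```

And even with valid tokenizers, only one of the two handlers ends up catching events (issue 1 above).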
Here's a tokenizer comparison (the prompt length is in characters; the rest are token counts):

```
prompt: 14517
--------------------
gpt: 4522
cl100k: 4680
o200k: 4522
p50k: 5925
r50k: 6082
gpt2: 6082
-> gemini flash: 5201
-> gemini pro 1.5: 5201
```
Since we can't support the real Gemini tokenizer, what do you think about using the gpt tokenizer and adding 15% to the Gemini cost calculation? (From the numbers above: 5201 / 4522 ≈ 1.15.)
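As a concrete sketch of that proposal (the 1.15 factor comes from the comparison above; the pricing parameter is a placeholder, not real Gemini pricing):

```python
import tiktoken

# Assumption: the gpt-4o tokenizer undercounts Gemini tokens by ~15%
# (5201 / 4522 ≈ 1.15 on our prompt template).
GEMINI_TOKEN_OVERHEAD = 1.15

enc = tiktoken.encoding_for_model("gpt-4o")

def estimate_gemini_tokens(text: str) -> int:
    """Approximate Gemini token count via the gpt-4o tokenizer plus a 15% margin."""
    return round(len(enc.encode(text)) * GEMINI_TOKEN_OVERHEAD)

def estimate_gemini_cost(text: str, usd_per_1k_tokens: float) -> float:
    """Cost estimate from the approximate count; the caller supplies the pricing."""
    return estimate_gemini_tokens(text) / 1000 * usd_per_1k_tokens
```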
Code to run the comparison:
```python
from lavague.drivers.selenium.base import SELENIUM_PROMPT_TEMPLATE
import tiktoken
from vertexai.preview import tokenization

enc_gpt_4o = tiktoken.encoding_for_model("gpt-4o")
enc_cl100k = tiktoken.get_encoding("cl100k_base")
enc_o200k = tiktoken.get_encoding("o200k_base")
enc_p50k = tiktoken.get_encoding("p50k_base")
enc_r50k = tiktoken.get_encoding("r50k_base")
enc_gpt2 = tiktoken.get_encoding("gpt2")

# Gemini tokenizers from the Vertex AI SDK
enc_gemini_flash = tokenization.get_tokenizer_for_model("gemini-1.5-flash-001")
enc_gemini_pro = tokenization.get_tokenizer_for_model("gemini-1.5-pro-001")

print("prompt:", len(SELENIUM_PROMPT_TEMPLATE))  # character count, not tokens
print("--------------------")
print("gpt:", len(enc_gpt_4o.encode(SELENIUM_PROMPT_TEMPLATE)))
print("cl100k:", len(enc_cl100k.encode(SELENIUM_PROMPT_TEMPLATE)))
print("o200k:", len(enc_o200k.encode(SELENIUM_PROMPT_TEMPLATE)))
print("p50k:", len(enc_p50k.encode(SELENIUM_PROMPT_TEMPLATE)))
print("r50k:", len(enc_r50k.encode(SELENIUM_PROMPT_TEMPLATE)))
print("gpt2:", len(enc_gpt2.encode(SELENIUM_PROMPT_TEMPLATE)))
print("-> gemini flash:", enc_gemini_flash.count_tokens(SELENIUM_PROMPT_TEMPLATE).total_tokens)
print("-> gemini pro 1.5:", enc_gemini_pro.count_tokens(SELENIUM_PROMPT_TEMPLATE).total_tokens)
```
Temporary fix by PR: #467

Can we close this now then, @paulpalmieri?
**Context**

- We use the `TokenCountingHandler` from `llama-index` to count tokens.
- The `TokenCountingHandler` class requires a `tokenizer`.
- We currently use `tiktoken` as our only tokenizer.
- `tiktoken` only supports OpenAI models.

**Impact**

- Token counts for non-OpenAI models are inaccurate, since they are produced with `tiktoken` and `llama-index`.

**Notes**

- Google provides `vertexai.preview.tokenization` for Gemini.

**Potential solutions**

- Pass Google's Gemini `tokenizer` to the token counter (sketch below). Supported models: gemini-1.0-pro-001, gemini-1.0-pro-002, gemini-1.5-pro-001, gemini-1.5-flash-001.
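One possible shape for that (an untested sketch; it assumes llama-index only ever calls `len()` on whatever the tokenizer returns, so a sized sequence of the right length is enough):

```python
from llama_index.core.callbacks import TokenCountingHandler
from vertexai.preview import tokenization

gemini_tok = tokenization.get_tokenizer_for_model("gemini-1.5-flash-001")

# Wrap the Vertex AI tokenizer in a callable: llama-index invokes
# tokenizer(text) and takes len() of the result, so a range of the
# right length stands in for a real token list.
gemini_counter = TokenCountingHandler(
    tokenizer=lambda text: range(gemini_tok.count_tokens(text).total_tokens)
)
```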