lavague-ai / LaVague

Large Action Model framework to develop AI Web Agents
https://docs.lavague.ai/en/latest/
Apache License 2.0

TokenCounter only supports OpenAI models #465

Closed paulpalmieri closed 1 month ago

paulpalmieri commented 1 month ago

Context

Impact

Notes

Potential solutions

paulpalmieri commented 1 month ago

A few issues here:

  1. Instantiating two LlamaIndex TokenCountingHandler instances to count calls from different models doesn't seem to work: only one of them catches events.
  2. When passing a tokenizer from the vertexai.preview library, we get this error inside LlamaIndex's token-counting module: TypeError: 'Tokenizer' object is not callable

So our current approach using the llama-index token counter has significant limitations.

For 1, the whole point of creating several TokenCountingHandler instances was to pass an appropriate tokenizer to each. However, since Google's tokenizer doesn't seem to work with LlamaIndex's token counting, we could:

@adeprez @dhuynh95 what do you think?
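For issue 2, one possible direction (a sketch, not LaVague's actual implementation): llama-index's TokenCountingHandler calls its tokenizer argument directly as tokenizer(text) and takes len() of the result, while the Vertex Tokenizer only exposes a count_tokens(...) method, hence the "not callable" error. A thin callable adapter could bridge the two. The FakeVertexTokenizer below is a stand-in for vertexai.preview.tokenization used only for demonstration; the real object's count_tokens(text).total_tokens shape is what the adapter assumes.

```python
def make_callable_tokenizer(vertex_tokenizer):
    """Wrap a Vertex-style tokenizer so len(wrapper(text)) == token count."""
    def wrapper(text: str):
        n = vertex_tokenizer.count_tokens(text).total_tokens
        return [0] * n  # dummy token list; only its length matters to the handler
    return wrapper

class FakeVertexTokenizer:
    """Demo stand-in mimicking count_tokens(...).total_tokens."""
    class _Result:
        def __init__(self, n):
            self.total_tokens = n

    def count_tokens(self, text):
        # crude whitespace split, purely for illustration
        return self._Result(len(text.split()))

tokenizer = make_callable_tokenizer(FakeVertexTokenizer())
print(len(tokenizer("three word prompt")))  # 3
```

The adapter could then be passed as the tokenizer argument to a TokenCountingHandler, assuming the handler only ever takes the length of what the callable returns.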

paulpalmieri commented 1 month ago

Here's a tokenizer comparison:

prompt: 14517
--------------------
gpt: 4522
cl100k: 4680
o200k: 4522
p50k: 5925
r50k: 6082
gpt2: 6082
-> gemini flash: 5201
-> gemini pro 1.5: 5201

Since we can't support the real Gemini tokenizer, what do you think about using the GPT tokenizer and adding 15% to the Gemini cost calculation?
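A minimal sketch of that approximation, assuming the ~15% figure holds in general: count tokens with the GPT tokenizer, then inflate the result when estimating Gemini usage. The constant comes from the measured ratio 5201 / 4522 ≈ 1.15 above; the function name is illustrative, not an existing LaVague API.

```python
# Measured on SELENIUM_PROMPT_TEMPLATE: gemini 5201 tokens vs gpt 4522 tokens.
GEMINI_ADJUSTMENT = 1.15  # assumption: ratio generalizes beyond this one prompt

def estimate_gemini_tokens(gpt_token_count: int) -> int:
    """Approximate a Gemini token count from a GPT tokenizer count."""
    return round(gpt_token_count * GEMINI_ADJUSTMENT)

print(estimate_gemini_tokens(4522))  # 5200, close to the measured 5201
```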

Code to run the comparison:

from lavague.drivers.selenium.base import SELENIUM_PROMPT_TEMPLATE
import tiktoken
from vertexai.preview import tokenization

enc_gpt_4o = tiktoken.encoding_for_model("gpt-4o")
enc_cl100k = tiktoken.get_encoding("cl100k_base") 
enc_o200k = tiktoken.get_encoding("o200k_base")
enc_p50k = tiktoken.get_encoding("p50k_base")
enc_r50k = tiktoken.get_encoding("r50k_base")
enc_gpt2 = tiktoken.get_encoding("gpt2")
enc_gemini_flash = tokenization.get_tokenizer_for_model("gemini-1.5-flash-001") # gemini tokenizer from vertex api
enc_gemini_pro = tokenization.get_tokenizer_for_model("gemini-1.5-pro-001") # gemini tokenizer from vertex api

print("prompt:", len(SELENIUM_PROMPT_TEMPLATE))  # character count, not tokens
print("--------------------")
print("gpt:",len(enc_gpt_4o.encode(SELENIUM_PROMPT_TEMPLATE)))
print("cl100k:",len(enc_cl100k.encode(SELENIUM_PROMPT_TEMPLATE)))
print("o200k:",len(enc_o200k.encode(SELENIUM_PROMPT_TEMPLATE)))
print("p50k:",len(enc_p50k.encode(SELENIUM_PROMPT_TEMPLATE)))
print("r50k:",len(enc_r50k.encode(SELENIUM_PROMPT_TEMPLATE)))
print("gpt2:",len(enc_gpt2.encode(SELENIUM_PROMPT_TEMPLATE)))
print("-> gemini flash:", enc_gemini_flash.count_tokens(SELENIUM_PROMPT_TEMPLATE).total_tokens)
print("-> gemini pro 1.5:", enc_gemini_pro.count_tokens(SELENIUM_PROMPT_TEMPLATE).total_tokens)
paulpalmieri commented 1 month ago

Temporary fix by PR: #467

lyie28 commented 1 month ago

Can we close this now then @paulpalmieri ?