mikeizbicki / modulus-magnus-linguae


Tokenizer library question #3

alysawyer closed this issue 1 year ago

alysawyer commented 1 year ago

Hi! Would tiktoken be a good library to use for counting GPT tokens? I tried it on this sentence from the Bible in Achuar: "Tura Judá Tamaran nuwatak, ni uchirin Faresan tura chikich uchirin Zara naartinun yajutmarmiayi." tiktoken counts the sentence as 41 tokens, though I'm not sure whether that is a reasonable count for gpt-3.5. Thank you!
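For reference, a minimal sketch of how such a count might be taken. It assumes the `tiktoken` package is installed and that `"gpt-3.5-turbo"` is the model of interest; the helper names `count_tokens` and `gpt_encoder` are hypothetical and not part of tiktoken or this repository.

```python
from typing import Callable, Sequence

# The Achuar sentence quoted in the question.
SENTENCE = (
    "Tura Judá Tamaran nuwatak, ni uchirin Faresan tura chikich "
    "uchirin Zara naartinun yajutmarmiayi."
)


def count_tokens(text: str, encode: Callable[[str], Sequence]) -> int:
    """Return how many tokens the given `encode` function produces for `text`."""
    return len(encode(text))


def gpt_encoder(model: str = "gpt-3.5-turbo") -> Callable[[str], Sequence]:
    """Return tiktoken's encode function for `model` (hypothetical helper).

    tiktoken is imported lazily so count_tokens stays usable without it.
    Assumption: tiktoken is installed and can load the model's vocabulary.
    """
    import tiktoken

    return tiktoken.encoding_for_model(model).encode
```

Calling `count_tokens(SENTENCE, gpt_encoder())` should give the tiktoken count for this sentence; the figure of 41 is the one reported above and has not been re-verified here. Passing `str.split` instead of a tiktoken encoder gives a whitespace word count, which can be a handy baseline for judging how heavily a low-resource language such as Achuar is split into subword tokens.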