simonw / ttok

Count and truncate text based on tokens
Apache License 2.0

Initial design #1

Closed: simonw closed this 1 year ago

simonw commented 1 year ago

It's a tool for piping in text and either counting the tokens or truncating them.

Using tiktoken - documented here: https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb

Idea from:

simonw commented 1 year ago

Design:

$ ttok one two three
3
$ ttok one two three --truncate 2
one two

And a -m option to specify different models.

simonw commented 1 year ago

Spotted this model count difference; I can use this in a test:

$ ttok boo hello there this is
5
$ ttok boo hello there this is -m gpt2
6

simonw commented 1 year ago

I'm adding --tokens to output the integer tokens:

% echo "hello world" | ttok --tokens
15339 1917 198