simonw / ttok

Count and truncate text based on tokens
Apache License 2.0

Add allowed-special #12

Closed FergusFettes closed 3 months ago

FergusFettes commented 7 months ago

I got an error because a file whose tokens I was trying to count contained some special tokens. This is a quick fix for that. I'm happy to make modifications or write a test if you are interested in this change.

Before:

echo '<|endoftext|>' | ttok
> Long tiktoken error

Now:

echo '<|endoftext|>' | ttok --allowed-special='<|endoftext|>'
2
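
For context, tiktoken's `encode` raises a `ValueError` by default when the input contains a special token such as `<|endoftext|>`, unless that token is passed in `allowed_special`. The sketch below is not ttok or tiktoken code: it is a stdlib-only toy that imitates that check, and the names `SPECIAL_TOKENS` and `split_on_special` are hypothetical.

```python
import re

# Hypothetical stand-in for a tokenizer's set of special tokens.
SPECIAL_TOKENS = {"<|endoftext|>"}


def split_on_special(text, allowed_special=frozenset()):
    """Split text into allowed special tokens and ordinary chunks.

    Raises ValueError if text contains a special token that is not
    listed in allowed_special (imitating tiktoken's default behaviour).
    """
    allowed_special = set(allowed_special)
    # Any special token that was not explicitly allowed is an error.
    for token in SPECIAL_TOKENS - allowed_special:
        if token in text:
            raise ValueError(f"disallowed special token {token!r} in text")
    allowed = sorted(SPECIAL_TOKENS & allowed_special)
    if not allowed:
        return [text] if text else []
    # The capturing group makes re.split keep the special tokens in the result.
    pattern = "(" + "|".join(re.escape(t) for t in allowed) + ")"
    return [piece for piece in re.split(pattern, text) if piece]


if __name__ == "__main__":
    # echo appends a newline, so the piped input is '<|endoftext|>\n'.
    pieces = split_on_special("<|endoftext|>\n", {"<|endoftext|>"})
    print(len(pieces))
```

In this toy, the piped input splits into the special token and the trailing newline, which is consistent with the count of 2 shown above. With tiktoken itself the corresponding call would be `enc.encode(text, allowed_special={"<|endoftext|>"})`.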

simonw commented 3 months ago

Thanks for the PR! I ended up solving this a slightly different way: