Open simonw opened 10 months ago
Anthropic have a tokenizer too: https://github.com/anthropics/anthropic-sdk-python/blob/main/src/anthropic/_tokenizers.py
What if you don't know the origin of the model? All you have to go by is the name of the model.
Is there baked-in metadata we can read that tells us what tokenizer to use?
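For the name-only case, one option is a prefix lookup that returns nothing when the name is unrecognized. A minimal sketch, not anything that exists in a library: `TOKENIZER_BY_PREFIX` and `guess_tokenizer` are hypothetical names, and the table entries are illustrative (the values are tiktoken encoding names for OpenAI models). Longest-prefix matching keeps `gpt-4o` from being swallowed by `gpt-4`.

```python
from typing import Optional

# Hypothetical lookup table: model-name prefix -> tokenizer identifier.
# Entries are illustrative only; extend with whatever models you care about.
TOKENIZER_BY_PREFIX = {
    "gpt-4o": "o200k_base",
    "gpt-4": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
}


def guess_tokenizer(model_name: str) -> Optional[str]:
    """Return a tokenizer identifier for model_name, or None if unknown."""
    name = model_name.lower()
    # Collect every prefix that matches, then prefer the most specific one.
    matches = [prefix for prefix in TOKENIZER_BY_PREFIX if name.startswith(prefix)]
    if not matches:
        return None  # Unknown origin: the caller has to decide on a fallback.
    return TOKENIZER_BY_PREFIX[max(matches, key=len)]
```

A name that is not in the table, such as a Claude model name, returns `None`, so the caller has to handle that case explicitly rather than silently getting the wrong tokenizer.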
So what exactly can we use for Claude models? E.g., Sonnet 3.5.
Had a great tip on Discord about tokenizers - which says: https://huggingface.co/docs/tokenizers/python/latest/quicktour.html#using-a-pretrained-tokenizer
And sure enough, this seems to work: