knuddelsgmbh / jtokkit

JTokkit is a Java tokenizer library designed for use with OpenAI models.
https://jtokkit.knuddels.de/
MIT License

Add Encoding.calcCharCountForTokens method #91

Open dimafa opened 3 months ago

dimafa commented 3 months ago

A very common use case for token counting is chunking a long text so that each chunk fits in a model's context window. To use the jtokkit library efficiently for this purpose, we need to be able to count the number of characters that correspond to a given token count. I added an Encoding.calcCharCountForTokens method that does that. Please review and accept the pull request if it makes sense. I have updated the implementation for the latest code.
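To illustrate the chunking use case described above, here is a minimal, self-contained sketch. It is not the code from this pull request, and the real signature of Encoding.calcCharCountForTokens is not shown here. The class name CharCountSketch, the helper names, and the (text, maxTokens) parameters are assumptions made for illustration only. A toy whitespace-based tokenizer stands in for JTokkit's BPE encodings so that the example runs without the library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: NOT JTokkit code and NOT the implementation from this PR.
final class CharCountSketch {

    // Toy stand-in tokenizer (assumption): a token is a word together with
    // its leading whitespace, or a run of whitespace with no word after it.
    private static final Pattern TOKEN = Pattern.compile("\\s*\\S+|\\s+");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(text);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    // Returns how many leading characters of text are covered by
    // its first maxTokens tokens (or by all tokens if there are fewer).
    static int calcCharCountForTokens(String text, int maxTokens) {
        int chars = 0;
        int used = 0;
        for (String token : tokenize(text)) {
            if (used == maxTokens) {
                break;
            }
            chars += token.length();
            used++;
        }
        return chars;
    }

    // Splits text into consecutive chunks of at most maxTokens tokens each.
    // The remainder is re-tokenized on every step, which keeps the sketch
    // short but is not efficient for very long inputs.
    static List<String> chunk(String text, int maxTokens) {
        if (maxTokens <= 0) {
            throw new IllegalArgumentException("maxTokens must be positive");
        }
        List<String> chunks = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int length = calcCharCountForTokens(text.substring(start), maxTokens);
            chunks.add(text.substring(start, start + length));
            start += length;
        }
        return chunks;
    }
}
```

With the toy tokenizer, CharCountSketch.chunk("alpha beta gamma", 2) splits the text into "alpha beta" and " gamma". With a real BPE encoding the boundaries would fall on that encoding's token boundaries instead, which is why a method on Encoding itself is the natural place for this calculation.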