zurawiki / tiktoken-rs

Ready-made tokenizer library for working with GPT and tiktoken
MIT License
240 stars, 46 forks

Use case: splitting text into tokens. #16

Closed: jackbackes closed this issue 1 year ago

jackbackes commented 1 year ago

As far as I can tell, tiktoken-rs can encode text into a vector of token IDs and then decode those IDs back into the original text. However, my use case is to split my text into decoded tokens, i.e. one string per token. I don't see a way to do this via the API. Having a method like "split_by_tokens" would be super helpful.
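To make the request concrete, here is a minimal sketch of what such a method could do: encode the text, then decode each token ID on its own instead of decoding the whole vector at once. Everything in it is an assumption for illustration. `ToyTokenizer`, its small vocabulary, and the greedy longest-match `encode` are hypothetical stand-ins, not the tiktoken-rs API or its real BPE merge procedure.

```rust
use std::collections::HashMap;

/// Hypothetical toy tokenizer used only for illustration.
/// It is NOT the tiktoken-rs API.
struct ToyTokenizer {
    vocab: Vec<Vec<u8>>,             // token id -> token bytes
    lookup: HashMap<Vec<u8>, usize>, // token bytes -> token id
}

impl ToyTokenizer {
    /// Builds a vocabulary of all 256 single bytes plus the given pieces.
    fn new(pieces: &[&str]) -> Self {
        let mut vocab: Vec<Vec<u8>> = (0u8..=255).map(|b| vec![b]).collect();
        for p in pieces {
            vocab.push(p.as_bytes().to_vec());
        }
        let lookup = vocab
            .iter()
            .cloned()
            .enumerate()
            .map(|(id, bytes)| (bytes, id))
            .collect();
        Self { vocab, lookup }
    }

    /// Greedy longest-match encoding (a simplification of real BPE merging).
    fn encode(&self, text: &str) -> Vec<usize> {
        let bytes = text.as_bytes();
        let max_len = self.vocab.iter().map(|v| v.len()).max().unwrap_or(1);
        let mut ids = Vec::new();
        let mut i = 0;
        while i < bytes.len() {
            let mut len = max_len.min(bytes.len() - i);
            // Every single byte is in the vocabulary, so this stops at len == 1.
            while !self.lookup.contains_key(&bytes[i..i + len]) {
                len -= 1;
            }
            ids.push(self.lookup[&bytes[i..i + len]]);
            i += len;
        }
        ids
    }

    /// Returns the raw bytes of a single token.
    fn decode_bytes(&self, id: usize) -> &[u8] {
        &self.vocab[id]
    }
}

/// Splits `text` into one string per token by decoding each token separately.
/// Lossy UTF-8 decoding is used because a single token may hold only part of
/// a multi-byte character.
fn split_by_tokens(tok: &ToyTokenizer, text: &str) -> Vec<String> {
    tok.encode(text)
        .into_iter()
        .map(|id| String::from_utf8_lossy(tok.decode_bytes(id)).into_owned())
        .collect()
}
```

With a vocabulary built from `ToyTokenizer::new(&["Hello", " world"])`, calling `split_by_tokens(&tok, "Hello world!")` yields the pieces `"Hello"`, `" world"`, and `"!"`. One caveat for any real byte-level BPE vocabulary: an individual token is not guaranteed to be valid UTF-8 by itself, so a per-token split has to either decode lossily (as above) or return raw bytes.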