zurawiki / tiktoken-rs

Ready-made tokenizer library for working with GPT and tiktoken
MIT License

Try to fix token counting perf. #82

Open xd009642 opened 1 month ago

xd009642 commented 1 month ago

On our system we're seeing that any call to num_tokens_from_messages takes around 300-500ms consistently, regardless of the size of the message, which is crazily slow for what it's doing. A flamegraph showed the model being loaded on every call, so this attempts to fix that by using the singleton model instances, to see if that improves performance.
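For illustration, here's a minimal sketch of the caching idea, not the PR's actual diff: cache the constructed encoder in a `OnceLock` so the expensive build runs once per process. It assumes `tiktoken_rs::cl100k_base()` is the costly constructor and that `CoreBPE` is `Sync`; if it weren't, you'd reach for the library's `Arc<Mutex<_>>`-style singleton accessors instead.

```rust
use std::sync::OnceLock;

use tiktoken_rs::CoreBPE;

// Hypothetical process-wide cache: construct the encoder once, on first use,
// instead of rebuilding it on every num_tokens_from_messages call.
static CL100K: OnceLock<CoreBPE> = OnceLock::new();

fn cl100k() -> &'static CoreBPE {
    CL100K.get_or_init(|| tiktoken_rs::cl100k_base().expect("failed to build cl100k_base"))
}

fn count_tokens(text: &str) -> usize {
    // Encoding only takes &self, so the cached instance can be shared by reference.
    cl100k().encode_with_special_tokens(text).len()
}
```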

I'm a bit sceptical of this, as the mutex locking might just cause its own issues, but we'll see in our testing and maybe come up with a solution going forwards...
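If the shared lock does become a bottleneck, one hypothetical alternative is a per-thread encoder: it trades memory (one instance per worker thread) for lock-free counting. `count_tokens_tls` and the cached `cl100k_base()` call here are illustrative, not part of this PR:

```rust
use std::cell::OnceCell;

use tiktoken_rs::CoreBPE;

thread_local! {
    // One encoder per worker thread: no cross-thread mutex, no contention.
    static CL100K_TLS: OnceCell<CoreBPE> = OnceCell::new();
}

fn count_tokens_tls(text: &str) -> usize {
    CL100K_TLS.with(|cell| {
        let bpe = cell.get_or_init(|| {
            tiktoken_rs::cl100k_base().expect("failed to build cl100k_base")
        });
        bpe.encode_with_special_tokens(text).len()
    })
}
```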

As an aside: given the vocab is known up front for a tokeniser, you should be able to just codegen a hashmap at build time if you wanted; hashmap inserts completely dominate the num_tokens_from_messages profile.
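A rough sketch of that codegen idea using the `phf`/`phf_codegen` crates, which build a perfect hash map at compile time. The real vocab maps byte sequences to ranks and would be read from the `.tiktoken` file; two hard-coded string entries keep the sketch self-contained:

```rust
// build.rs — sketch of baking the vocab into a compile-time perfect hash map,
// so no hashmap inserts happen at runtime.
use std::env;
use std::fs::File;
use std::io::{BufWriter, Write};
use std::path::Path;

fn main() {
    let path = Path::new(&env::var("OUT_DIR").unwrap()).join("vocab.rs");
    let mut out = BufWriter::new(File::create(&path).unwrap());

    let mut map = phf_codegen::Map::new();
    // A real build script would loop over the tokenizer's vocab file here;
    // these two entries are placeholders.
    map.entry("hello", "31373");
    map.entry("world", "995");

    writeln!(
        &mut out,
        "static VOCAB: phf::Map<&'static str, u32> = {};",
        map.build()
    )
    .unwrap();
}
```

The generated map is then pulled in with `include!(concat!(env!("OUT_DIR"), "/vocab.rs"))`, and lookups like `VOCAB.get("hello")` never touch a runtime-built hashmap.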

Experiment in service of #81