Based on suggestion in https://github.com/openai/tiktoken/pull/239 (specifically 8f5dd7d)
Like that commit, this:
- Does the init in a single loop and saves a loop if there are no merges
- Simplifies get_rank and no longer uses it in init (so you don't need multiple skip values)
Unlike that commit:
- We drop optimisations enabled by ignoring single tokens. These didn't show any benefit on benchmarks for me (this makes sense given typical piece sizes, but let me know if that's unexpected!). Given this, I opted for the simpler version.
- I preserve some of the comments from the original that I think are still useful
Let me know what you think! Once we figure this one out, we'll look at the linearithmic fix (I have some thoughts there, still doing some benchmarking).
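For anyone reading without the diff open, here is a rough sketch of the shape described above: the init loop fills in pair ranks and tracks the minimum in the same pass, so a piece with no merges never enters the merge loop, and get_rank is only called from the merge loop. This is an illustration under assumptions, not the code in this PR: the name byte_pair_merge_sketch, the Rank alias, and the use of Rank::MAX as the "no merge" marker are all hypothetical.

```rust
use std::collections::HashMap;

// Assumption: ranks are u32 and Rank::MAX means "no such merge".
type Rank = u32;

// Hypothetical sketch, not the PR's code. Returns (start, rank) entries whose
// start offsets mark the token boundaries once no more merges apply.
fn byte_pair_merge_sketch(ranks: &HashMap<Vec<u8>, Rank>, piece: &[u8]) -> Vec<(usize, Rank)> {
    // parts[i] = (start offset of part i, rank of the pair starting at part i).
    let mut parts: Vec<(usize, Rank)> = Vec::with_capacity(piece.len() + 1);

    // Single init loop: look up each adjacent byte pair and track the minimum
    // rank at the same time, so no separate scan is needed before the first merge.
    let mut min_rank: (Rank, usize) = (Rank::MAX, usize::MAX);
    for i in 0..piece.len().saturating_sub(1) {
        let rank = *ranks.get(&piece[i..i + 2]).unwrap_or(&Rank::MAX);
        if rank < min_rank.0 {
            min_rank = (rank, i);
        }
        parts.push((i, rank));
    }
    // Trailing entries: the last part has no right neighbour, and the final
    // entry only marks the end of the piece.
    if !piece.is_empty() {
        parts.push((piece.len() - 1, Rank::MAX));
    }
    parts.push((piece.len(), Rank::MAX));

    // Only used in the merge loop. The +3 is because parts[i + 1] has not been
    // removed yet when this is called.
    let get_rank = |parts: &Vec<(usize, Rank)>, i: usize| -> Rank {
        if i + 3 < parts.len() {
            *ranks
                .get(&piece[parts[i].0..parts[i + 3].0])
                .unwrap_or(&Rank::MAX)
        } else {
            Rank::MAX
        }
    };

    // Skipped entirely when the init loop found nothing to merge.
    while min_rank.0 != Rank::MAX {
        let i = min_rank.1;
        // Refresh the ranks of the two affected pairs, then drop the boundary.
        if i > 0 {
            let left = get_rank(&parts, i - 1);
            parts[i - 1].1 = left;
        }
        let merged = get_rank(&parts, i);
        parts[i].1 = merged;
        parts.remove(i + 1);

        // Rescan for the next lowest-ranked pair (the end marker is excluded).
        min_rank = (Rank::MAX, usize::MAX);
        for (j, &(_, rank)) in parts[..parts.len() - 1].iter().enumerate() {
            if rank < min_rank.0 {
                min_rank = (rank, j);
            }
        }
    }
    parts
}
```

With toy ranks ab=0, cd=1, abcd=2, the piece "abcd" collapses to the boundaries 0 and 4, while a piece with no known pairs such as "xyz" keeps every byte boundary and never runs the merge loop.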
Co-authored-by: @paplorinc