openai / tiktoken

tiktoken is a fast BPE tokeniser for use with OpenAI's models.
MIT License

Simplify byte_pair_merge #255

Closed: hauntsaninja closed this pull request 7 months ago

hauntsaninja commented 7 months ago

Based on the suggestion in https://github.com/openai/tiktoken/pull/239 (specifically commit 8f5dd7d).

Like that commit, this:

Unlike that commit:

Let me know what you think! Once we figure this one out, we'll look at the linearithmic fix (I have some thoughts there and am still doing some benchmarking).

Co-authored-by: @paplorinc
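As a rough illustration of what `byte_pair_merge` does, here is a minimal Python sketch of a plain quadratic merge loop. tiktoken's actual function is written in Rust, and this is not the code from this PR or from commit 8f5dd7d. The sketch assumes that `ranks` maps byte strings to merge priorities, that lower ranks merge first, and that the leftmost pair wins ties. The toy ranks in the example are hypothetical.

```python
import math


def byte_pair_merge(ranks: dict[bytes, int], piece: bytes) -> list[bytes]:
    """Split `piece` into parts by repeatedly merging the lowest-ranked adjacent pair.

    Sketch only: `ranks` maps byte strings to a merge priority (lower merges first).
    """
    # bounds[k] is where part k starts; part k spans piece[bounds[k]:bounds[k + 1]].
    bounds = list(range(len(piece) + 1))

    while True:
        # Linear scan over adjacent pairs; strict `<` keeps the leftmost pair on ties.
        best_k, best_rank = -1, math.inf
        for k in range(len(bounds) - 2):
            # A pair's rank is looked up by its concatenated bytes.
            rank = ranks.get(piece[bounds[k]:bounds[k + 2]], math.inf)
            if rank < best_rank:
                best_k, best_rank = k, rank
        if best_k < 0:
            break  # no adjacent pair appears in `ranks`, so nothing more can merge
        # Merging parts best_k and best_k + 1 just removes the boundary between them.
        del bounds[best_k + 1]

    return [piece[bounds[k]:bounds[k + 1]] for k in range(len(bounds) - 1)]
```

For example, `byte_pair_merge({b"ab": 0, b"bc": 1, b"abc": 2}, b"abcd")` first merges `ab`, then merges `abc`, and returns `[b"abc", b"d"]`. Tracking only boundary offsets keeps the loop short. However, every merge rescans all remaining pairs, so the sketch performs O(n^2) rank lookups in the worst case.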

hauntsaninja commented 7 months ago

Thank you for the original change and follow-up review! I've marked you as co-author on the commit :-)
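The linearithmic fix mentioned in the PR description is not shown here. One common way to reach O(n log n) is to keep the parts in a doubly linked list and the candidate merges in a min-heap keyed by (rank, start offset). Stale heap entries are then discarded as they are popped. The sketch below illustrates that technique under an assumption, and it is not necessarily the approach tiktoken adopted. It uses the same hypothetical `ranks` convention (lower rank merges first, leftmost wins ties), so it should produce the same parts as a linear-scan merge loop.

```python
import heapq


def byte_pair_merge_heap(ranks: dict[bytes, int], piece: bytes) -> list[bytes]:
    """Heap-based BPE merge sketch: O(n log n) heap operations instead of O(n^2) scans."""
    n = len(piece)
    # Doubly linked list of parts, keyed by each part's start offset.
    nxt = list(range(1, n + 1))   # nxt[s]: start of the following part (n if last)
    prv = list(range(-1, n - 1))  # prv[s]: start of the preceding part (-1 if first)
    alive = [True] * n            # False once a part has been absorbed into its left neighbour

    heap: list[tuple[int, int, int]] = []  # entries are (rank, left start, pair end)

    def push_pair(s: int) -> None:
        # Queue the merge of the part at s with its right neighbour, if that pair is ranked.
        if s < 0 or nxt[s] >= n:
            return
        end = nxt[nxt[s]]
        rank = ranks.get(piece[s:end])
        if rank is not None:
            heapq.heappush(heap, (rank, s, end))

    for s in range(n - 1):
        push_pair(s)

    while heap:
        rank, s, end = heapq.heappop(heap)
        # Lazy deletion: skip entries whose pair no longer exists in the list.
        if not alive[s] or nxt[s] >= n or nxt[nxt[s]] != end:
            continue
        m = nxt[s]  # start of the right-hand part, which gets absorbed
        alive[m] = False
        nxt[s] = end
        if end < n:
            prv[end] = s
        # Only the two pairs touching the merged part have changed.
        push_pair(prv[s])
        push_pair(s)

    parts = []
    s = 0
    while s < n:
        parts.append(piece[s:nxt[s]])
        s = nxt[s]
    return parts
```

Popping by (rank, start offset) reproduces a linear scan's choice of the leftmost lowest-ranked pair. Each merge pushes at most two new entries, so the heap sees O(n) pushes and pops in total. The validity check on pop avoids having to delete entries from the middle of the heap.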