openai / tiktoken

tiktoken is a fast BPE tokeniser for use with OpenAI's models.
MIT License
12.52k stars, 856 forks

make decoder and sorted_token_bytes re-use existing memory #352

Open tmm1 opened 1 month ago

tmm1 commented 1 month ago

uses unsafe + std::mem::transmute to re-use the encoder's key bytes as the decoder's values and as the entries of the sorted_token_bytes list, so the token bytes are stored once

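this should be safe because the encoder, the decoder and sorted_token_bytes are all owned by CoreBPE and share its lifetime, so the borrowed bytes cannot outlive the encoder keys they point to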

results in memory savings and, in some scenarios, performance improvements
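
a minimal sketch of the idea described above, not the actual diff: MiniBpe is a hypothetical stand-in for CoreBPE, and its field layout, constructor and accessor methods are assumptions made for illustration. it shows the encoder keys being borrowed, lifetime-extended with transmute, and stored in both the decoder and sorted_token_bytes

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for CoreBPE. The field names follow the PR text,
/// but this layout is an assumption, not tiktoken's real definition.
struct MiniBpe {
    // The borrowing fields are declared before `encoder`, so they are
    // dropped first and never outlive the bytes they point to.
    decoder: HashMap<u32, &'static [u8]>,
    sorted_token_bytes: Vec<&'static [u8]>,
    encoder: HashMap<Vec<u8>, u32>,
}

impl MiniBpe {
    fn new(encoder: HashMap<Vec<u8>, u32>) -> Self {
        let mut decoder = HashMap::with_capacity(encoder.len());
        let mut sorted_token_bytes = Vec::with_capacity(encoder.len());
        for (bytes, &rank) in &encoder {
            // SAFETY (assumed invariants): each key's heap buffer stays put
            // when the map itself is moved, `encoder` is never mutated after
            // construction, and the fake 'static references never escape
            // this struct with their 'static lifetime.
            let shared: &'static [u8] =
                unsafe { std::mem::transmute::<&[u8], &'static [u8]>(bytes.as_slice()) };
            decoder.insert(rank, shared);
            sorted_token_bytes.push(shared);
        }
        sorted_token_bytes.sort();
        MiniBpe { decoder, sorted_token_bytes, encoder }
    }

    /// Returns the token bytes with a lifetime tied to `&self`,
    /// so callers never see the internal 'static lifetime.
    fn decode_token(&self, rank: u32) -> Option<&[u8]> {
        self.decoder.get(&rank).copied()
    }

    fn encode_token(&self, bytes: &[u8]) -> Option<u32> {
        self.encoder.get(bytes).copied()
    }
}
```

the accessors hand out slices bound to &self rather than 'static, which keeps the unsafe lifetime extension private to the struct. any later change that mutates the encoder after construction would invalidate the safety argument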