karpathy / minbpe

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
MIT License

calling len(ids) in merge() function only once to increase performance #76

Open crpatil1901 opened 6 months ago

crpatil1901 commented 6 months ago

The length of the input ids does not change inside the merge() function. Instead of calling len(ids) on every iteration of the while loop, storing it in a variable once, before the loop starts, can shave off a few milliseconds when dealing with large documents.
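A minimal sketch of the proposed change, assuming a merge(ids, pair, idx) helper that replaces each consecutive occurrence of pair with the new token idx. It is modeled on the helper in minbpe but is not copied from the repository. The variable name n is an illustrative choice. Because ids is only read, never mutated, its length can be computed once and reused in both the loop condition and the bounds check:

```python
def merge(ids, pair, idx):
    """Return a new list where every consecutive occurrence of `pair` in `ids` is replaced by `idx`."""
    newids = []
    n = len(ids)  # ids is never modified inside this function, so compute its length once
    i = 0
    while i < n:
        # merge only if the pair starts at position i and a following element exists
        if i < n - 1 and ids[i] == pair[0] and ids[i + 1] == pair[1]:
            newids.append(idx)
            i += 2  # skip both elements of the merged pair
        else:
            newids.append(ids[i])
            i += 1
    return newids


print(merge([5, 6, 6, 7, 9, 1], (6, 7), 99))  # [5, 6, 99, 9, 1]
```

Since len() on a list is O(1) in CPython, this saves only the constant per-iteration call overhead, not an asymptotic cost. Timing both versions on a realistically large ids list (for example with the standard timeit module) is the way to confirm how much it actually helps.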