karpathy / minbpe

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
MIT License
9.19k stars, 866 forks

Handle error when running out of pairs to merge #54

Open vinhdq842 opened 8 months ago

vinhdq842 commented 8 months ago

I accidentally hit a ValueError: max() arg is an empty sequence when testing on a small piece of text with a (possibly too) large vocab_size. It looks like training runs out of pairs to merge before it reaches the requested vocabulary size.
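For illustration, here is a minimal, self-contained sketch of how this can happen and one possible guard. It is not the repository's actual code: `get_stats`, `merge`, and `train` are simplified stand-ins written for this example, and the assumption is that the training loop picks the most frequent pair with `max()` over a dict of pair counts. Once the text has been merged down to a single token, that dict is empty and `max()` raises.

```python
def get_stats(ids):
    """Count how often each adjacent pair of token ids occurs."""
    counts = {}
    for pair in zip(ids, ids[1:]):
        counts[pair] = counts.get(pair, 0) + 1
    return counts


def merge(ids, pair, idx):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `idx`."""
    out = []
    i = 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(idx)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train(text, vocab_size):
    """Hypothetical byte-level BPE training loop with an early-exit guard."""
    assert vocab_size >= 256
    ids = list(text.encode("utf-8"))
    merges = {}
    for i in range(vocab_size - 256):
        stats = get_stats(ids)
        if not stats:
            # No adjacent pairs left (at most one token remains), so stop
            # early instead of calling max() on an empty dict.
            break
        pair = max(stats, key=stats.get)
        idx = 256 + i
        ids = merge(ids, pair, idx)
        merges[pair] = idx
    return merges, ids


# A 4-byte text allows at most 3 merges, far fewer than the 44 requested.
merges, ids = train("aaab", vocab_size=300)
print(len(merges), ids)
```

Without the `if not stats: break` check, the same call would fail on the fourth iteration with the ValueError above. Whether to stop silently, warn, or raise a clearer error is a design choice; this sketch just stops.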