karpathy / minbpe

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
MIT License
9.19k stars, 866 forks

Add pyproject toml #89

Open gianlucagiudice opened 2 months ago

gianlucagiudice commented 2 months ago

Hi @karpathy,

Thank you for all the incredible educational projects you're working on. They've been a great source of learning and inspiration!

I'm currently working on my own GPT implementation, YaGPT (Yet Another GPT), and I would love to integrate your minbpe implementation for training the tokenizer for my model.

This PR makes it possible to install minbpe directly with pip:

pip install git+https://github.com/karpathy/minbpe.git#egg=minbpe
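
After installing this way, the package should be importable as `minbpe`. Below is a minimal sketch of a post-install check. It assumes the install succeeded and that the package exposes `BasicTokenizer` with `train`, `encode`, and `decode`, as described in the minbpe README. The helper name `minbpe_available` is hypothetical and exists only for this illustration.

```python
import importlib.util


def minbpe_available() -> bool:
    """Return True if the minbpe package can be found by the import system."""
    # Hypothetical helper: find_spec returns None when the top-level
    # package is not installed.
    return importlib.util.find_spec("minbpe") is not None


if minbpe_available():
    # Assumption: BasicTokenizer and its train/encode/decode methods
    # behave as shown in the minbpe README.
    from minbpe import BasicTokenizer

    text = "aaabdaaabac"
    tokenizer = BasicTokenizer()
    # 256 raw byte tokens plus 3 merges learned from the text.
    tokenizer.train(text, 256 + 3)
    ids = tokenizer.encode(text)
    # Encoding followed by decoding should reproduce the input.
    assert tokenizer.decode(ids) == text
    print("minbpe round trip OK:", ids)
else:
    print("minbpe is not installed; run the pip command above first.")
```

Running the script in the environment where pip was invoked is a quick way to confirm that the package metadata added by this PR resolves to an importable module.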

Thank you again for your time and consideration!