karpathy / build-nanogpt

Video+code lecture on building nanoGPT from scratch
3.44k stars 473 forks

How to support padding in the training dataset? #49

Open mrhimanshu opened 3 months ago

dustinwloring1988 commented 2 months ago

I would just add the padding in the fineweb.py script, at the point where each row is tokenized.
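
A minimal sketch of what that could look like, assuming each row has already been tokenized into a list of integer ids. The helper name `pad_row`, the `block_size` parameter, and the choice to reuse the GPT-2 `<|endoftext|>` id (50256) as the pad id are all assumptions for illustration and not part of fineweb.py:

```python
# Hypothetical helper, not part of fineweb.py.
# GPT-2 has no dedicated pad token, so this sketch reuses the
# <|endoftext|> id as the pad id (an assumption, not the repo's choice).
EOT_ID = 50256


def pad_row(tokens, block_size, pad_id=EOT_ID):
    """Truncate or right-pad one tokenized row to exactly block_size tokens.

    Returns (padded_tokens, mask), where mask is 1 for real tokens
    and 0 for padding positions.
    """
    tokens = list(tokens)[:block_size]   # drop anything past block_size
    n_real = len(tokens)
    n_pad = block_size - n_real          # 0 if the row was truncated
    padded = tokens + [pad_id] * n_pad
    mask = [1] * n_real + [0] * n_pad
    return padded, mask


# Example: a short row padded out to 6 tokens.
row = [EOT_ID, 15496, 995]
padded, mask = pad_row(row, block_size=6)
print(padded)  # [50256, 15496, 995, 50256, 50256, 50256]
print(mask)    # [1, 1, 1, 0, 0, 0]
```

The mask matters at training time: padded positions should not contribute to the loss. One common way to do that is to set the targets at those positions to -100, which is the default `ignore_index` of `torch.nn.functional.cross_entropy`. If the shards only store token ids, the mask would have to be saved alongside them or recomputed when the data is loaded.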

dustinwloring1988 commented 2 months ago

@mrhimanshu sorry, forgot to tag you.