karpathy / build-nanogpt

Video+code lecture on building nanoGPT from scratch

NO dropout in MLP and CausalSelfAttention #29

Closed: peter-ni-noob closed this issue 3 weeks ago

unclecode commented 3 months ago

Yes, Karpathy elaborated on this topic in his video. Overfitting is not a major concern for this project because the dataset is, for practical purposes, virtually infinite. Even after 4 epochs and 40 billion tokens, the validation loss is still decreasing steadily, so there is little for dropout to regularize in the MLP and CausalSelfAttention blocks.
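For anyone who still wants to experiment with dropout, here is a minimal sketch of how it could be made optional in a GPT-2 style MLP block. This is not code from the repository: the class name `MLPWithDropout` and the `dropout` argument are hypothetical, and the layer layout (linear, tanh-approximated GELU, linear) is only assumed to resemble the lecture's MLP. With `dropout=0.0` the extra layer is a no-op, which corresponds to the "no dropout" setup discussed in this issue.

```python
import torch
import torch.nn as nn


class MLPWithDropout(nn.Module):
    """GPT-2 style feed-forward block with optional dropout (hypothetical sketch)."""

    def __init__(self, n_embd: int, dropout: float = 0.0):
        super().__init__()
        self.c_fc = nn.Linear(n_embd, 4 * n_embd)
        self.gelu = nn.GELU(approximate="tanh")
        self.c_proj = nn.Linear(4 * n_embd, n_embd)
        # With p=0.0 this layer passes its input through unchanged,
        # so the default reproduces a block without dropout.
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.c_fc(x)
        x = self.gelu(x)
        x = self.c_proj(x)
        return self.dropout(x)


torch.manual_seed(0)
mlp = MLPWithDropout(n_embd=32, dropout=0.0)  # default: no dropout
x = torch.randn(2, 8, 32)                     # (batch, sequence, embedding)
y = mlp(x)                                    # same shape as the input
```

Setting `dropout` to a value such as 0.1 would only make sense if the validation loss started rising while the training loss kept falling, which is not what the comment above reports.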

peter-ni-noob commented 3 weeks ago

ok