karpathy / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.
MIT License

Resume Training #467

Open tiredsoul21 opened 6 months ago

tiredsoul21 commented 6 months ago

https://github.com/karpathy/nanoGPT/blob/325be85d9be8c81b436728a420e85796c57dba7e/train.py#L106

In my implementation of the code, I modified this line to incorporate the iteration number into the seed. I suspect that if you resume training multiple times, the random seed draws the same training batches in the same order each time. Over many iterations of the dataset this effect may wash out, but it may also cause validation to run against the same sets on every resume, which has higher consequences.
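For reference, a minimal sketch of the kind of one-line change described, assuming the referenced line is nanoGPT's `torch.manual_seed(1337 + seed_offset)` call and that `iter_num` has already been restored from the checkpoint when resuming (placeholder values used here so the snippet runs standalone):

```python
import torch

seed_offset = 0  # placeholder; set per DDP rank in the real script
iter_num = 0     # placeholder; restored from the checkpoint on resume

# Folding iter_num into the seed gives a different RNG stream after each
# resume, so batches are not redrawn in the same order every time.
torch.manual_seed(1337 + seed_offset + iter_num)
```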

VatsaDev commented 6 months ago

the random seed may draw the same training sets and in the same order

That would be amazingly rare, wouldn't it?

tiredsoul21 commented 6 months ago

I would say, given the setup described, no. Sampling ought to be deterministic over multiple resumes... right?

VatsaDev commented 6 months ago

Oh wait, you're using a manual seed, sorry, I misread. Well, that would load the batches in a certain order, so yes, it could do multiple epochs in the same order. You could remove the seed for true randomness, or do the one-line change you made. It would be a quick PR if Andrej sees it, but it's probably not already a feature because it's an uncommon scenario.

Have you actually observed it, by the way? If the batches are being loaded in the same order, shouldn't it show up as a quick drop on the loss graph that slowly rises again?