Open aidando73 opened 1 week ago
Note that I didn't run python train_gpt2.py beforehand.
When I was using train_gpt2.cu for inference, I ran into the same issue. But if I ran python train_gpt2.py beforehand, I no longer ran into the issue.
My hypothesis is that -1149026846 is the end of file token that we're not setting correctly for the case where we don't run python train_gpt2.py.
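A quick sanity check on that value, as a rough sketch rather than a confirmed diagnosis: GPT-2's vocabulary has 50257 entries and its end-of-text token id is 50256, so any valid token id must fall in the range 0 to 50256. The helper name `is_valid_token` is hypothetical and is not part of llm.c; it only illustrates that -1149026846 cannot be a real token id, which would be consistent with a value that was never initialized.

```python
# Sketch only: checks whether an integer could be a valid GPT-2 token id.
# `is_valid_token` is a hypothetical helper, not an llm.c function.

GPT2_VOCAB_SIZE = 50257  # size of the GPT-2 BPE vocabulary
GPT2_EOT_TOKEN = 50256   # id of the <|endoftext|> token


def is_valid_token(token_id: int, vocab_size: int = GPT2_VOCAB_SIZE) -> bool:
    """Return True if token_id indexes into a vocabulary of vocab_size entries."""
    return 0 <= token_id < vocab_size


suspicious = -1149026846  # the value from the hypothesis above

print("suspicious value valid?", is_valid_token(suspicious))
print("real EOT token valid?  ", is_valid_token(GPT2_EOT_TOKEN))
```

If the hypothesis holds, a range check like this before the token is used would turn the failure into an explicit error instead of an out-of-bounds lookup.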
I'm trying to follow https://github.com/karpathy/llm.c/discussions/481 but I'm getting this error:
Happens at the end of training. I don't end up getting the final model weights.
Running:
You can find the 1500 model checkpoint + state here: https://huggingface.co/aidando73/repro-gpt-2-124M/tree/086c8895ae49f2472bcde14c7866e792b0a330f1/8x_A100_40GB/log124M
Commit hash I checked out: 7ecd8906afe6ed7a2b2cdb731c042f26d525b820
Note that I didn't run python train_gpt2.py beforehand.
Anyone else getting this error?