Open karpathy opened 4 months ago
just creating a todo. large batch sizes work now, having fixed the `size_t` bug:
./train_gpt2cu -b 36 -v 200 -s 200 -i data/TinyStories
works. 48 should also fit in memory, but it doesn't work:
./train_gpt2cu -b 48 -v 200 -s 200 -i data/TinyStories
val loss is -nan and train loss stays at inf.
todo: track down why, and how to prevent it.
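One hypothesis worth checking first (not a confirmed cause) is that some remaining index or element count is still computed in a 32-bit `int` rather than `size_t`. The sketch below assumes a sequence length of T = 1024 and a vocabulary of V = 50257, the standard GPT-2 values; neither is stated in this issue. The helper names `num_elements` and `fits_in_int` are made up for illustration and are not functions in the repo.

```c
#include <limits.h>
#include <stdint.h>

// Hypothetical helper: element count of a (B, T, V) tensor such as the logits.
// Each factor is widened to 64 bits before multiplying, so the product
// cannot wrap for the sizes discussed here.
static uint64_t num_elements(int B, int T, int V) {
    return (uint64_t)B * (uint64_t)T * (uint64_t)V;
}

// Hypothetical helper: returns 1 if that count can be held in a signed
// 32-bit int index, 0 if it would overflow.
static int fits_in_int(int B, int T, int V) {
    return num_elements(B, T, V) <= (uint64_t)INT_MAX;
}
```

Under those assumed values, B = 36 gives 36 * 1024 * 50257 = 1,852,674,048 elements, which is below `INT_MAX` (2,147,483,647), while B = 48 gives 2,470,232,064, which is above it. That lines up with 36 working and 48 failing, so any `int` offset into a (B, T, V) buffer would be a reasonable first place to look.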
@karpathy just wanted to check, we've fixed this, right?