karpathy / llm.c

LLM training in simple, raw C/CUDA
MIT License

Stricter FP32 tests #614

Closed gordicaleksa closed 1 week ago

gordicaleksa commented 1 week ago

Regarding the grad tensors: back when Andrej hardcoded the thresholds, we had a bug in PyTorch that led to a bigger discrepancy between our PyTorch and C code. Now that that's fixed, we can be really strict and use 1e-6f here.

rosslwheeler commented 1 week ago

@gordicaleksa - this appears to be causing a failure when running:

make testgpt2_cu USE_CUDNN=1 && ./testgpt2_cu

It may or may not reproduce in your environment, but I am able to see it here.

@karpathy - I think this is what was causing the issue. I ran my tests one commit back from this in your repo and they pass consistently; if I check out this commit, the failures start. I'm not 100 percent sure, but it makes sense: this is the test that's failing, and there haven't been many other recent changes to this file.

Can you confirm since you were seeing the failure consistently too? Thank you.

gordicaleksa commented 1 week ago

It's certainly this PR, and it's a shame our CI didn't catch it! See https://github.com/karpathy/llm.c/pull/615 for a fix.