karpathy / llm.c

LLM training in simple, raw C/CUDA
MIT License

Make cuDNN deterministic for Flash Attention backward #652

Closed: ademeure closed this PR 3 days ago

ademeure commented 3 days ago

cuDNN Frontend 1.5, released on June 13th, added a new setting that makes the attention backward algorithm deterministic; it is disabled by default.
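For context, here is a minimal sketch of how that setting might be enabled through the cuDNN Frontend graph API. It assumes the `set_deterministic_algorithm` setter added in Frontend 1.5 on `SDPA_backward_attributes`; the exact call site and surrounding options in llm.c's cuDNN attention code may differ:

```cpp
// Sketch only (not the exact llm.c code): request the deterministic
// backward algorithm when building the SDPA backward node.
// Assumes cudnn-frontend >= 1.5, where set_deterministic_algorithm() exists.
#include <cudnn_frontend.h>

namespace fe = cudnn_frontend;

fe::graph::SDPA_backward_attributes make_sdpa_backward_options(float attn_scale) {
    return fe::graph::SDPA_backward_attributes()
        .set_name("flash_attention_backward")
        .set_causal_mask(true)
        .set_attn_scale(attn_scale)
        .set_deterministic_algorithm(true);  // new in Frontend 1.5; off by default
}
```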

Testing on cuDNN Backend 9.2 + cuDNN Frontend 1.4 (where that setting is not available) revealed that cuDNN's attention backward pass was not deterministic at large batch sizes. Because small batch sizes appear to always be deterministic, this would not have been caught by our test_gpt2cu tests.
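One generic way to surface this kind of non-determinism (a sketch, not the project's actual test harness) is to run the same backward pass twice from identical inputs and compare the gradient buffers bitwise; `run_backward` below is a hypothetical hook standing in for the real attention backward call:

```cpp
// Determinism probe (sketch): run an identical backward pass twice and
// compare the resulting gradient buffer bitwise on the host.
#include <cuda_runtime.h>
#include <cstring>
#include <vector>

// Hypothetical harness hook: fills d_grads from fixed, identical inputs.
extern void run_backward(float* d_grads, size_t n);

bool backward_is_deterministic(size_t n) {
    std::vector<float> a(n), b(n);
    float* d_grads;
    cudaMalloc(&d_grads, n * sizeof(float));

    run_backward(d_grads, n);
    cudaMemcpy(a.data(), d_grads, n * sizeof(float), cudaMemcpyDeviceToHost);

    run_backward(d_grads, n);
    cudaMemcpy(b.data(), d_grads, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_grads);
    // bitwise equality: even 1-ulp drift counts as non-deterministic
    return std::memcmp(a.data(), b.data(), n * sizeof(float)) == 0;
}
```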

This is despite NVIDIA's documentation implying that it should be deterministic: their lists of non-deterministic cases do not include attention backward. There is a remark in the cuDNN Backend release notes about non-determinism for attention and RNNs, but it only covers the case where multiple streams call cuBLAS and/or cuDNN, which does not apply to us, and setting CUBLAS_WORKSPACE_CONFIG as suggested in those notes did not make the result deterministic.
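For reference, CUBLAS_WORKSPACE_CONFIG is a documented cuBLAS environment variable; a sketch of the workaround that was tried (and did not help here) looks like this, with `":4096:8"` being one of the documented values:

```cpp
// Sketch of the release-notes workaround: pin cuBLAS to a fixed workspace.
// Must be set before the first cuBLAS/cuDNN call in the process.
// As noted above, this did NOT make attention backward deterministic for us.
#include <cstdlib>

int main() {
    setenv("CUBLAS_WORKSPACE_CONFIG", ":4096:8", /*overwrite=*/1);
    // ... initialize CUDA / cuBLAS / cuDNN and run training as usual ...
    return 0;
}
```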

This PR should be combined with an upgrade to cuDNN Frontend 1.5 (the code will compile without it, but the issue will not be fixed).
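Since the setter only exists from Frontend 1.5 onward, a version guard along these lines keeps older frontends compiling while simply skipping the fix there. The macro name and encoding are an assumption based on cudnn-frontend's `cudnn_frontend_version.h` (major*10000 + minor*100 + patch, so 1.5.0 becomes 10500):

```cpp
// Guard the new setter so builds against cuDNN Frontend < 1.5 still compile;
// those builds just will not get the deterministic algorithm.
auto options = fe::graph::SDPA_backward_attributes()
                   .set_name("flash_attention_backward")
#if CUDNN_FRONTEND_VERSION >= 10500  // assumed encoding: 1.5.0 -> 10500
                   .set_deterministic_algorithm(true)
#endif
                   .set_causal_mask(true);
```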