karpathy / llm.c

LLM training in simple, raw C/CUDA
MIT License

Make cuDNN deterministic for Flash Attention backward #652

Closed: ademeure closed this PR 3 days ago

ademeure commented 3 days ago

cuDNN Frontend 1.5, released on June 13th, added a new setting that makes the attention backward algorithm deterministic; it is disabled by default.
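For context, here is a minimal sketch of how that setting might be enabled through the cuDNN Frontend graph API. It assumes the `set_deterministic_algorithm` setter added in Frontend 1.5 on `SDPA_backward_attributes`; the exact call site and surrounding options in llm.c's cuDNN attention code may differ:

```cpp
// Sketch only (not the exact llm.c code): request the deterministic
// backward algorithm when building the SDPA backward node.
// Assumes cudnn-frontend >= 1.5, where set_deterministic_algorithm() exists.
#include <cudnn_frontend.h>

namespace fe = cudnn_frontend;

fe::graph::SDPA_backward_attributes make_sdpa_backward_options(float attn_scale) {
    return fe::graph::SDPA_backward_attributes()
        .set_name("flash_attention_backward")
        .set_causal_mask(true)
        .set_attn_scale(attn_scale)
        .set_deterministic_algorithm(true);  // new in Frontend 1.5; off by default
}
```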

Testing on cuDNN Backend 9.2 + cuDNN Frontend 1.4 (where that setting is not available) revealed that cuDNN's attention backward pass was not deterministic at large batch sizes. Because small batch sizes appear to always be deterministic, this would not have been caught by our test_gpt2cu tests.
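One generic way to surface this kind of non-determinism (a sketch, not the project's actual test harness) is to run the same backward pass twice from identical inputs and compare the gradient buffers bitwise; `run_backward` below is a hypothetical hook standing in for the real attention backward call:

```cpp
// Determinism probe (sketch): run an identical backward pass twice and
// compare the resulting gradient buffer bitwise on the host.
#include <cuda_runtime.h>
#include <cstring>
#include <vector>

// Hypothetical harness hook: fills d_grads from fixed, identical inputs.
extern void run_backward(float* d_grads, size_t n);

bool backward_is_deterministic(size_t n) {
    std::vector<float> a(n), b(n);
    float* d_grads;
    cudaMalloc(&d_grads, n * sizeof(float));

    run_backward(d_grads, n);
    cudaMemcpy(a.data(), d_grads, n * sizeof(float), cudaMemcpyDeviceToHost);

    run_backward(d_grads, n);
    cudaMemcpy(b.data(), d_grads, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_grads);
    // bitwise equality: even 1-ulp drift counts as non-deterministic
    return std::memcmp(a.data(), b.data(), n * sizeof(float)) == 0;
}
```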

This is despite NVIDIA's documentation implying that it should be deterministic: their lists of non-deterministic cases do not include attention backward. There is a remark in the cuDNN Backend release notes about non-determinism for attention and RNNs, but it only covers the case where multiple streams call cuBLAS and/or cuDNN, which does not apply to us, and setting CUBLAS_WORKSPACE_CONFIG as suggested in those notes did not make the result deterministic.
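For reference, CUBLAS_WORKSPACE_CONFIG is a documented cuBLAS environment variable; a sketch of the workaround that was tried (and did not help here) looks like this, with `":4096:8"` being one of the documented values:

```cpp
// Sketch of the release-notes workaround: pin cuBLAS to a fixed workspace.
// Must be set before the first cuBLAS/cuDNN call in the process.
// As noted above, this did NOT make attention backward deterministic for us.
#include <cstdlib>

int main() {
    setenv("CUBLAS_WORKSPACE_CONFIG", ":4096:8", /*overwrite=*/1);
    // ... initialize CUDA / cuBLAS / cuDNN and run training as usual ...
    return 0;
}
```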

This PR should be combined with an upgrade to cuDNN Frontend 1.5 (the code will compile without it, but the issue will not be fixed).
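Since the setter only exists from Frontend 1.5 onward, a version guard along these lines keeps older frontends compiling while simply skipping the fix there. The macro name and encoding are an assumption based on cudnn-frontend's `cudnn_frontend_version.h` (major*10000 + minor*100 + patch, so 1.5.0 becomes 10500):

```cpp
// Guard the new setter so builds against cuDNN Frontend < 1.5 still compile;
// those builds just will not get the deterministic algorithm.
auto options = fe::graph::SDPA_backward_attributes()
                   .set_name("flash_attention_backward")
#if CUDNN_FRONTEND_VERSION >= 10500  // assumed encoding: 1.5.0 -> 10500
                   .set_deterministic_algorithm(true)
#endif
                   .set_causal_mask(true);
```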