Comparing Triton vs XeTLA FlashAttention output in FlashAttention using atol=1e-2, rtol=0 as in upstream leads to size 1 32 16384 64 missing verification. A more relaxed atol=1e-1 value verifies, but this might be a bit too permissive taking into account values will be less than 1 anyway (FlashAttention is a SoftMax).
In order to reproduce, add the following code to the forward function, right before the return:
Comparing Triton vs XeTLA FlashAttention output in FlashAttention using
atol=1e-2, rtol=0
as in upstream leads to size1 32 16384 64
missing verification. A more relaxedatol=1e-1
value verifies, but this might be a bit too permissive taking into account values will be less than 1 anyway (FlashAttention is a SoftMax).In order to reproduce, add the following code to the
forward
function, right before thereturn
: