intel / intel-xpu-backend-for-triton

OpenAI Triton backend for Intel® GPUs
MIT License

FlashAttention tutorial requires relaxed verification in advanced path (`perf_attn`) #2098

Open · victor-eds opened 2 weeks ago

victor-eds commented 2 weeks ago

Comparing the Triton and XeTLA outputs in the FlashAttention tutorial using atol=1e-2, rtol=0, as upstream does, fails verification for size (1, 32, 16384, 64). A more relaxed atol=1e-1 does verify, but that might be a bit too permissive considering the values will be less than 1 anyway (FlashAttention ends in a softmax), so an absolute tolerance of 1e-1 can hide relative errors well above 10%.
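To make the concern concrete (with made-up values, not actual tutorial output): with rtol=0, `torch.testing.assert_close` only bounds the absolute difference, so atol=1e-1 admits large relative errors on entries well below 1.

```python
import torch

# assert_close accepts |actual - expected| <= atol + rtol * |expected|;
# with rtol=0 this is a pure absolute-tolerance check, independent of the
# magnitude of the expected values. Illustrative values only.
expected = torch.tensor([0.5, 0.25, 0.125])
actual = expected + 0.09  # a 72% relative error on the smallest entry

torch.testing.assert_close(actual, expected, atol=1e-1, rtol=0)  # passes
try:
    torch.testing.assert_close(actual, expected, atol=1e-2, rtol=0)
except AssertionError:
    print("fails with the upstream atol=1e-2")
```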

To reproduce, add the following code to the `forward` function, right before the `return` statement:

```python
# PyTorch reference implementation, cast to float32 for comparison.
torch_output = torch.nn.functional.scaled_dot_product_attention(
    q, k, v, attn_mask=None, dropout_p=0.0, is_causal=False
).to(torch.float32)
# Compare against the Triton output `o`; fails for size (1, 32, 16384, 64)
# with the upstream tolerances.
torch.testing.assert_close(o, torch_output, atol=1e-2, rtol=0)
```
Dewei-Wang-sh commented 1 week ago

The generated IR differs somewhat from the PoC; need to check whether they are actually the same.