Birch-san closed this 3 months ago.
See my parity test, which demonstrates that the FlexAttention implementation is allclose-equivalent both to NATTEN and to masked SDPA.
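The parity test itself is not reproduced here. As a rough illustration of the kind of check it performs, the dependency-free Python sketch below computes neighborhood attention in two ways on a 6x6 canvas with a 3x3 kernel. The first is dense attention over every key, with non-neighbors masked to -inf (the masked-SDPA formulation). The second gathers only each query's 3x3 neighbors (the formulation a neighborhood kernel such as NATTEN uses). It assumes NATTEN-style border handling, where windows near an edge shift inward rather than shrink. All helper names are hypothetical, and this is not the author's test.

```python
import math
import random


def window_start(i, length, k):
    # Assumed NATTEN-style border handling: the window shifts inward at the
    # edges instead of shrinking, so every query sees exactly k keys per axis.
    return min(max(i - k // 2, 0), length - k)


def neighbors(qy, qx, h, w, k):
    # Row-major flattened indices of the k*k keys visible to query (qy, qx).
    sy, sx = window_start(qy, h, k), window_start(qx, w, k)
    return [y * w + x for y in range(sy, sy + k) for x in range(sx, sx + k)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax_weighted_sum(scores, values):
    # Numerically stable softmax over scores, then a weighted sum of values.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # exp(-inf) == 0.0
    z = sum(exps)
    return [sum(e * v[c] for e, v in zip(exps, values)) / z
            for c in range(len(values[0]))]


def dense_masked_attention(q, keys, vals, h, w, k):
    # Masked-SDPA style: score every key, mask out non-neighbors with -inf.
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for idx in range(h * w):
        allowed = set(neighbors(idx // w, idx % w, h, w, k))
        scores = [dot(q[idx], keys[j]) * scale if j in allowed else float("-inf")
                  for j in range(h * w)]
        out.append(softmax_weighted_sum(scores, vals))
    return out


def gathered_attention(q, keys, vals, h, w, k):
    # Neighborhood-kernel style: only ever touch the k*k neighbors.
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for idx in range(h * w):
        nbrs = neighbors(idx // w, idx % w, h, w, k)
        scores = [dot(q[idx], keys[j]) * scale for j in nbrs]
        out.append(softmax_weighted_sum(scores, [vals[j] for j in nbrs]))
    return out


def allclose(a, b, rtol=1e-5, atol=1e-8):
    # Same tolerance rule as torch.allclose: |a - b| <= atol + rtol * |b|.
    return all(abs(x - y) <= atol + rtol * abs(y)
               for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def random_matrix(n, d):
    return [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]


if __name__ == "__main__":
    random.seed(0)
    H, W, K, D = 6, 6, 3, 4  # 6x6 canvas, 3x3 kernel, 4 channels
    q, keys, vals = (random_matrix(H * W, D) for _ in range(3))
    dense = dense_masked_attention(q, keys, vals, H, W, K)
    gathered = gathered_attention(q, keys, vals, H, W, K)
    print("parity:", allclose(dense, gathered))
```

The two paths agree because masked keys contribute exp(-inf) = 0 to the dense softmax, so only the neighborhood keys carry any weight.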
Figure: 3x3 NATTEN kernel on a 6x6 canvas
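The setup in the figure can also be reproduced in text. FlexAttention expresses masks as a `mask_mod(b, h, q_idx, kv_idx)` callable that returns whether a query may attend to a key. The sketch below writes a neighborhood predicate with that argument order in plain Python, using ints instead of tensors so that it runs without PyTorch. A real `mask_mod` would need tensor ops, such as `clamp` and `&`, in place of `min`/`max` and `and`. The sketch assumes row-major flattening and NATTEN-style border handling, implemented by clamping the window centre. The names `natten_mask_mod` and `render` are hypothetical, and this is not necessarily how the linked implementation writes it.

```python
CANVAS_H, CANVAS_W = 6, 6  # 6x6 canvas
KERNEL = 3                 # 3x3 neighborhood


def natten_mask_mod(b, h, q_idx, kv_idx):
    # FlexAttention-style argument order; batch (b) and head (h) are unused.
    half = KERNEL // 2
    q_y, q_x = divmod(q_idx, CANVAS_W)
    kv_y, kv_x = divmod(kv_idx, CANVAS_W)
    # Clamp the window centre away from the borders so the window never
    # hangs off the canvas (assumed NATTEN-style behavior).
    c_y = min(max(q_y, half), CANVAS_H - 1 - half)
    c_x = min(max(q_x, half), CANVAS_W - 1 - half)
    return abs(c_y - kv_y) <= half and abs(c_x - kv_x) <= half


def render(q_y, q_x):
    # 'Q' = the query, '#' = other keys it attends to, '.' = masked out.
    q_idx = q_y * CANVAS_W + q_x
    rows = []
    for y in range(CANVAS_H):
        row = ""
        for x in range(CANVAS_W):
            kv_idx = y * CANVAS_W + x
            if kv_idx == q_idx:
                row += "Q"
            elif natten_mask_mod(0, 0, q_idx, kv_idx):
                row += "#"
            else:
                row += "."
        rows.append(row)
    return "\n".join(rows)


if __name__ == "__main__":
    print(render(0, 0))  # corner query: window shifted inward
    print()
    print(render(2, 3))  # interior query: window centred on the query
```

If this predicate is evaluated over every (q_idx, kv_idx) pair, it yields a boolean matrix of the kind that could be passed as `attn_mask` to `torch.nn.functional.scaled_dot_product_attention`, where `True` means "may attend". That is one way to build the masked-SDPA reference mentioned above.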