pytorch-labs / attention-gym

Helpful tools and examples for working with flex-attention

NATTEN example #16

Status: Closed (Birch-san closed this issue 3 months ago)

Birch-san commented 3 months ago

See my parity test, which demonstrates that the FlexAttention implementation is numerically equivalent (per torch.allclose) to both NATTEN and masked SDPA.
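
For context, a NATTEN-style neighbourhood mask can be expressed as a FlexAttention `mask_mod` and checked against masked SDPA. The sketch below is not Birch-san's actual parity test; it assumes the 6x6 canvas / 3x3 kernel from this issue's example, NATTEN's edge-clamped window semantics, and a recent PyTorch (2.6 or later) where `flex_attention` and `create_block_mask` live in `torch.nn.attention.flex_attention` and handle sequence lengths that are not multiples of the block size.

```python
import torch
import torch.nn.functional as F
from torch.nn.attention.flex_attention import create_block_mask, flex_attention

CANVAS_H = CANVAS_W = 6   # canvas size from the example image
KERNEL_H = KERNEL_W = 3   # neighbourhood (kernel) size

def natten_mask_mod(b, h, q_idx, kv_idx):
    # Recover 2D pixel coordinates from the flattened sequence index.
    q_y, q_x = q_idx // CANVAS_W, q_idx % CANVAS_W
    kv_y, kv_x = kv_idx // CANVAS_W, kv_idx % CANVAS_W
    # NATTEN clamps the window centre near canvas edges, so border
    # queries still attend to a full KERNEL_H x KERNEL_W neighbourhood
    # (unlike a plain sliding window, which would shrink at the edges).
    ctr_y = q_y.clamp(KERNEL_H // 2, CANVAS_H - 1 - KERNEL_H // 2)
    ctr_x = q_x.clamp(KERNEL_W // 2, CANVAS_W - 1 - KERNEL_W // 2)
    return ((ctr_y - kv_y).abs() <= KERNEL_H // 2) & (
        (ctr_x - kv_x).abs() <= KERNEL_W // 2
    )

SEQ = CANVAS_H * CANVAS_W
q, k, v = (torch.randn(1, 1, SEQ, 64) for _ in range(3))

# FlexAttention path: compile the predicate into a block-sparse mask.
# (Run on CUDA if your build lacks CPU FlexAttention support.)
block_mask = create_block_mask(
    natten_mask_mod, B=None, H=None, Q_LEN=SEQ, KV_LEN=SEQ, device=q.device.type
)
out_flex = flex_attention(q, k, v, block_mask=block_mask)

# Reference path: materialise the same mask densely and run masked SDPA.
idx = torch.arange(SEQ)
dense_mask = natten_mask_mod(None, None, idx[:, None], idx[None, :])
out_sdpa = F.scaled_dot_product_attention(q, k, v, attn_mask=dense_mask)

assert torch.allclose(out_flex, out_sdpa, atol=1e-5)
```

Since both paths apply the same boolean mask and the same softmax scaling, the outputs should agree to within floating-point tolerance, which is the property the issue's parity test verifies.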

[Image: natten_c6x6_k3x3 — visualization of a 3x3 NATTEN kernel on a 6x6 canvas]