triton-lang / triton

Development repository for the Triton language and compiler
https://triton-lang.org/
MIT License
13.5k stars 1.67k forks source link

Add support for 3-5D TMA to allow loading non-matmul operands #5207

Open masahi opened 4 days ago

masahi commented 4 days ago

Mostly a mechanical change to expand the experimental support for TMA, which is currently limited to 1-2D. I have a use case for TMA to load 3D or 4D tensor which encodes blocked scales from MXFP in a specialized layout.

Swizzling is disabled for higher rank TMA, to set hasLeadingOffset = false for the dst SMEM allocated in TMA lowering. The new unittest fails if swizzling is enabled for TMA and hasLeadingOffset = true. I believe this is simply due to implementation limitations, so I hope we can enable swizziling for higher rank TMA in the future.

cc @ThomasRaoux @mbrookhart @csullivan

New contributor declaration