Mostly a mechanical change to expand the experimental TMA support, which is currently limited to 1-2D tensors. My use case is a TMA load of a 3D or 4D tensor that encodes blocked scales from MXFP in a specialized layout.

Swizzling is disabled for higher-rank TMA, so the dst SMEM allocated in TMA lowering gets hasLeadingOffset = false. The new unit test fails if swizzling is enabled for TMA and hasLeadingOffset = true. I believe this is simply due to implementation limitations, so I hope we can enable swizzling for higher-rank TMA in the future.
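
To make the rule above concrete, here is a minimal sketch in plain Python. It is not the code in this PR: `Swizzle`, `choose_swizzle`, and `has_leading_offset` are hypothetical names, the byte thresholds for 1-2D are an assumption modeled on the 32B/64B/128B TMA swizzle modes, and tying hasLeadingOffset directly to "swizzle is not NONE" is also an assumption made for illustration.

```python
from enum import Enum


class Swizzle(Enum):
    """Swizzle modes, named after the TMA 32B/64B/128B modes."""
    NONE = 0
    B32 = 32
    B64 = 64
    B128 = 128


def choose_swizzle(rank: int, inner_dim_bytes: int) -> Swizzle:
    """Hypothetical helper: pick a swizzle mode for a TMA destination buffer.

    Assumption: for rank 1-2 the mode follows the byte width of the
    innermost dimension; for rank 3-5 swizzling is always disabled,
    which is the behavior described in this PR.
    """
    if not 1 <= rank <= 5:
        raise ValueError(f"unsupported TMA rank: {rank}")
    if rank > 2:
        # Higher-rank TMA: swizzling disabled.
        return Swizzle.NONE
    if inner_dim_bytes >= 128:
        return Swizzle.B128
    if inner_dim_bytes >= 64:
        return Swizzle.B64
    if inner_dim_bytes >= 32:
        return Swizzle.B32
    return Swizzle.NONE


def has_leading_offset(swizzle: Swizzle) -> bool:
    """Assumption for this sketch: hasLeadingOffset is set only when swizzling is on."""
    return swizzle is not Swizzle.NONE


# A 4D load never swizzles in this sketch, so hasLeadingOffset stays False.
mode_4d = choose_swizzle(rank=4, inner_dim_bytes=256)
print(mode_4d, has_leading_offset(mode_4d))
```

If swizzling is later enabled for higher ranks, only the `rank > 2` early return in this sketch would need to change.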

cc @ThomasRaoux @mbrookhart @csullivan

New contributor declaration

- [x] I am not making a trivial change, such as fixing a typo in a comment.
- [x] I have written a PR description following these rules.
- [x] I have run pre-commit run --from-ref origin/main --to-ref HEAD.
Select one of the following.
- [x] I have added tests.
- /test for lit tests
- /unittest for C++ tests
- /python/test for end-to-end tests
- [ ] This PR does not need a test because FILL THIS IN.
Select one of the following.
- [x] I have not added any lit tests.
- [ ] The lit tests I have added follow these best practices, including the "tests should be minimal" section. (Usually running Python code and using the instructions it generates is not minimal.)

triton-lang / triton

Add support for 3-5D TMA to allow loading non-matmul operands #5207
