triton-lang / triton

Development repository for the Triton language and compiler
https://triton-lang.org/
MIT License
13.47k stars 1.66k forks

Allow Layouts to propagate to local_load #5219

Closed mbrookhart closed 16 hours ago

mbrookhart commented 19 hours ago

While working on some higher-dimensional tensor kernels, I noticed poor performance because layouts wouldn't propagate to local loads. Since we already allow layout folding with local store and local alloc, this seems like an oversight.
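To illustrate the kind of folding described above, here is a toy sketch in Python. It is not Triton's implementation or API: the `Op` class, the `fold_convert_into_local_load` helper, and the layout labels ("shared", "blocked", "dot_operand", "mma") are all hypothetical names made up for this example. It only models the idea that a layout conversion applied to the result of a local load can be absorbed by having the load produce the target layout directly.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Op:
    """Hypothetical stand-in for an IR operation with one result."""
    name: str               # e.g. "local_load", "convert_layout"
    result: str             # name of the value this op defines
    operand: Optional[str]  # name of the single input value, if any
    layout: str             # layout label of the result (illustrative)


def fold_convert_into_local_load(ops: List[Op]) -> List[Op]:
    """Rewrite convert_layout(local_load(x)) into one local_load that
    yields the target layout, when the load has no other users."""
    producers = {op.result: op for op in ops}
    uses = Counter(op.operand for op in ops if op.operand is not None)

    absorbed = set()  # results of loads that get folded away
    rewritten = {}    # convert result name -> replacement load
    for op in ops:
        if op.name != "convert_layout":
            continue
        src = producers.get(op.operand)
        if src is not None and src.name == "local_load" and uses[src.result] == 1:
            absorbed.add(src.result)
            rewritten[op.result] = Op("local_load", op.result, src.operand, op.layout)

    # Drop absorbed loads and substitute the replacement at the convert's position.
    return [rewritten.get(op.result, op) for op in ops if op.result not in absorbed]


# Illustrative input: alloc -> load (blocked) -> convert (dot_operand) -> dot
ops = [
    Op("local_alloc", "%buf", None, "shared"),
    Op("local_load", "%a", "%buf", "blocked"),
    Op("convert_layout", "%b", "%a", "dot_operand"),
    Op("dot", "%c", "%b", "mma"),
]
folded = fold_convert_into_local_load(ops)
```

In this toy model the explicit conversion disappears and the load itself carries the layout its consumer wants, which is the effect the PR aims for when layouts propagate to `local_load`.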

The change gives a 40% speed improvement on certain kernels for NVIDIA GPUs.

This also removes asserts in lowering for higher-dimensional kernels. As far as I can tell, those restrictions aren't required in practice.


Jokeren commented 19 hours ago

> This also removes asserts in lowering for higher-dimensional kernels. As far as I can tell, those restrictions aren't required in practice.

Please retain these asserts for now. There are some known issues with 3d convert layout.

For layout propagation itself, I'll defer it to @ThomasRaoux

ThomasRaoux commented 17 hours ago

> > This also removes asserts in lowering for higher-dimensional kernels. As far as I can tell, those restrictions aren't required in practice.
>
> Please retain these asserts for now. There are some known issues with 3d convert layout.
>
> For layout propagation itself, I'll defer it to @ThomasRaoux

Good to know. @mbrookhart, can you separate it out for now? Someone can help figure out the problems.

mbrookhart commented 17 hours ago

I put the asserts back in and added the requested checks to the MLIR. Thanks, @ThomasRaoux and @Jokeren!