Closed mbrookhart closed 16 hours ago
This also removes asserts in lowering for higher-dimensional kernels. As far as I can tell, those restrictions aren't required in practice.
Please retain these asserts for now. There are some known issues with 3D convert layout.
For layout propagation itself, I'll defer it to @ThomasRaoux
Good to know. @mbrookhart, can you separate it out for now? Someone can help figure out the problems.
I put the asserts back in and added the requested checks to the MLIR. Thanks @ThomasRaoux @Jokeren!
While working on some higher-dimensional tensor kernels, I noticed poor performance because layouts wouldn't propagate to local loads. Since we already allow layout folding with local store and local alloc, this looks like an oversight.
The change gives a 40% speed improvement on certain kernels on NVIDIA GPUs.
This also removes asserts in lowering for higher-dimensional kernels. As far as I can tell, those restrictions aren't required in practice.
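For readers unfamiliar with layout folding, here is a rough illustration of the idea behind propagating a layout into a local load. It is a toy Python model, not Triton's actual pass or API: the `Op` class, the `fold_convert_into_local_load` helper, and the layout labels are all hypothetical names chosen for the sketch. The point is only that a layout conversion sitting directly after a local load can be removed by having the load produce the target layout itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Op:
    """Hypothetical stand-in for an IR operation (not a Triton class)."""
    name: str                     # e.g. "local_alloc", "local_load", "convert_layout"
    layout: str                   # label for the layout of the op's result
    src: Optional["Op"] = None    # the single operand, if any


def fold_convert_into_local_load(op: Op) -> Op:
    """Rewrite convert_layout(local_load(x)) into a local_load of x that
    yields the target layout directly; return any other op unchanged."""
    if (
        op.name == "convert_layout"
        and op.src is not None
        and op.src.name == "local_load"
    ):
        load = op.src
        # Reuse the load's operand, but adopt the conversion's result layout.
        return Op("local_load", op.layout, load.src)
    return op


# Usage: the explicit conversion disappears after folding.
alloc = Op("local_alloc", "shared")
load = Op("local_load", "blocked", alloc)
cvt = Op("convert_layout", "dot_operand", load)
folded = fold_convert_into_local_load(cvt)
```

In this model `folded` is a single `local_load` with the `dot_operand` label, so the separate conversion step is gone. Whether such a rewrite is legal for a given layout pair in the real compiler is outside what this sketch captures.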
New contributor declaration
pre-commit run --from-ref origin/main --to-ref HEAD
The lit tests I have added follow these best practices