adam-smnk closed this 7 months ago
2-4% regression on some MHA benchmarks due to more fine-grained tiling after relaxing the heuristic. No impact on the simple GEMM and MLP benchmarks.
The main motivation for the change is improving GPU outlining, where tile-and-fuse will be used to map workloads into block tiles and thread subtiles. In the long term, we might need a more flexible way to manage the tiling heuristic.
You mean, like this? 😄 https://discourse.llvm.org/t/rfc-target-description-and-cost-model-in-mlir/76990
This will definitely help 🔥 Tile-and-fuse makes quite a few decisions internally; some of them could perhaps be exposed through optional user callbacks.
For now, the performance penalty is small, and I think it is worth the greater flexibility.
@nhasabni is working on that right now. We'll try to push it upstream, but we may test it in tpp-mlir first; not sure yet.
Reverted the changes to the default tiling validation. Instead, added the relaxed check as an option whose default value corresponds to the previous behavior.
TODO: add tiling tests for the options and test tiling for GPU.
Added the missing test. The default tiling behavior is unchanged, so there are no more benchmark regressions.
Adds a new option to the tile-and-fuse pass that exposes control over the dimension-to-tile-size factor (ratio) used when selecting operations eligible for tiling. The chosen factor requirement must be satisfied by all dimensions.
This change allows tiling when dimensions are equal to their corresponding tile sizes or, conversely, restricts tiling to larger workloads. For example, a tile factor of 1 creates more opportunities in kernel outlining for wide and tall workloads, e.g., tiling memref<128x1024> into 128x128 tiles.
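To make the factor requirement concrete, here is a minimal sketch of the eligibility check as described above. It is not the tpp-mlir implementation: the function name `is_tiling_eligible` and parameter `min_tile_factor` are hypothetical, and it assumes the requirement means "every dimension is at least `min_tile_factor` times its tile size". Skipping tile size 0 is also an assumption, following the MLIR convention that a zero tile size leaves that dimension untiled.

```python
from typing import Sequence


def is_tiling_eligible(dims: Sequence[int], tiles: Sequence[int],
                       min_tile_factor: int) -> bool:
    """Hypothetical check: every tiled dimension must satisfy
    dim >= min_tile_factor * tile (i.e. dim / tile >= factor)."""
    if len(dims) != len(tiles):
        raise ValueError("dims and tiles must have the same rank")
    for dim, tile in zip(dims, tiles):
        if tile == 0:
            # Assumption: a tile size of 0 means "do not tile this dimension".
            continue
        # The factor requirement must hold for ALL tiled dimensions.
        if dim < min_tile_factor * tile:
            return False
    return True


# memref<128x1024> with 128x128 tiles: ratios are 1 and 8.
print(is_tiling_eligible([128, 1024], [128, 128], 1))  # True
print(is_tiling_eligible([128, 1024], [128, 128], 2))  # False: 128 < 2 * 128
```

With a factor of 1, the 128-wide dimension equal to its tile size no longer blocks tiling; a larger factor restricts tiling to larger workloads.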