iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

Simplify the application and maintenance of model-specific optimization patterns #16893

Open MaheshRavishankar opened 5 months ago

MaheshRavishankar commented 5 months ago

As part of adding support (https://github.com/openxla/iree/pull/16854/), it might be useful to have a way to easily inject model-specific optimizations.

Main findings:

1) There are some cases where certain one-off fusions are useful to do as pattern-based rewrites. One such example is the "horizontal" fusion used to combine multiple GEMMs into a single GEMM. These might be legitimate patterns to always run (but they do introduce artifacts that might affect how things get fused). The patterns are also slightly different depending on the data types.

2) This can be a C++ pass that is invoked as a preprocessing step. This support already exists, but to use it the pass needs to be "built" with IREE, so the pass needs to be in tree or in a "deployment" fork.
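To make the "horizontal" fusion in finding 1 concrete, here is a minimal sketch in plain Python. It is not IREE code. It assumes the GEMMs being combined share the same left-hand operand `A`, and the helper names (`matmul`, `hconcat`, `fused_gemms`) are made up for illustration.

```python
# Toy illustration of "horizontal" GEMM fusion using plain Python lists.
# Assumption: both GEMMs share the same left-hand operand A.

def matmul(a, b):
    """Naive matrix multiply: (M x K) @ (K x N) -> (M x N)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def hconcat(b1, b2):
    """Concatenate a K x N1 and a K x N2 matrix along the N dimension."""
    return [r1 + r2 for r1, r2 in zip(b1, b2)]

def fused_gemms(a, b1, b2):
    """Compute A @ B1 and A @ B2 with a single GEMM over [B1 | B2]."""
    n1 = len(b1[0])
    fused = matmul(a, hconcat(b1, b2))
    # Slice the fused result back into the two original outputs.
    return [row[:n1] for row in fused], [row[n1:] for row in fused]

a = [[1, 2], [3, 4]]
b1 = [[1, 0], [0, 1]]
b2 = [[2], [3]]
c1, c2 = fused_gemms(a, b1, b2)
```

The extra concatenation and slicing around the single GEMM are one plausible reading of the "artifacts" mentioned above: they are new ops in the program that later fusion decisions have to deal with.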

It would be useful to have a way to inject such patterns without having them built with the compiler.


Immediate next steps

PDLL-based pattern rewrites seem like a good fit to allow custom program rewrites (with potential helper functions exposed in IREE).

1) There is already a way to use a transform dialect script to apply transformations to an input program. The transform dialect script itself could live out-of-tree and is just loaded during compilation and applied. A similar setup could be done to apply PDLL patterns read from a file during compilation. There are a couple of options here.

2) Some of the existing passes (like RaiseSpecialOps) could be simplified and made more maintainable by using PDLL patterns in tree.
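The out-of-tree idea in item 1 (patterns that live in a separate file and are only read and applied at compile time) can be pictured with a toy Python sketch. This is not PDLL and not an IREE API: the JSON pattern format, the op names, and the functions `load_patterns` and `apply_patterns` are all hypothetical, and the "program" is just a flat list of op names.

```python
import json

# Hypothetical pattern file contents. In a real setup this text would be
# read from a path passed to the compiler; the format here is invented.
PATTERN_FILE_TEXT = """
[
  {"match": ["transpose", "matmul"], "replace": "matmul_transpose_b"}
]
"""

def load_patterns(text):
    """Parse (match sequence, replacement op) pairs from the pattern text."""
    return [(tuple(p["match"]), p["replace"]) for p in json.loads(text)]

def apply_patterns(ops, patterns):
    """Greedily rewrite the op list until no pattern matches anymore."""
    changed = True
    while changed:
        changed = False
        for match, replacement in patterns:
            n = len(match)
            for i in range(len(ops) - n + 1):
                if tuple(ops[i:i + n]) == match:
                    # Replace the matched window with the single new op.
                    ops = ops[:i] + [replacement] + ops[i + n:]
                    changed = True
                    break
            if changed:
                break
    return ops

program = ["transpose", "matmul", "add"]
rewritten = apply_patterns(program, load_patterns(PATTERN_FILE_TEXT))
```

The point of the sketch is only the separation of concerns: the rewrite driver is built once with the compiler, while the patterns themselves are data that can change without a rebuild.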

There is one caveat. The PDLL-based patterns don't apply to linalg.generic ops. They can apply to Linalg named ops or to torch dialect ops (or similar dialects). Essentially, there is a limitation on matching ops with regions (though I need to understand more to be sure).

kuhar commented 5 months ago

I also looked into preprocessing with transform scripts and found it very slow: https://github.com/openxla/iree/issues/16901