Simplify the application and maintenance of model-specific optimization patterns

As part of the support being added in https://github.com/openxla/iree/pull/16854/, it would be useful to have a way to easily inject model-specific optimizations.

Main findings.

1) Some one-off fusions are useful to do as pattern-based rewrites. One example is the "horizontal" fusion that combines multiple GEMMs into a single GEMM. These might be legitimate patterns to always run, but they introduce artifacts that might affect how things get fused. The patterns also differ slightly based on the data types.

2) Such a rewrite can be a C++ pass that is invoked as a preprocessing step. This support exists already, but to use it the pass needs to be "built" with IREE, so the pass needs to live in tree or in a "deployment" fork.
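For illustration, the "horizontal" GEMM fusion mentioned in the findings can be sketched numerically. This is a minimal pure-Python model, not IREE code: the helper names (`matmul`, `hconcat`, `hsplit`) and the matrices are made up for the example. It assumes the GEMMs share the same left-hand operand, so their right-hand operands can be concatenated along the column axis.

```python
def matmul(a, b):
    """Plain row-by-column matrix product on lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def hconcat(*mats):
    """Concatenate matrices with equal row counts along the column axis."""
    return [sum((list(m[i]) for m in mats), []) for i in range(len(mats[0]))]

def hsplit(mat, widths):
    """Slice a matrix back into column blocks of the given widths."""
    out, start = [], 0
    for w in widths:
        out.append([row[start:start + w] for row in mat])
        start += w
    return out

# Shared left-hand side (2x3) and two hypothetical weight matrices (3x2, 3x1).
x = [[1, 2, 3], [4, 5, 6]]
w_a = [[1, 0], [0, 1], [1, 1]]
w_b = [[2], [0], [1]]

# Unfused: two GEMMs that both read x.
separate = [matmul(x, w_a), matmul(x, w_b)]

# Fused: one wider GEMM, followed by slices that recover each result.
fused = hsplit(matmul(x, hconcat(w_a, w_b)), [2, 1])

assert fused == separate
```

The concatenation and the trailing slices are the kind of artifacts the fused form introduces; in a real compiler they are extra ops that later fusion decisions have to work around.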

It would be useful to have a way to inject such patterns without having them built with the compiler.


Immediate next steps

PDLL-based pattern rewrites seem like a good fit to allow custom program rewrites (with potential helper functions exposed in IREE).

1) There is already a way to use a transform dialect script to apply transformations to an input program. The transform dialect script itself can live out-of-tree; it is just loaded during compilation and applied. A similar setup could be used to apply PDLL patterns read from a file during compilation. There are a couple of options here:

a) Similar to the hookup that applies transform dialect scripts, a flag could be added to apply patterns read from an out-of-tree PDLL file.
b) The transform dialect itself has a way to inject PDLL patterns. I haven't looked into it, but this could piggyback on the existing transform dialect hookup.
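To make the "patterns loaded from a file at compile time" idea concrete, here is a toy model in Python. It is not PDLL and does not use any IREE or MLIR API: the JSON rule format, the op names, and the functions `load_patterns` and `apply_patterns` are all hypothetical, and the "program" is just a flat list of op names. The only point it illustrates is that the rewrite rules live in a file outside the compiler and are read when compilation runs.

```python
import json
import os
import tempfile

def load_patterns(path):
    """Read rewrite rules from a file that is not part of the 'compiler'."""
    with open(path) as f:
        return json.load(f)

def apply_patterns(ops, patterns):
    """Rewrite matching runs of op names until no rule applies.

    Assumes every rule has a non-empty match and makes progress (its
    replacement does not re-create its own match); otherwise this loops.
    """
    changed = True
    while changed:
        changed = False
        for rule in patterns:
            match, repl = rule["match"], rule["replace"]
            for i in range(len(ops) - len(match) + 1):
                if ops[i:i + len(match)] == match:
                    ops = ops[:i] + repl + ops[i + len(match):]
                    changed = True
                    break
    return ops

# Hypothetical model-specific rule, written to a file outside the "compiler".
rules = [{"match": ["matmul", "add", "relu"], "replace": ["fused_matmul_bias_relu"]}]

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "model_patterns.json")
    with open(path, "w") as f:
        json.dump(rules, f)
    program = ["load", "matmul", "add", "relu", "store"]
    rewritten = apply_patterns(program, load_patterns(path))

assert rewritten == ["load", "fused_matmul_bias_relu", "store"]
```

In the real proposal the file would hold PDLL patterns and the matching would be done by MLIR's pattern infrastructure; the sketch only mirrors the deployment shape, where changing the rule file does not require rebuilding the compiler.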

2) Some of the existing passes (like RaiseSpecialOps) could be simplified and made more maintainable by using PDLL patterns in tree.

There is one caveat: the PDLL-based patterns don't apply to linalg.generic ops. They can apply to Linalg named ops or to torch dialect ops (or similar dialects). Essentially, there is a limitation on matching ops with regions (though I need to understand more to be sure).


Simplify the application and maintenance of model-specific optimization patterns #16893