Closed MaheshRavishankar closed 4 days ago
@stellaraccident could you please take a look at the changes to the options here. I have a suspicion that this is too invasive a change. I want to prevent making externally visible changes to IREEs APIs, but maybe these changes do. I can try to make something a bit less invasive.
IMO this isn't the kind of thing we want a global user-level compiler option for - are we positive that's needed here? Options at that level are us saying "we have no way of ever doing this well or with any finesse and must make the user decide uniformly across their entire program" - this doesn't feel like something that is in that category to me.
(not saying this can't be an option or anything today, just that a global compiler option is a better fit for transient hacks/workarounds/etc things than surfacing it at the compiler driver API level - we don't want that set to grow to lots of little optimization details and instead be more general flags that change pipeline strategies)
That's probably my fault for making it a user level option initially (if this is what I think it is). That said, it's surprisingly difficult to know when to do the transposition to outer dims. I would rather we default to never "fixing up" concats for people like this and instead just always fuse the insert_slice
ops that come out of decomposition into producers.
I would like to know if otherwise, but transposing concats feels like a case where we're trying to make the compiler do too much. Unlike all of the logic we have in dispatch formation for handling reshapes and transposes etc. (which is cleaning up the poor ability for frameworks to express einsums), the frameworks should be sufficiently expressive about what dimension to concatenate data along.
I dropped this PR in favor of https://github.com/iree-org/iree/pull/19177 . I think using the option to modify concats to be outer dims has to happen before transpose propagation and I realized I dont need to move it for now.
I agree though, overall we should never try to make all the concat outer dims in the compiler. That is really tricky (this + transpose propagation can some times give you a good outcome, but it is not very stable IMO). If the user input has bad concats, well there is only so much the compiler can do automatically.
Move the pass to decompose concats just before dispatch region formation, and only decompose concats that are not along the outer-most dimension. This could allow folding these insert-slices into their producers.
Outer dim concats are not decomposed here. They are left till flow conversion. Decomposing those into
flow.tensor.update
allows stream allocation to do the concat in-place.Fixes #19092
Depends on #19126 and https://github.com/llvm/llvm-project/pull/116004