iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0
2.85k stars 620 forks source link

[DispatchCreation] Move concat decomposition to Dispatch creation and only decompose non-outer dim concats #19148

Closed MaheshRavishankar closed 4 days ago

MaheshRavishankar commented 1 week ago

Move the pass to decompose concats just before dispatch region formation, and only decompose concats that are not along the outer-most dimension. This could allow folding these insert-slices into their producers.

Outer dim concats are not decomposed here. They are left till flow conversion. Decomposing those into flow.tensor.update allows stream allocation to do the concat in-place.

Fixes #19092

Depends on #19126 and https://github.com/llvm/llvm-project/pull/116004

MaheshRavishankar commented 1 week ago

@stellaraccident could you please take a look at the changes to the options here. I have a suspicion that this is too invasive a change. I want to prevent making externally visible changes to IREEs APIs, but maybe these changes do. I can try to make something a bit less invasive.

benvanik commented 1 week ago

IMO this isn't the kind of thing we want a global user-level compiler option for - are we positive that's needed here? Options at that level are us saying "we have no way of ever doing this well or with any finesse and must make the user decide uniformly across their entire program" - this doesn't feel like something that is in that category to me.

(not saying this can't be an option or anything today, just that a global compiler option is a better fit for transient hacks/workarounds/etc things than surfacing it at the compiler driver API level - we don't want that set to grow to lots of little optimization details and instead be more general flags that change pipeline strategies)

qedawkins commented 1 week ago

That's probably my fault for making it a user level option initially (if this is what I think it is). That said, it's surprisingly difficult to know when to do the transposition to outer dims. I would rather we default to never "fixing up" concats for people like this and instead just always fuse the insert_slice ops that come out of decomposition into producers.

I would like to know if otherwise, but transposing concats feels like a case where we're trying to make the compiler do too much. Unlike all of the logic we have in dispatch formation for handling reshapes and transposes etc. (which is cleaning up the poor ability for frameworks to express einsums), the frameworks should be sufficiently expressive about what dimension to concatenate data along.

MaheshRavishankar commented 4 days ago

I dropped this PR in favor of https://github.com/iree-org/iree/pull/19177 . I think using the option to modify concats to be outer dims has to happen before transpose propagation and I realized I dont need to move it for now.

I agree though, overall we should never try to make all the concat outer dims in the compiler. That is really tricky (this + transpose propagation can some times give you a good outcome, but it is not very stable IMO). If the user input has bad concats, well there is only so much the compiler can do automatically.