Open nirvedhmeshram opened 1 week ago
@MaheshRavishankar just an FYI: @Max191 and I spent time today looking at IR that ultimately turned out to be this issue. I think we can always be cautious and place the affine math towards the start of the loop, but this seems like an upstream bug to at least be aware of.
This is very strange. The placement of operations shouldn't matter. This definitely seems like an upstream bug.
I think I might know what is going wrong, actually. In the second case, no insertion point can be found to create an extract_slice of %arg2. It needs indices for the slice, but the indices are defined below the loop. Since the indices are not in scope before the loop (where the tensor.empty() needs to be replaced), it cannot create the extract_slice, and so it fails.
@nirvedhmeshram I think the solution is to set the insertion point before the loop when computing the delinearized offsets for the parallel_insert_slice op in the collapse_shape propagation pattern.
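To make the scoping argument concrete, here is a toy Python model of the problem. It is only a sketch, not the upstream implementation: a block is an ordered list of ops, the names `offsets`, `empty`, `loop`, `insert`, and `iv` are hypothetical stand-ins for the IR in this issue, and the rule "the replacement must be created at or before the first user of the empty, at a point where all of its operands are already defined" is an assumed simplification of what the elimination pass checks.

```python
# Toy model: a block is an ordered list of (result_name, operand_names).
# This is NOT MLIR; it only mimics "a value must be defined before its use".

def find_insertion_point(block, block_args, tensor, needed):
    """Return the first position at or before the first user of `tensor`
    where every value in `needed` is visible, or None if no such position
    exists. Position p means "insert before block[p]"."""
    first_use = next(i for i, (_, operands) in enumerate(block) if tensor in operands)
    for pos in range(first_use + 1):
        visible = set(block_args) | {result for result, _ in block[:pos]}
        if all(value in visible for value in needed):
            return pos
    return None


def hoist_before(block, name, anchor):
    """Return a copy of `block` with the op producing `name` moved so that
    it sits immediately before the op producing `anchor`."""
    op = next(entry for entry in block if entry[0] == name)
    rest = [entry for entry in block if entry[0] != name]
    idx = next(i for i, entry in enumerate(rest) if entry[0] == anchor)
    return rest[:idx] + [op] + rest[idx:]


# Hypothetical block arguments: the shared output and an induction variable.
BLOCK_ARGS = ["arg2", "iv"]
# The replacement extract_slice needs the destination and the slice offsets.
NEEDED = ["arg2", "offsets"]

# Working case: the affine math (offsets) comes before the nested loop.
WORKING = [
    ("offsets", ["iv"]),
    ("empty", []),
    ("loop", ["empty"]),
    ("insert", ["loop", "offsets", "arg2"]),
]

# Failing case: the same ops, but the affine math comes after the nested loop.
FAILING = [
    ("empty", []),
    ("loop", ["empty"]),
    ("offsets", ["iv"]),
    ("insert", ["loop", "offsets", "arg2"]),
]

print(find_insertion_point(WORKING, BLOCK_ARGS, "empty", NEEDED))
print(find_insertion_point(FAILING, BLOCK_ARGS, "empty", NEEDED))
# Proposed fix in this toy model: compute the offsets before the loop.
print(find_insertion_point(hoist_before(FAILING, "offsets", "loop"), BLOCK_ARGS, "empty", NEEDED))
```

In this model the failing layout has no valid position because `offsets` only becomes visible after the loop has already consumed the empty. Moving the offset computation before the loop makes a valid position appear, which is the effect the proposed change to the propagation pattern is meant to have.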
In this working example we have
This empty is correctly eliminated to
However, in this very similar case, where the only difference is that the affine calculations are done after the nested loop,
the empty survives the pass.
Note that the affine math has no uses other than the parallel_insert_slice.