pdhirajkumarprasad opened 1 month ago
The problematic part is:
%102 = linalg.generic {indexing_maps = [#map1], iterator_types = ["parallel", "parallel", "parallel", "parallel"]} outs(%101 : tensor<?x12x?x?xi1>) {
^bb0(%out: i1):
%108 = linalg.index 0 : index
%109 = linalg.index 3 : index
%110 = arith.index_cast %80 : i64 to index
// if the input's size along this dim is 1, read index 0 (broadcast); otherwise read the loop index
%111 = arith.cmpi eq, %110, %c1 : index
%112 = arith.select %111, %c0, %108 : index
%113 = arith.index_cast %83 : i64 to index
%114 = arith.cmpi eq, %113, %c1 : index
%115 = arith.select %114, %c0, %109 : index
%extracted_26 = tensor.extract %reshape_21[%112, %c0, %c0, %115] : tensor<?x1x1x?xi1>
linalg.yield %extracted_26 : i1
} -> tensor<?x12x?x?xi1>
This looks like a broadcast to me. We could pass %reshape_21 directly to the generic as an input rather than accessing it from outside the region with tensor.extract.
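A minimal sketch of what that could look like, assuming the dynamic dims 0 and 3 of %reshape_21 are known to match the output (if they could be 1 at runtime, the compare-against-1 and select logic would still be needed):

%102 = linalg.generic {
    indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d0, 0, 0, d3)>,
                     affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>],
    iterator_types = ["parallel", "parallel", "parallel", "parallel"]}
    ins(%reshape_21 : tensor<?x1x1x?xi1>)
    outs(%101 : tensor<?x12x?x?xi1>) {
^bb0(%in: i1, %out: i1):
  linalg.yield %in : i1
} -> tensor<?x12x?x?xi1>

The unit dims are broadcast through the constant-0 results in the input indexing map, which keeps the access pattern visible to the compiler instead of hiding it behind a tensor.extract.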
Could someone look at the onnx.Expand op lowering?
Complete list of models failing due to this.
I think I've figured out a problematic component of the Expand lowering. Going to make sure this is an appropriate fix and will post updates.
So removing some logic that takes the max between the provided dim size and the input dim for broadcasting seems to unblock this issue; however, that logic is necessary for producing correct results in other cases.
onnx.Expand allows a provided shape value smaller than the input size at a given dim (in which case it doesn't broadcast there, and omitting the max would produce the wrong output size).
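As a concrete illustration (a hypothetical case, sketched in torch-mlir's ONNX import form; the values are made up): expanding a [3,4] input with the shape [3,1] must still yield [3,4], because each output dim is the max of the input dim and the requested dim.

// the requested size 1 is smaller than input dim 1 (size 4), so no broadcast happens there
%shape = torch.vtensor.literal(dense<[3, 1]> : tensor<2xsi64>) : !torch.vtensor<[2],si64>
%expanded = torch.operator "onnx.Expand"(%input, %shape)
    : (!torch.vtensor<[3,4],f32>, !torch.vtensor<[2],si64>) -> !torch.vtensor<[3,4],f32>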
Going to keep digging a bit.
I got it to work with some extra shape help in torch-mlir.
Will update soon.
Small reproducers are passing, but full models still fail on a further node.
If this can be fixed in the front-end then that's great, but I think we should be able to support this in the compiler as well. In that regard, I think the problem starts in the iree-codegen-tile-and-distribute-to-workgroups pass. I am sanity-checking the input we have at this point; the output we get is not something that can be legalized. Both are provided in this gist.
Here is the command I used
iree-opt tile_and_distribute_repro.mlir \
-pass-pipeline='builtin.module(hal.executable(hal.executable.variant(builtin.module(func.func(iree-codegen-tile-and-distribute-to-workgroups, canonicalize)), cse)))' \
&> output.mlir
CC @MaheshRavishankar to take a look at the IR as well.
https://github.com/llvm/torch-mlir/pull/3756 addresses the compile failure by substantially simplifying the IR generated for broadcasts, but slightly reduces coverage of some rare onnx.Expand cases.
If you guys think that the old IR should just be supported in IREE anyway, then it might not be worth landing that PR.
With https://github.com/llvm/torch-mlir/pull/3757, which cleans up some of the gross shape computations that aren't simplifying at the torch level, I was able to compile the failing models when returning on the first "Where" node. Interestingly, compilation then failed again when returning on the next, nearly identical "Where" node.
I think I've explored the ways to simplify the broadcast shapes as best as possible with the two PRs I've posted so far. I'm not sure what else we could do from the front-end.
@zjgarvey you are probably already looking at it, but this kind of IR is really strange:
%dim = tensor.dim %arg4, %c0 : tensor<?x?x768xf32>
%1 = arith.index_cast %dim : index to i64
%2 = tensor.empty() : tensor<i1>
%3 = linalg.fill ins(%0 : i1) outs(%2 : tensor<i1>) -> tensor<i1>
%4 = tensor.empty() : tensor<i64>
%5 = linalg.fill ins(%1 : i64) outs(%4 : tensor<i64>) -> tensor<i64>
%6 = linalg.fill ins(%extracted : i64) outs(%4 : tensor<i64>) -> tensor<i64>
%7 = linalg.generic {indexing_maps = [#map, #map, #map, #map], iterator_types = []} ins(%3, %5, %6 : tensor<i1>, tensor<i64>, tensor<i64>) outs(%4 : tensor<i64>) {
^bb0(%in: i1, %in_26: i64, %in_27: i64, %out: i64):
%108 = arith.select %in, %in_26, %in_27 : i64
linalg.yield %108 : i64
} -> tensor<i64>
%extracted_3 = tensor.extract %7[] : tensor<i64>
This is taking a dim of a tensor, creating new 0-d tensors and filling them with scalars, then performing a linalg.generic operation just to do a select, and then extracting the scalar back out.
It's basically shape computation that is artificially expressed as tensor math. IREE tries to do all tensor math on the device, so all of this computation, which is really shape computation that should be done on the host, gets transferred to the device. Things then go haywire because it artificially looks like an indirect-dispatch problem, where the shape computation depends on previous computation done on the device. The easiest fix is for the front end to not express shape computation as tensor math.
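For comparison, a minimal sketch of the same computation kept as scalar host code, with no tensors involved (reusing %arg4, %0, and %extracted from the snippet above; %sel is a new name introduced here for illustration):

// take the dim and select between the two scalars directly on the host
%dim = tensor.dim %arg4, %c0 : tensor<?x?x768xf32>
%1 = arith.index_cast %dim : index to i64
%sel = arith.select %0, %1, %extracted : i64

Nothing here needs to run on the device, so the result can feed later shape computations without creating an indirect dispatch.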
We need to generate new linalg IR for this issue with https://github.com/llvm/torch-mlir/pull/3762, since the ONNX IR already works when run directly through IREE.
This is also resolved by https://github.com/llvm/torch-mlir/pull/3756, so it might no longer reproduce in IREE.
What happened?
For the given IR
I am getting the following error:
This linalg IR was generated with the following ONNX IR:
If I pass the ONNX IR directly, IREE compiles fine, but when passing the linalg IR, after lowering it through torch-mlir, it fails with the above error.
Steps to reproduce your issue
command:
What component(s) does this issue relate to?
Compiler
Version information
No response
Additional context
No response