iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

[compile] 'tensor.dim' op unexpected during shape cleanup; dynamic dimensions must have been resolved prior to leaving the flow dialect #18412

Closed: pdhirajkumarprasad closed this issue 1 month ago

pdhirajkumarprasad commented 2 months ago

What happened?

For the given IR:

module {
  func.func @"torch-jit-export"(%arg0: !torch.vtensor<[?,4],f32>) -> !torch.vtensor<[?,1],si64>  attributes {torch.onnx_meta.ir_version = 6 : si64, torch.onnx_meta.opset_version = 21 : si64, torch.onnx_meta.producer_name = "pytorch", torch.onnx_meta.producer_version = "1.7"} {
    %82 = torch.operator "onnx.Multinomial"(%arg0) {torch.onnx.dtype = 7 : si64, torch.onnx.sample_size = 1 : si64} : (!torch.vtensor<[?,4],f32>) -> !torch.vtensor<[?,1],si64> 
    return %82 : !torch.vtensor<[?,1],si64>
  }
}

I get the following error:

model.torch_onnx.mlir:3:11: error: 'tensor.dim' op unexpected during shape cleanup; dynamic dimensions must have been resolved prior to leaving the flow dialect
    %82 = torch.operator "onnx.Multinomial"(%arg0) {torch.onnx.dtype = 7 : si64, torch.onnx.sample_size = 1 : si64} : (!torch.vtensor<[?,4],f32>) -> !torch.vtensor<[?,1],si64> 
          ^

This may be related to https://github.com/llvm/torch-mlir/issues/3651, but the IR given in that example works fine, so I am filing this separately.

A log generated with '--mlir-print-ir-after-all --mlir-print-ir-before-all --mlir-disable-threading --mlir-elide-elementsattrs-if-larger=4 model.torch_onnx.mlir' is attached: dump.log

Steps to reproduce your issue

Command to reproduce:

iree-compile --iree-input-demote-i64-to-i32 --iree-hal-target-backends=llvm-cpu temp.mlir

What component(s) does this issue relate to?

Compiler

Version information

No response

Additional context

No response

nirvedhmeshram commented 1 month ago

I think I follow the problem, but I need @MaheshRavishankar 's input on what to do about it. I see that normally the tensor.dim introduced by dynamic shapes at the end of the program gets folded in the FormDispatchRegionsPass. I haven't looked at the code, but I am assuming it traces how the sizes in the program are computed and can infer that something like %0 = hal.buffer_view.dim<%arg0 : !hal.buffer_view>[0] : index is the SSA value it should use instead.
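As a rough illustration of that folding pattern, here is a hand-written sketch (the value names %dim_in, %result, and %view are assumed for illustration; this is not taken from the dump), assuming the result's leading dimension can be traced back to the input's:

  %dim_in = hal.buffer_view.dim<%arg0 : !hal.buffer_view>[0] : index
  // ... computation producing %result : tensor<?x1xi32>, whose leading
  // dimension is known to equal %dim_in ...
  // In the working case the cleanup can replace
  //   %dim = tensor.dim %result, %c0 : tensor<?x1xi32>
  // with %dim_in, so the export becomes:
  %view = hal.tensor.export %result : tensor<?x1xi32>{%dim_in} -> !hal.buffer_view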

Now, coming to the problem: for this op we end up with something like this:

  ...
  %0 = hal.buffer_view.dim<%arg0 : !hal.buffer_view>[0] : index
  ...
  %2 = arith.index_cast %0 : index to i32
  ...
  %7 = scf.for %arg3 = %c0_i32 to %2 step %c1_i32 iter_args(%arg4 = %3) -> (tensor<?x1xi32>)  : i32 {
  ...
  }
  %8 = hal.tensor.barrier join(%7 : tensor<?x1xi32>) => %arg2 : !hal.fence
  %dim = tensor.dim %8, %c0 : tensor<?x1xi32>
  %9 = hal.tensor.export %8 : tensor<?x1xi64> as tensor<?x1xi32>{%dim} -> !hal.buffer_view
  util.return %9 : !hal.buffer_view
  }

So the shape inference doesn't work for a case like this. Here is a full dump.
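For illustration, here is a hedged, hand-edited sketch of what the tail of this IR would need to look like for the cleanup to succeed: the tensor.dim is replaced by the %0 value already computed from the input buffer view (this is not compiler output, just the substitution the shape inference would have to make):

  %8 = hal.tensor.barrier join(%7 : tensor<?x1xi32>) => %arg2 : !hal.fence
  // The tensor.dim on %8 is gone; its leading dimension is just the input
  // batch dimension %0 captured earlier via hal.buffer_view.dim.
  %9 = hal.tensor.export %8 : tensor<?x1xi64> as tensor<?x1xi32>{%0} -> !hal.buffer_view
  util.return %9 : !hal.buffer_view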

MaheshRavishankar commented 1 month ago

This is related to #18268, and as discussed there, lowering the multinomial op that way is not going to work. Anything built on that lowering is just going to keep hitting different bugs in the compiler, because the lowering itself is unsupported. I don't see a point in triaging this further until the lowering is fixed.

MaheshRavishankar commented 1 month ago

I am dropping this from the project.

pdhirajkumarprasad commented 1 month ago

This issue is similar to https://github.com/iree-org/iree/issues/18268, so I am closing this one.