iree-org/iree: A retargetable MLIR-based machine learning compiler and runtime toolkit (http://iree.dev/).

dominance-related failure occurs when compiling an opt model #17759

Closed: zjgarvey closed this issue 2 months ago

zjgarvey commented 3 months ago

What happened?

This error came from testing the /onnx/models/opt-125M-vaiq model in https://github.com/nod-ai/SHARK-TestSuite:

opt-125M-awq.default.onnx.torch.mlir:1218:13: error: operand #0 does not dominate this use
    %1166 = torch.aten.item %1165 : !torch.vtensor<[],si64> -> !torch.int
            ^
opt-125M-awq.default.onnx.torch.mlir:1218:13: note: see current operation: %412 = "tensor.extract"(%415#1) : (tensor<i32>) -> i32
opt-125M-awq.default.onnx.torch.mlir:1229:13: note: operand defined here (op in the same block)
    %1175 = torch.aten.where.self %1172, %1173, %1174 : !torch.vtensor<[],i1>, !torch.vtensor<[],si64>, !torch.vtensor<[],si64> -> !torch.vtensor<[],si64>
            ^

Steps to reproduce your issue

Download a smaller reproducer from this gist, then run:

iree-compile --iree-input-demote-i64-to-i32 --iree-hal-target-backends=llvm-cpu  --iree-input-type=torch dom_error_repro.torch.mlir -o dump.vmfb

This will result in the error message:

dom_error_repro.torch.mlir:62:11: error: operand #0 does not dominate this use
    %55 = torch.aten.item %54 : !torch.vtensor<[],si64> -> !torch.int
          ^
dom_error_repro.torch.mlir:62:11: note: see current operation: %17 = "tensor.extract"(%20#1) : (tensor<i32>) -> i32
dom_error_repro.torch.mlir:71:11: note: operand defined here (op in the same block)
    %64 = torch.aten.where.self %61, %62, %63 : !torch.vtensor<[],i1>, !torch.vtensor<[],si64>, !torch.vtensor<[],si64> -> !torch.vtensor<[],si64>

The failure appears to occur in the iree-flow-form-scalar-dispatches pass. Here is the IR dump after the failure:

// -----// IR Dump After FormScalarDispatchesPass Failed (iree-flow-form-scalar-dispatches) //----- //
"util.func"() <{function_type = (!hal.buffer_view, !hal.buffer_view, !hal.buffer_view, !hal.fence, !hal.fence) -> !hal.buffer_view, inlining_policy = #util.inline.never, sym_name = "main_graph$async"}> ({
^bb0(%arg0: !hal.buffer_view, %arg1: !hal.buffer_view, %arg2: !hal.buffer_view, %arg3: !hal.fence, %arg4: !hal.fence):
  %0 = "arith.constant"() <{value = 0 : index}> : () -> index
  %1 = "arith.constant"() <{value = 1 : index}> : () -> index
  %2 = "arith.constant"() <{value = 0 : i32}> : () -> i32
  %3 = "arith.constant"() <{value = dense<[-1, 0]> : tensor<2xi32>}> : () -> tensor<2xi32>
  %4 = "hal.buffer_view.dim"(%arg0) {index = 0 : index} : (!hal.buffer_view) -> index
  %5 = "hal.buffer_view.dim"(%arg0) {index = 1 : index} : (!hal.buffer_view) -> index
  %6 = "hal.tensor.import"(%arg0, %4, %5, %arg3) {operandSegmentSizes = array<i32: 1, 2, 1>, target_encoding = tensor<?x?xi64>} : (!hal.buffer_view, index, index, !hal.fence) -> tensor<?x?xi32>
  %7 = "arith.index_cast"(%4) : (index) -> i32
  %8 = "arith.index_cast"(%5) : (index) -> i32
  %9 = "tensor.empty"() : () -> tensor<i32>
  %10 = "flow.dispatch.region"() <{operandSegmentSizes = array<i32: 0, 0>}> ({
    %34 = "linalg.generic"(%9) <{indexing_maps = [affine_map<() -> ()>], iterator_types = [], operandSegmentSizes = array<i32: 0, 1>}> ({
    ^bb0(%arg7: i32):
      "linalg.yield"(%8) : (i32) -> ()
    }) : (tensor<i32>) -> tensor<i32>
    "flow.return"(%34) : (tensor<i32>) -> ()
  }, {
    %33 = "arith.constant"() <{value = 1 : index}> : () -> index
    "flow.return"(%33, %33, %33) : (index, index, index) -> ()
  }) : () -> tensor<i32>
  %11 = "tensor.expand_shape"(%10) <{reassociation = [], static_output_shape = array<i64: 1>}> : (tensor<i32>) -> tensor<1xi32>
  %12 = "tensor.insert_slice"(%10, %3) <{operandSegmentSizes = array<i32: 1, 1, 0, 0, 0>, static_offsets = array<i64: 1>, static_sizes = array<i64: 1>, static_strides = array<i64: 1>}> : (tensor<i32>, tensor<2xi32>) -> tensor<2xi32>
  %13 = "tensor.extract_slice"(%12) <{operandSegmentSizes = array<i32: 1, 0, 0, 0>, static_offsets = array<i64: 0>, static_sizes = array<i64: 1>, static_strides = array<i64: 1>}> : (tensor<2xi32>) -> tensor<i32>
  %14 = "tensor.expand_shape"(%13) <{reassociation = [], static_output_shape = array<i64: 1>}> : (tensor<i32>) -> tensor<1xi32>
  %15 = "tensor.extract"(%14, %0) : (tensor<1xi32>, index) -> i32
  %16 = "arith.cmpi"(%15, %2) <{predicate = 0 : i64}> : (i32, i32) -> i1
  %17 = "tensor.extract"(%20#1) : (tensor<i32>) -> i32
  %18 = "tensor.extract"(%11, %0) : (tensor<1xi32>, index) -> i32
  %19 = "arith.cmpi"(%18, %2) <{predicate = 0 : i64}> : (i32, i32) -> i1
  %20:2 = "flow.dispatch.region"() <{operandSegmentSizes = array<i32: 0, 0>}> ({
    %29 = "linalg.generic"(%9) <{indexing_maps = [affine_map<() -> ()>], iterator_types = [], operandSegmentSizes = array<i32: 0, 1>}> ({
    ^bb0(%arg6: i32):
      %32 = "arith.select"(%16, %7, %15) : (i1, i32, i32) -> i32
      "linalg.yield"(%32) : (i32) -> ()
    }) : (tensor<i32>) -> tensor<i32>
    %30 = "linalg.generic"(%9) <{indexing_maps = [affine_map<() -> ()>], iterator_types = [], operandSegmentSizes = array<i32: 0, 1>}> ({
    ^bb0(%arg5: i32):
      %31 = "arith.select"(%19, %8, %18) : (i1, i32, i32) -> i32
      "linalg.yield"(%31) : (i32) -> ()
    }) : (tensor<i32>) -> tensor<i32>
    "flow.return"(%30, %29) : (tensor<i32>, tensor<i32>) -> ()
  }, {
    %28 = "arith.constant"() <{value = 1 : index}> : () -> index
    "flow.return"(%28, %28, %28) : (index, index, index) -> ()
  }) : () -> (tensor<i32>, tensor<i32>)
  %21 = "tensor.extract"(%20#0) : (tensor<i32>) -> i32
  %22 = "tensor.from_elements"(%17, %21) : (i32, i32) -> tensor<2xi32>
  %23 = "tensor.reshape"(%6, %22) : (tensor<?x?xi32>, tensor<2xi32>) -> tensor<?x?xi32>
  %24 = "hal.tensor.barrier"(%23, %arg4) : (tensor<?x?xi32>, !hal.fence) -> tensor<?x?xi32>
  %25 = "tensor.dim"(%24, %0) : (tensor<?x?xi32>, index) -> index
  %26 = "tensor.dim"(%24, %1) : (tensor<?x?xi32>, index) -> index
  %27 = "hal.tensor.export"(%24, %25, %26) {source_encoding = tensor<?x?xi64>} : (tensor<?x?xi32>, index, index) -> !hal.buffer_view
  "util.return"(%27) : (!hal.buffer_view) -> ()
}) {iree.abi.model = "coarse-fences", iree.abi.stub} : () -> ()
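
The dominance violation is visible directly in this dump: %17 consumes result #1 of %20, but %20 (the second flow.dispatch.region) is only defined three lines further down in the same block. MLIR requires every SSA value to be defined at a point that dominates all of its uses, so the verifier rejects the IR:

  %17 = "tensor.extract"(%20#1) : (tensor<i32>) -> i32    // use of %20#1 here...
  ...
  %20:2 = "flow.dispatch.region"() ... : () -> (tensor<i32>, tensor<i32>)    // ...but %20 is defined only here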

What component(s) does this issue relate to?

Compiler

Version information

using a local build at commit 3b5d269c7fec61743cc41f4394b33a31625ef2ae

Additional context

No response

hanhanW commented 3 months ago

> dom_error_repro.torch.mlir:62:11: error: operand #0 does not dominate this use %55 = torch.aten.item %54 : !torch.vtensor<[],si64> -> !torch.int

This usually indicates that we don't set the insertion point before creating an operation. @IanWood1, could you help with this? I think you have some context on FormScalarDispatchesPass. I can jump in if you need some help.

cc @MaheshRavishankar
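
For reference, "setting the insertion point" here means positioning the OpBuilder before materializing new ops. A minimal sketch of the failure mode and its usual remedy, assuming a hypothetical helper (this is not the actual FormScalarDispatchesPass code):

#include "mlir/IR/Builders.h"
#include "mlir/IR/Operation.h"

using namespace mlir;

// Hypothetical helper: re-create `op` immediately after `producer` so
// that the cloned op's operands are defined before the op itself.
Operation *cloneAfterProducer(OpBuilder &builder, Operation *op,
                              Operation *producer) {
  // Without this call, the builder keeps whatever insertion point it
  // last had; creating the op there can place it before the values it
  // consumes, which is exactly what the verifier reports as
  // "operand #0 does not dominate this use".
  builder.setInsertionPointAfter(producer);
  return builder.clone(*op);
}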

MaheshRavishankar commented 3 months ago

> dom_error_repro.torch.mlir:62:11: error: operand #0 does not dominate this use %55 = torch.aten.item %54 : !torch.vtensor<[],si64> -> !torch.int
>
> This usually indicates that we don't set the insertion point before creating an operation. @IanWood1, could you help with this? I think you have some context on FormScalarDispatchesPass. I can jump in if you need some help.
>
> cc @MaheshRavishankar

Oh crap... I haven't touched that pass in ages.

IanWood1 commented 3 months ago

It seems like horizontal fusion is moving ops into the dispatch region even though those ops have uses before rootOp / the new region.
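
A hypothetical guard for that case could check dominance before moving anything. Here is a sketch using MLIR's DominanceInfo, where safeToMoveIntoRegion, producer, and regionOp are illustrative names rather than existing IREE code:

#include "mlir/IR/Dominance.h"
#include "mlir/IR/Operation.h"

using namespace mlir;

// Hypothetical check, not the actual fix: only fuse `producer` into
// `regionOp` if every user of `producer`'s results executes after the
// region, so that region results substituted for them still dominate
// all remaining uses.
bool safeToMoveIntoRegion(Operation *producer, Operation *regionOp) {
  DominanceInfo domInfo(regionOp->getParentOp());
  for (Operation *user : producer->getUsers()) {
    // A user not properly dominated by the region op sits before it
    // (like %17 using %20#1 in the dump above); pulling the producer
    // into the region would leave that use without a dominating def.
    if (!domInfo.properlyDominates(regionOp, user))
      return false;
  }
  return true;
}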