One or more Stablehlo test(s) crashing after llvm bump to 266a5a9cb9daa96c1eeaebc18e10f5a37d638734

aviator19941 commented 4 months ago

After bumping llvm-project to https://github.com/llvm/llvm-project/commit/266a5a9cb9daa96c1eeaebc18e10f5a37d638734, one or more Stablehlo test(s) crash and cause the CI to time out here: https://github.com/llvm/torch-mlir/actions/runs/9982523928/job/27588414876?pr=3544.

Python/torchvision version: stable

After running the Stablehlo tests sequentially using python -m projects.pt1.e2e_testing.main -v --config=stablehlo -s, the last test to run is ReduceMaxAlongDimUnsignedInt:

====================
StableHLO Backend IR
module attributes {torch.debug_module_name = "ReduceMaxAlongDimUnsignedInt"} {
  func.func @forward(%arg0: tensor<?x?x?xui8>) -> (tensor<?x?xui8>, tensor<?x?xi64>) {
    %c0 = arith.constant 0 : index
    %c1 = arith.constant 1 : index
    %c2 = arith.constant 2 : index
    %c = stablehlo.constant dense<128> : tensor<ui8>
    %c_0 = stablehlo.constant dense<0> : tensor<i64>
    %dim = tensor.dim %arg0, %c0 : tensor<?x?x?xui8>
    %dim_1 = tensor.dim %arg0, %c1 : tensor<?x?x?xui8>
    %dim_2 = tensor.dim %arg0, %c2 : tensor<?x?x?xui8>
    %from_elements = tensor.from_elements %dim, %dim_1, %dim_2 : tensor<3xindex>
    %0 = stablehlo.dynamic_iota %from_elements, dim = 1 : (tensor<3xindex>) -> tensor<?x?x?xi64>
    %1:2 = stablehlo.reduce(%arg0 init: %c), (%0 init: %c_0) across dimensions = [1] : (tensor<?x?x?xui8>, tensor<?x?x?xi64>, tensor<ui8>, tensor<i64>) -> (tensor<?x?xui8>, tensor<?x?xi64>)
     reducer(%arg1: tensor<ui8>, %arg3: tensor<ui8>) (%arg2: tensor<i64>, %arg4: tensor<i64>)  {
      %2 = stablehlo.compare  GE, %arg1, %arg3,  SIGNED : (tensor<ui8>, tensor<ui8>) -> tensor<i1>
      %3 = stablehlo.select %2, %arg1, %arg3 : tensor<i1>, tensor<ui8>
      %4 = stablehlo.compare  EQ, %arg1, %arg3,  SIGNED : (tensor<ui8>, tensor<ui8>) -> tensor<i1>
      %5 = stablehlo.minimum %arg2, %arg4 : tensor<i64>
      %6 = stablehlo.select %2, %arg2, %arg4 : tensor<i1>, tensor<i64>
      %7 = stablehlo.select %4, %5, %6 : tensor<i1>, tensor<i64>
      stablehlo.return %3, %7 : tensor<ui8>, tensor<i64>
    }
    return %1#0, %1#1 : tensor<?x?xui8>, tensor<?x?xi64>
  }
}

and this error occurs after it:

python: /home/avsharma/torch-mlir/externals/llvm-project/mlir/lib/Transforms/Utils/DialectConversion.cpp:2868: llvm::LogicalResult legalizeUnresolvedMaterialization((anonymous namespace)::UnresolvedMaterializationRewrite &, DenseMap<mlir::Operation , (anonymous namespace)::UnresolvedMaterializationRewrite > &, mlir::ConversionPatternRewriter &, mlir::detail::ConversionPatternRewriterImpl &, DenseMap<mlir::Value, SmallVector> &): Assertion `newMaterialization.getType() == outputType && "materialization callback produced value of incorrect type"' failed.
Aborted (core dumped)
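
For anyone trying to reproduce this without hanging on the full suite, here is a minimal sketch that reruns only the failing test under a timeout. It assumes the e2e runner accepts `-f <regex>` to select tests by name, and that the script is launched from the torch-mlir checkout with the usual build environment set up. `build_e2e_command` and `run_with_timeout` are hypothetical helper names, not part of torch-mlir.

```python
import subprocess
import sys


def build_e2e_command(config, test_filter=None, sequential=True, verbose=True):
    """Build the argv list for the torch-mlir e2e test runner."""
    cmd = [sys.executable, "-m", "projects.pt1.e2e_testing.main", f"--config={config}"]
    if verbose:
        cmd.append("-v")
    if sequential:
        cmd.append("-s")
    if test_filter is not None:
        # Assumption: -f takes a regex that is matched against test names.
        cmd += ["-f", test_filter]
    return cmd


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (returncode, timed_out).

    returncode is None when the timeout expired and the child was killed.
    """
    try:
        proc = subprocess.run(cmd, timeout=timeout_s)
        return proc.returncode, False
    except subprocess.TimeoutExpired:
        return None, True


if __name__ == "__main__":
    cmd = build_e2e_command("stablehlo", test_filter="ReduceMaxAlongDimUnsignedInt")
    returncode, timed_out = run_with_timeout(cmd, timeout_s=600)
    print("command:", " ".join(cmd))
    print("timed out:", timed_out, "return code:", returncode)
```

On POSIX systems a negative return code means the child was killed by a signal, which is what an assertion abort like the one above would produce.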

aviator19941 commented 3 months ago

The ReduceMaxAlongDimUnsignedInt test fails in this PR: https://github.com/llvm/torch-mlir/pull/3544

aviator19941 commented 3 months ago

@vivekkhandelwal1 or @renxida, do you have cycles to help with this? I got pulled into llama2 work. FYI: I'm bumping to https://github.com/llvm/llvm-project/commit/168ecd706904d6ce221dc5107da92c56aea7c8e9 today (merged here: https://github.com/iree-org/iree/pull/17978)

vivekkhandelwal1 commented 3 months ago

Hi @aviator19941, I can take a look at this on Monday.

vivekkhandelwal1 commented 3 months ago

Hi @aviator19941, do I still need to take a look at this?

llvm/torch-mlir issue #3549