llvm / torch-mlir

The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.

One or more Stablehlo test(s) crashing after llvm bump to 266a5a9cb9daa96c1eeaebc18e10f5a37d638734 #3549

Open aviator19941 opened 4 months ago

aviator19941 commented 4 months ago

After bumping llvm-project to https://github.com/llvm/llvm-project/commit/266a5a9cb9daa96c1eeaebc18e10f5a37d638734, one or more Stablehlo test(s) crash and cause the CI to time out here: https://github.com/llvm/torch-mlir/actions/runs/9982523928/job/27588414876?pr=3544.

Python/torchvision version: stable

After running the Stablehlo tests sequentially using python -m projects.pt1.e2e_testing.main -v --config=stablehlo -s, the last test to run is ReduceMaxAlongDimUnsignedInt:

====================
StableHLO Backend IR
module attributes {torch.debug_module_name = "ReduceMaxAlongDimUnsignedInt"} {
  func.func @forward(%arg0: tensor<?x?x?xui8>) -> (tensor<?x?xui8>, tensor<?x?xi64>) {
    %c0 = arith.constant 0 : index
    %c1 = arith.constant 1 : index
    %c2 = arith.constant 2 : index
    %c = stablehlo.constant dense<128> : tensor<ui8>
    %c_0 = stablehlo.constant dense<0> : tensor<i64>
    %dim = tensor.dim %arg0, %c0 : tensor<?x?x?xui8>
    %dim_1 = tensor.dim %arg0, %c1 : tensor<?x?x?xui8>
    %dim_2 = tensor.dim %arg0, %c2 : tensor<?x?x?xui8>
    %from_elements = tensor.from_elements %dim, %dim_1, %dim_2 : tensor<3xindex>
    %0 = stablehlo.dynamic_iota %from_elements, dim = 1 : (tensor<3xindex>) -> tensor<?x?x?xi64>
    %1:2 = stablehlo.reduce(%arg0 init: %c), (%0 init: %c_0) across dimensions = [1] : (tensor<?x?x?xui8>, tensor<?x?x?xi64>, tensor<ui8>, tensor<i64>) -> (tensor<?x?xui8>, tensor<?x?xi64>)
     reducer(%arg1: tensor<ui8>, %arg3: tensor<ui8>) (%arg2: tensor<i64>, %arg4: tensor<i64>)  {
      %2 = stablehlo.compare  GE, %arg1, %arg3,  SIGNED : (tensor<ui8>, tensor<ui8>) -> tensor<i1>
      %3 = stablehlo.select %2, %arg1, %arg3 : tensor<i1>, tensor<ui8>
      %4 = stablehlo.compare  EQ, %arg1, %arg3,  SIGNED : (tensor<ui8>, tensor<ui8>) -> tensor<i1>
      %5 = stablehlo.minimum %arg2, %arg4 : tensor<i64>
      %6 = stablehlo.select %2, %arg2, %arg4 : tensor<i1>, tensor<i64>
      %7 = stablehlo.select %4, %5, %6 : tensor<i1>, tensor<i64>
      stablehlo.return %3, %7 : tensor<ui8>, tensor<i64>
    }
    return %1#0, %1#1 : tensor<?x?xui8>, tensor<?x?xi64>
  }
}

and this error occurs after it:

python: /home/avsharma/torch-mlir/externals/llvm-project/mlir/lib/Transforms/Utils/DialectConversion.cpp:2868: llvm::LogicalResult legalizeUnresolvedMaterialization((anonymous namespace)::UnresolvedMaterializationRewrite &, DenseMap<mlir::Operation *, (anonymous namespace)::UnresolvedMaterializationRewrite *> &, mlir::ConversionPatternRewriter &, mlir::detail::ConversionPatternRewriterImpl &, DenseMap<mlir::Value, SmallVector<mlir::Value>> &): Assertion `newMaterialization.getType() == outputType && "materialization callback produced value of incorrect type"' failed.
Aborted (core dumped)
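
For readers less familiar with StableHLO, here is a rough Python sketch of what the reducer region in the IR dump above computes for each pair of (value, index) operands: keep the larger value, and on a tie keep the smaller index. It is only an illustration under a few assumptions: it uses plain Python integers, it seeds the fold with the first element rather than the init operands from the dump, and it does not model the SIGNED comparison attribute. The names `argmax_reducer` and `reduce_max_along_dim1` are made up for this sketch and do not exist in torch-mlir.

```python
from functools import reduce


def argmax_reducer(lhs, rhs):
    """Combine two (value, index) pairs, mirroring the ops in the reducer region."""
    (v1, i1), (v2, i2) = lhs, rhs
    ge = v1 >= v2                      # stablehlo.compare GE
    value = v1 if ge else v2           # stablehlo.select %2, %arg1, %arg3
    eq = v1 == v2                      # stablehlo.compare EQ
    min_index = min(i1, i2)            # stablehlo.minimum %arg2, %arg4
    picked = i1 if ge else i2          # stablehlo.select %2, %arg2, %arg4
    index = min_index if eq else picked  # stablehlo.select %4, %5, %6
    return value, index


def reduce_max_along_dim1(x):
    """Reduce a nested list of shape [d0][d1][d2] over d1.

    Returns (values, indices), each of shape [d0][d2].
    """
    d0, d1, d2 = len(x), len(x[0]), len(x[0][0])
    values = [[0] * d2 for _ in range(d0)]
    indices = [[0] * d2 for _ in range(d0)]
    for i in range(d0):
        for k in range(d2):
            # Pair every element with its position along dim 1,
            # which is the role of the dynamic_iota operand in the dump.
            pairs = [(x[i][j][k], j) for j in range(d1)]
            values[i][k], indices[i][k] = reduce(argmax_reducer, pairs)
    return values, indices
```

Calling `reduce_max_along_dim1` on a small 1x3x2 input is enough to see the tie-breaking rule: when the maximum appears more than once along dim 1, the lowest index is the one reported.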

aviator19941 commented 3 months ago

ReduceMaxAlongDimUnsignedInt test fails in this PR: https://github.com/llvm/torch-mlir/pull/3544
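
To reproduce only this test locally instead of running the whole suite, something like the following sketch may help. It wraps the command from the original report and adds a `--filter` argument; that flag, the 600-second timeout, and the helper names `build_e2e_command` and `run_single_test` are assumptions of this sketch, so check them against your torch-mlir checkout before relying on them.

```python
import subprocess
import sys


def build_e2e_command(test_name, config="stablehlo"):
    """Build the e2e test command from the report, restricted to one test.

    Assumption: the e2e runner accepts --filter with a regular expression.
    """
    return [
        sys.executable, "-m", "projects.pt1.e2e_testing.main",
        "-v", f"--config={config}", "-s",
        "--filter", f"^{test_name}$",
    ]


def run_single_test(test_name, timeout_s=600):
    """Run one test; return its exit code, or None if it timed out.

    On POSIX, a negative exit code means the process was killed by a
    signal (for example -6 for SIGABRT after a failed assertion).
    """
    cmd = build_e2e_command(test_name)
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None
    return proc.returncode


if __name__ == "__main__":
    # Must be run from the torch-mlir source root with the build on PYTHONPATH.
    print(run_single_test("ReduceMaxAlongDimUnsignedInt"))
```

The timeout is there because the original symptom was a CI job that never finished; a bounded run makes a hang distinguishable from an abort.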

aviator19941 commented 3 months ago

@vivekkhandelwal1 or @renxida do you have cycles to help with this? Got pulled into llama2 work. FYI: I'm bumping to https://github.com/llvm/llvm-project/commit/168ecd706904d6ce221dc5107da92c56aea7c8e9 today (Merged here: https://github.com/iree-org/iree/pull/17978)

vivekkhandelwal1 commented 3 months ago

> @vivekkhandelwal1 or @renxida do you have cycles to help with this? Got pulled into llama2 work. FYI: I'm bumping to llvm/llvm-project@168ecd7 today (Merged here: iree-org/iree#17978)

Hi @aviator19941, I can take a look at this on Monday.

vivekkhandelwal1 commented 3 months ago

Hi @aviator19941, do I still need to take a look at this?