iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

[CPU][ONNX] Onnx test failures after pulling in torch-mlir changes #18961

Open · Max191 opened this issue 4 weeks ago

Max191 commented 4 weeks ago

There were three new test failures after pulling in this torch-mlir patch: https://github.com/llvm/torch-mlir/commit/55ff110dc29cab7e2495ccdbec9a60512c29c665

The following tests failed:

// RUN: iree-compile /tmp/test.mlir --iree-hal-target-backends=llvm-cpu --iree-llvmcpu-target-cpu=generic --iree-input-demote-f64-to-f32=false --mlir-disable-threading --mlir-print-ir-after-all -o /tmp/out.vmfb &> /tmp/dump.mlir

module {
  func.func @test_tfidfvectorizer_tf_batch_onlybigrams_skip0(%arg0: !torch.vtensor<[2,6],si32>) -> !torch.vtensor<[2,7],f32> attributes {torch.onnx_meta.ir_version = 4 : si64, torch.onnx_meta.opset_version = 17 : si64, torch.onnx_meta.producer_name = "backend-test", torch.onnx_meta.producer_version = ""} {
    %none = torch.constant.none
    %0 = torch.operator "onnx.TfIdfVectorizer"(%arg0) {torch.onnx.max_gram_length = 2 : si64, torch.onnx.max_skip_count = 0 : si64, torch.onnx.min_gram_length = 2 : si64, torch.onnx.mode = "TF", torch.onnx.ngram_counts = [0 : si64, 4 : si64], torch.onnx.ngram_indexes = [0 : si64, 1 : si64, 2 : si64, 3 : si64, 4 : si64, 5 : si64, 6 : si64], torch.onnx.pool_int64s = [2 : si64, 3 : si64, 5 : si64, 4 : si64, 5 : si64, 6 : si64, 7 : si64, 8 : si64, 6 : si64, 7 : si64]} : (!torch.vtensor<[2,6],si32>) -> !torch.vtensor<[2,7],f32> 
    return %0 : !torch.vtensor<[2,7],f32>
  }
}

// -----

// RUN: iree-compile /tmp/test.mlir --iree-hal-target-backends=llvm-cpu --iree-llvmcpu-target-cpu=generic --iree-input-demote-f64-to-f32=false --mlir-disable-threading --mlir-print-ir-after-all -o /tmp/out.vmfb &> /tmp/dump.mlir

module {
  func.func @test_tfidfvectorizer_tf_batch_onlybigrams_skip5(%arg0: !torch.vtensor<[2,6],si32>) -> !torch.vtensor<[2,7],f32> attributes {torch.onnx_meta.ir_version = 4 : si64, torch.onnx_meta.opset_version = 17 : si64, torch.onnx_meta.producer_name = "backend-test", torch.onnx_meta.producer_version = ""} {
    %none = torch.constant.none
    %0 = torch.operator "onnx.TfIdfVectorizer"(%arg0) {torch.onnx.max_gram_length = 2 : si64, torch.onnx.max_skip_count = 5 : si64, torch.onnx.min_gram_length = 2 : si64, torch.onnx.mode = "TF", torch.onnx.ngram_counts = [0 : si64, 4 : si64], torch.onnx.ngram_indexes = [0 : si64, 1 : si64, 2 : si64, 3 : si64, 4 : si64, 5 : si64, 6 : si64], torch.onnx.pool_int64s = [2 : si64, 3 : si64, 5 : si64, 4 : si64, 5 : si64, 6 : si64, 7 : si64, 8 : si64, 6 : si64, 7 : si64]} : (!torch.vtensor<[2,6],si32>) -> !torch.vtensor<[2,7],f32> 
    return %0 : !torch.vtensor<[2,7],f32>
  }
}

// -----

// RUN: iree-compile /tmp/test.mlir --iree-hal-target-backends=llvm-cpu --iree-llvmcpu-target-cpu=generic --iree-input-demote-f64-to-f32=false --mlir-disable-threading --mlir-print-ir-after-all -o /tmp/out.vmfb &> /tmp/dump.mlir

module {
  func.func @test_tfidfvectorizer_tf_batch_uniandbigrams_skip5(%arg0: !torch.vtensor<[2,6],si32>) -> !torch.vtensor<[2,7],f32> attributes {torch.onnx_meta.ir_version = 4 : si64, torch.onnx_meta.opset_version = 17 : si64, torch.onnx_meta.producer_name = "backend-test", torch.onnx_meta.producer_version = ""} {
    %none = torch.constant.none
    %0 = torch.operator "onnx.TfIdfVectorizer"(%arg0) {torch.onnx.max_gram_length = 2 : si64, torch.onnx.max_skip_count = 5 : si64, torch.onnx.min_gram_length = 1 : si64, torch.onnx.mode = "TF", torch.onnx.ngram_counts = [0 : si64, 4 : si64], torch.onnx.ngram_indexes = [0 : si64, 1 : si64, 2 : si64, 3 : si64, 4 : si64, 5 : si64, 6 : si64], torch.onnx.pool_int64s = [2 : si64, 3 : si64, 5 : si64, 4 : si64, 5 : si64, 6 : si64, 7 : si64, 8 : si64, 6 : si64, 7 : si64]} : (!torch.vtensor<[2,6],si32>) -> !torch.vtensor<[2,7],f32> 
    return %0 : !torch.vtensor<[2,7],f32>
  }
}

The patch is reverted in IREE for now. To reproduce the failures, use this branch, which has the patch reapplied: https://github.com/Max191/iree/tree/onnx-cpu-unrolling-fail

zjgarvey commented 3 weeks ago

For context, I added this patch to resolve an issue with unrolling loops for other tests; see https://github.com/iree-org/iree/pull/18867#discussion_r1811101655.

In my opinion, it seems bad to unroll loops in the IR indiscriminately, so something in the test examples themselves needs to be addressed. Why do the batched TfIdfVectorizer examples fail while the unbatched ones pass?

It seems like IREE fails to handle `arith.sitofp` (i64 -> f64) when the input is the result of an scf.for loop, but handles it fine when those loops are unrolled?
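A minimal sketch of the pattern described above, where an scf.for loop yields an i64 that then feeds arith.sitofp to f64. This is an illustrative reduction, not the actual IR produced by the TfIdfVectorizer lowering; the function name, bounds, and loop body are hypothetical:

```mlir
// Hypothetical reduction of the suspect pattern: the sitofp input is the
// loop-carried result of a scalar scf.for rather than a constant.
func.func @scalar_loop_to_f64(%n: index, %init: i64) -> f64 {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %c1_i64 = arith.constant 1 : i64
  // Scalar loop accumulating an i64 count.
  %sum = scf.for %i = %c0 to %n step %c1 iter_args(%acc = %init) -> (i64) {
    %next = arith.addi %acc, %c1_i64 : i64
    scf.yield %next : i64
  }
  // The conversion that reportedly fails when fed by a loop result.
  %f = arith.sitofp %sum : i64 to f64
  return %f : f64
}
```

When the loop is fully unrolled, %sum becomes an ordinary SSA chain of arith.addi ops, which may explain why the unrolled form compiles.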

For operations like this, I don't expect the result to be performant, but I just don't know what the constraints are from the IREE side. Do we straight up disallow any scalar scf loops?

zjgarvey commented 3 weeks ago

Maybe this comment from Ben is related: https://github.com/iree-org/iree/issues/18268#issuecomment-2305306785