llvm / torch-mlir

The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.

Need E2E ONNX op tests in CI #3520

Open renxida opened 4 months ago

renxida commented 4 months ago

Problem: we don't test ONNX ops in our CI.

For example, we have these ONNX ops that have made their way into torch-mlir but ultimately don't run in IREE's test suites:

If we had some ONNX node tests in torch-mlir CI:

Problems with existing solutions

our existing test-suite

We have an existing test suite in projects/pt1 that exercises a large number of PyTorch ops and performs numerical comparison against native PyTorch via a variety of lowering paths, including ONNX.

There are two main problems with this suite:

testing downstream in IREE

Our downstream project IREE does run a good set of node tests, but it reports many of the ONNX ops we've lowered as failing. I haven't found a way to view the error messages, and it's also hard to tell whether these failures are due to IREE or to torch-mlir.

Proposed solution:

We should add a CI script and some testing scripts to torch-mlir that:

renxida commented 4 months ago

https://github.com/nod-ai/onnxruntime/tree/iree_ep/onnxruntime/core/providers/iree

Maybe we can use onnxruntime to plug directly into ONNX's tests and avoid writing additional data/model preprocessing scripts.

ScottTodd commented 4 months ago

> Our downstream project IREE does run a good set of node tests, but it reports many of the ONNX ops we've lowered as failing. I haven't found a way to view the error messages, and it's also hard to tell whether these failures are due to IREE or to torch-mlir.

I archived some historical logs here:

At the time, I decided that the full output would be too noisy to include on all CI runs; the list of failures may be small enough now to revisit that decision. Generally, you can run pytest with `-rA` (https://docs.pytest.org/en/stable/how-to/output.html) to see output from XFAIL'd tests, or run with `--ignore-xfails` (see the other custom flags in the conftest.py file).
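To illustrate the `-rA` behavior mentioned above: it adds XFAIL'd tests (with their reasons) to pytest's terminal summary, and an xfail does not fail the run. A self-contained sketch (the test file and its reason string are invented for the demo; `--ignore-xfails` is a custom flag from the suite's conftest.py and is not shown here):

```python
# Demo: `pytest -rA` reports XFAIL'd tests in the summary, and a
# session containing only xfails still exits successfully.
import pathlib
import tempfile

import pytest

code = """
import pytest

@pytest.mark.xfail(reason="demo: onnx op lowering missing")
def test_missing_lowering():
    raise NotImplementedError
"""

with tempfile.TemporaryDirectory() as d:
    test_file = pathlib.Path(d) / "test_xfail_demo.py"
    test_file.write_text(code)
    # Equivalent to running `pytest -rA <file>` on the command line.
    exit_code = pytest.main(["-rA", str(test_file)])

print(int(exit_code))  # 0: the xfail'd test does not fail the run
```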

renxida commented 4 months ago

@rsuderman do you have any references I could look at on how to run torch-mlir and get numerical results without using IREE?

mgehre-amd commented 4 months ago

We've had good experience with the onnx.reference evaluator in cases where onnxruntime lacked support for some ops or dtypes (e.g. bfloat16).

vinayakdsci commented 3 months ago

@renxida Hi! When you say that these ops fail, do you expect them to have linalg lowerings?

renxida commented 3 months ago

@vinayakdsci yup! I'm expecting them to work e2e.

In an ideal world, instead of pushing many ops through one layer, then coming back later to push them through the next layer while trying to remember how the old implementations work, I'd like us to push each op all the way through before moving on to the next one.

vinayakdsci commented 3 months ago

@renxida I agree :) But I just wanted to point out that many ops could be failing because of missing torch-to-linalg lowerings. And don't worry, I'm sure we will be able to push them through!