onnx / onnx-mlir

Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure
Apache License 2.0
741 stars 314 forks

Test suites #835

Open sstamenova opened 3 years ago

sstamenova commented 3 years ago

Right now there are several different types of tests in onnx-mlir:

1. The lit tests in test/mlir
2. The numerical tests in test/numerical
3. The backend tests in test/backend
4. The unit tests in test/unit and test/onnx2mlir, which do not require an onnx install
5. The one unit test in test/unit/Runtime/DocExampleTest that does require an onnx install
6. The doc tests in doc/doc-check

Some of these tests have their own targets (such as check-doc, check-onnx-lit), while others are executed by directly invoking the test target, which runs all ctests. This makes it difficult to work cleanly and consistently across platforms, and several of the tests still fail on some platforms (most notably Windows).

The tests can be separated into several categories:

I'd like to propose that we re-organize the tests a little bit so that they at least have their own targets (see https://github.com/onnx/onnx-mlir/issues/563 for a request for a separate target for the long-running numerical tests). We could also introduce other improvements such as using gtest so that we can leverage lit for more of the tests.

I would like to start with a couple of changes:

  1. Add check-docs (or some other name of our choosing):

    • Move the Runtime/DocExampleTest which requires onnx install to doc/doc-example
    • Exclude the building of the DocExampleTest binary from ALL - since it requires an onnx install it creates problems for the ALL target (see https://github.com/onnx/onnx-mlir/issues/561)
    • Add a check-docs target which runs both the existing check-doc test and the doc example test. This makes sense because both the DocExampleTest and check-doc are intended to make sure that the docs are up-to-date and provide examples; both also require an onnx install.
  2. Add check-unittest (or some other name of our choosing):

    • Add all remaining unit tests (that are not the numerical tests) to a single target that can execute them by leveraging a ctest label. This makes sense too because all of these tests require only a simple compilation followed by execution of the resulting binary, they all run in 30 seconds or less, and they do not require onnx.
  3. Modify all of the CI flows to use these new targets (along with the already existing check-numerical target, which is used on Windows but not in the other CI flows). This would allow us to run all of the tests based on their targets only after the required dependencies are installed, and it would also verify that the new targets work correctly.
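A minimal sketch of what item 2 could look like in CMake. The target name check-unittest and the `unit` label are just the proposal above, and the test binary name is purely hypothetical, not existing onnx-mlir code:

```cmake
# Hypothetical example: tag a unit-test binary with a ctest label and
# add a check-unittest target that builds it and runs only the tests
# carrying that label.
add_executable(ExampleUnitTest EXCLUDE_FROM_ALL ExampleUnitTest.cpp)
add_test(NAME ExampleUnitTest COMMAND ExampleUnitTest)
set_tests_properties(ExampleUnitTest PROPERTIES LABELS unit)

add_custom_target(check-unittest
  COMMAND ${CMAKE_CTEST_COMMAND} -L unit --output-on-failure
  DEPENDS ExampleUnitTest
  USES_TERMINAL)
```

With this pattern, `ctest -L unit` selects only the labeled tests, so the same mechanism works whether the tests are invoked through the target or through ctest directly.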

I think for 2 (check-unittest), it would also make sense to make those tests gtests and use lit to run them. I am not sure whether the numerical tests can similarly be run using gtest and lit, but that would also make the reporting of results for a run cleaner.
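For reference, lit ships a GoogleTest test format, so a directory of gtest binaries can be discovered and run by lit with a small lit.cfg. A rough sketch (the suite name, search directory, and binary suffix are assumptions, not the actual onnx-mlir layout):

```python
# Hypothetical lit.cfg sketch: let lit discover and run gtest binaries.
import lit.formats

config.name = "ONNX-MLIR-Unit"
# Search the current directory tree for executables ending in "Test";
# lit enumerates each gtest case and reports it individually.
config.test_format = lit.formats.GoogleTest(".", "Test")
```

Because lit enumerates the individual gtest cases, failures show up per-case in the same report as the existing lit tests.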

@caoimhinuibrian @gongsu832 @AlexandreEichenberger What do you think?

AlexandreEichenberger commented 3 years ago

Separating tests that require building ONNX from the others probably makes sense. Having them organized in logical groups also makes sense.

I understand that there is a desire to use more of the lit tests. However, for complicated operations like matmul and conv, there is really no substitute for testing with large numbers of random parameter combinations. For example, CONV has image size (I1, I2), kernel size (K1, K2), pads (P0..P3 or SAME/VALID), strides (s1, s2), and dilation (d1, d2). Then you add algorithms that do memory/register tiling, with heuristics that decide the tiling amounts depending on the input sizes... There is no way we can write lit tests that cover this space. I know it takes some time to run them, and maybe as the compiler matures we may be able to reduce the number of tests.

Note also that using the backend tests in various different configurations is useful: for a compiler, whether the input sizes are known at compile time or only at runtime makes a big difference. In addition, whether some constant inputs are known at compile time (thus triggering constant propagation of weights) or only at runtime is also very different. Backend tests are also very limited for complicated ops (in terms of parameters and optimizations applied) such as CONV, MatMul, GEMM, and the RNNs.

Thank you for taking a big picture look at our tests and looking at simplifying how they are called and applied.

@caoimhinuibrian @gongsu832 you have more expertise with the testing mechanics, would be great to have your input.

sstamenova commented 3 years ago

@AlexandreEichenberger : I'll send a couple of PRs to separate the tests into their own logical groups.

As for the lit tests, I was not suggesting we remove any of the existing tests, or that the same style of lit tests that are in the repo will be sufficient. lit is capable of running gtests as well as executables (and lots of other custom tests, see the lldb test suite), so it should be possible to take the existing tests mostly as they are and have lit run them.

This won't change what the tests verify, but it would change the reporting and calling convention - rather than having lit report some results and ctest report some results, all the results could be reported together. And, unless there's a reason to execute each test suite separately, we could have a target that runs all of the tests, or all of the tests that don't require onnx, etc., and get all the results together. Does that make sense?
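If that direction works, day-to-day usage could look roughly like this. The target names are the proposed ones, and the llvm-lit invocation assumes a layout where the suites live under test/; none of this exists yet:

```shell
# Run only the fast unit tests (no onnx install required):
cmake --build . --target check-unittest

# Run the doc tests, which do require an onnx install:
cmake --build . --target check-docs

# Or let lit aggregate results across suites into one report:
llvm-lit -sv test/mlir test/unit
```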