Staging work on this at https://github.com/ScottTodd/iree-test-suites/tree/onnx-ops. As expected, I found a few areas to simplify while forking the code. One major design choice I'm weighing is whether to keep the `conftest.py` collection code, or to have the test importer generate stub `test.py` files that somehow ask the test environment for the list of configurations to generate tests for.
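To make the second option concrete, here is a rough sketch of what a generated stub could look like (purely hypothetical - every name, attribute, and config field below is a placeholder, not existing code):

```python
import subprocess


def pytest_generate_tests(metafunc):
    # A shared conftest.py would parse a --config-file flag once per session
    # and stash the parsed configs on the pytest config object for stubs like
    # this one to read ("iree_test_configs" is a placeholder attribute).
    configs = getattr(metafunc.config, "iree_test_configs", [])
    if "config" in metafunc.fixturenames:
        metafunc.parametrize("config", configs, ids=lambda c: c["name"])


def test_abs(config, tmp_path):
    # Compile this test's model.mlir with the config's compiler flags, then
    # run it with the config's runtime flags.
    vmfb = tmp_path / "model.vmfb"
    subprocess.run(
        ["iree-compile", "model.mlir", "-o", str(vmfb)] + config["compile_flags"],
        check=True,
    )
    subprocess.run(
        ["iree-run-module", f"--module={vmfb}"] + config["run_flags"],
        check=True,
    )
```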
On this branch: https://github.com/ScottTodd/iree-test-suites/tree/onnx-ops-pytest-refactor, I tried to refactor the tests to rely less on pytest's non-Python test collection (https://docs.pytest.org/en/stable/example/nonpython.html). The goals there are to have composable pytest primitives that manually authored tests could use too, and to make the test runner simpler in general.
Sample test file:
```python
import pytest

compiled_model = ""


@pytest.mark.dependency()
def test_compile(test_iree_compile):
    global compiled_model
    compiled_model = test_iree_compile(__file__, "model.mlir")


@pytest.mark.dependency(depends=["test_compile"])
def test_run(test_iree_run_module):
    global compiled_model
    test_iree_run_module(__file__, compiled_model, "run_module_io_flags.txt")
```
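A note on the structure: the module-level `compiled_model` global is how `test_compile` hands the compiled artifact to `test_run`, and the `pytest.mark.dependency` marker keeps `test_run` from executing when compilation fails.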
That then uses `test_iree_compile` and `test_iree_run_module` fixtures, which themselves use an `iree_compile_run_config` fixture:

- The `iree_compile_run_config` fixture loads the config file from flags (once per session).
- The `test_iree_compile` fixture returns a function that runs the compile test. The function pulls from the config to fill in config-specific compilation flags and returns the resulting .vmfb file path (if successful).
- The `test_iree_run_module` fixture similarly returns a function that runs the test. The function pulls from the config to fill in config-specific run flags.
- The `pytest_collection_modifyitems` implementation loads the config and sets item "markers" for XFAIL / skip.
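For reference, a minimal sketch of what those fixtures could look like (hypothetical and simplified - the `--config-file` flag name, the config field names, and the file layout are assumptions, not the actual implementation):

```python
# conftest.py - hypothetical sketch only.
import json
import subprocess
from pathlib import Path

import pytest


def pytest_addoption(parser):
    parser.addoption("--config-file", action="store", default=None,
                     help="JSON config describing one compile/run configuration")


@pytest.fixture(scope="session")
def iree_compile_run_config(request):
    # Load the config file named on the command line once per session.
    config_path = request.config.getoption("--config-file")
    with open(config_path) as f:
        return json.load(f)


@pytest.fixture
def test_iree_compile(iree_compile_run_config):
    # Return a callable so each test decides what to compile and when.
    def compile_fn(test_file, mlir_name):
        test_dir = Path(test_file).parent
        vmfb_path = test_dir / (Path(mlir_name).stem + ".vmfb")
        flags = iree_compile_run_config.get("iree_compile_flags", [])
        subprocess.run(
            ["iree-compile", str(test_dir / mlir_name), "-o", str(vmfb_path)] + flags,
            check=True,
        )
        return vmfb_path

    return compile_fn


@pytest.fixture
def test_iree_run_module(iree_compile_run_config):
    # Return a callable that runs the compiled module with config-specific
    # flags plus the per-test flagfile (function inputs / expected outputs).
    def run_fn(test_file, vmfb_path, flagfile_name):
        test_dir = Path(test_file).parent
        flags = iree_compile_run_config.get("iree_run_module_flags", [])
        subprocess.run(
            ["iree-run-module", f"--module={vmfb_path}",
             f"--flagfile={test_dir / flagfile_name}"] + flags,
            check=True,
        )

    return run_fn
```

Having the fixtures return callables keeps the generated test files down to the few lines shown above while still letting each test control its own inputs.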
The missing pieces there are:

- `pytest-dependency`: if the compile test is XPASS (unexpectedly passed), we still want to run the `iree-run-module` test to get the observed result there for config file updating. Right now XPASS is a "failure", which stops the dependent test from running (one possible direction is sketched after this list).
- The `update_config_xfails.py` script will need reworking.
- The `import_onnx_tests.py` script would need to generate those `{OP_NAME}_test.py` files.
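For the `pytest-dependency` issue, one possible direction (a sketch only, not the chosen fix) is to drop the dependency marker and have the run test skip itself when no compiled module exists, so an XPASS on the compile step no longer blocks it:

```python
import pytest

compiled_model = None


def test_compile(test_iree_compile):
    global compiled_model
    compiled_model = test_iree_compile(__file__, "model.mlir")


def test_run(test_iree_run_module):
    # Skip (rather than fail) when compilation produced nothing; an XPASS on
    # test_compile still sets compiled_model, so this test gets to run.
    if compiled_model is None:
        pytest.skip("no compiled module available")
    test_iree_run_module(__file__, compiled_model, "run_module_io_flags.txt")
```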
@zjgarvey the notes I took here might be relevant to the work you've been doing in https://github.com/nod-ai/SHARK-TestSuite/tree/main/alt_e2eshark. I'm still trying to find the sweet spot between generated tests, hand-authored tests, developer workflows, and CI workflows.
I'm planning on pulling a limited subset of https://github.com/nod-ai/SHARK-TestSuite/tree/main/iree_tests into this repository. Starting with just the ONNX operator tests.
Skipping model coverage for now; it can be added back later, ideally sourced from huggingface / kaggle / onnx models, not Azure storage. Model tests may or may not want to use the `conftest.py` setup that the operator tests used - need to think on that a bit more.
The ONNX operator tests don't require Git LFS or remote file downloading (`download_remote_files.py`). They can also be tested using ephemeral runners (no need for a prepopulated cache).

Going to clean up parts of https://github.com/nod-ai/SHARK-TestSuite/blob/main/iree_tests/conftest.py and https://github.com/nod-ai/SHARK-TestSuite/blob/main/iree_tests/README.md so the import is cleaner.
May move files in the IREE repository from https://github.com/iree-org/iree/tree/main/build_tools/pkgci/external_test_suite into a `tests/` folder somewhere as part of the switch over.