microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Feature Request] Mark as negative tests for minimal CUDA build #21394

Open poweiw opened 2 months ago

poweiw commented 2 months ago

Describe the feature request

Mark tests that are expected to fail in the minimal CUDA build.

Describe scenario use case

Some tests are expected to fail when using the minimal CUDA EP, which is built by compiling with -Donnxruntime_CUDA_MINIMAL. Since the resulting CUDA "EP" is more of a utility library for memory allocation etc., ops are not expected to run directly on the minimal CUDA EP. Should we mark those tests as negative tests if USE_CUDA_MINIMAL is defined?

Happy to contribute. Thanks!

skottmckay commented 2 months ago

Typically if you're doing a build with reduced operators the simplest thing to do is use --skip_tests.

I expect the amount of time it would take to track and maintain lists of tests that are expected to pass/fail would far outweigh any benefit. For reference, there are currently over 4,000 tests in onnxruntime_test_all.

poweiw commented 2 months ago

For context, we are trying to set up a pipeline for building and testing onnxruntime, and it would be confusing to have to check the logs to see whether all the failed tests are expected. Does the normal CUDA EP run through all 4,000 tests?

skottmckay commented 2 months ago

Yes it does.

The majority of tests in onnxruntime_test_all loop through all the execution providers that are enabled in the build, as they test the individual ONNX operators with different input values and different opsets. There are also other things like tests for the optimizers or regression tests that use models from the testdata directory.

poweiw commented 2 months ago

Would it be non-trivial to do something like #ifdef USE_CUDA_MINIMAL -> don't run the CUDA EP in onnxruntime_test_all? In the minimal CUDA build, the TRT EP uses utilities from the CUDA EP, so those utilities should still be reasonably tested in that case.

skottmckay commented 2 months ago

You could certainly try adding an ifdef around this line:

https://github.com/microsoft/onnxruntime/blob/2580d935cbecd756cef435fb173a2f10237e9d2a/onnxruntime/test/providers/base_tester.cc#L642

Given we do a similar thing for NHWC CUDA ops it doesn't seem unreasonable to use the same approach for a CUDA minimal build.
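As a rough illustration of that idea, here is a minimal, self-contained sketch of guarding a provider entry with the preprocessor. It is not the actual code in base_tester.cc: the function names `ProvidersToTest` and `WillTest`, the constant names, and the three-entry list are all hypothetical and chosen only to show the pattern. The only thing taken from the thread is the `USE_CUDA_MINIMAL` macro.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-ins for the provider-name constants used by the tests.
const std::string kCpuProvider = "CPUExecutionProvider";
const std::string kCudaProvider = "CUDAExecutionProvider";
const std::string kTensorrtProvider = "TensorrtExecutionProvider";

// Hypothetical helper: returns the providers an operator test would loop over.
std::vector<std::string> ProvidersToTest() {
  std::vector<std::string> providers;
  providers.push_back(kCpuProvider);
#ifndef USE_CUDA_MINIMAL
  // Left out of a minimal CUDA build, where the CUDA EP is not expected
  // to run ops directly.
  providers.push_back(kCudaProvider);
#endif
  providers.push_back(kTensorrtProvider);
  return providers;
}

// Hypothetical helper: true if the named provider is in the list above.
bool WillTest(const std::string& name) {
  const std::vector<std::string> providers = ProvidersToTest();
  return std::find(providers.begin(), providers.end(), name) != providers.end();
}
```

Compiled with -DUSE_CUDA_MINIMAL, `WillTest("CUDAExecutionProvider")` returns false while the CPU and TensorRT entries remain. Compiled without it, all three are present. Whether the real test harness needs additional guards elsewhere is not something this sketch covers.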