1 - onnxruntime_test_all fails when building with OpenVINO support

drux007 commented 2 years ago

Describe the bug When I try to build onnxruntime with OpenVINO support, as instructed in the official instructions, 1 - onnxruntime_test_all fails.

Urgency Standard.

System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 20.04
ONNX Runtime installed from (source or binary): source
ONNX Runtime version: 1.9.1
Python version: 3.8.10
GCC/Compiler version (if compiling from source): 9.4.0
OpenVINO version: 2021.4.1 LTS (latest)

To Reproduce As I <hardware_option> I specify MYRIAD_FP16 since I attend to build it for Intel NCS2 which is listed as MYRIAD_FP16 according to the Build instructions.

The exactc ommand I use is:

./build.sh --config RelWithDebInfo --use_openvino MYRIAD_FP16 --build_shared_lib

Expected behavior It is expected that all tests pass successfully, but only 83% tests passed, 1 tests failed out of 6.

skottmckay commented 2 years ago

Output from the failure would be helpful.

skottmckay commented 2 years ago

Do you have output from the tests within onnxruntime_test_all that failed? Wondering if there is anything helpful in /media/msaponja/Linux1/onnxruntime/build/Linux/RelWithDebInfo/Testing/Temporary/LastTest.log?

Alternatively you could change to the build output directory and run onnxruntime_test_all directly as it's an executable. It needs to be run from the directory it's located in to load test data.

drux007 commented 2 years ago

Thank you for your help and quick answer @skottmckay.

Do you have output from the tests within onnxruntime_test_all that failed? Wondering if there is anything helpful in /media/msaponja/Linux1/onnxruntime/build/Linux/RelWithDebInfo/Testing/Temporary/LastTest.log?

In LastTest.log there are 14970 lines with 1202 errors so didn't want to bother you with it, but here it is: LastTest.log.

Alternatively you could change to the build output directory and run onnxruntime_test_all directly as it's an executable.

Or if easier, here is also the output of onnxruntime_test_all executable: output_onnxruntime_test_all.txt.

skottmckay commented 2 years ago

@jywu-msft do you know why so many tests (158 in total) would be failing with floating point accuracy issues?

e.g.

[ RUN ] SoftmaxOperator.Simple /media/msaponja/Linux1/onnxruntime/onnxruntime/test/providers/provider_test_utils.cc:241: Failure The difference between expected[i] and output[i] is 0.00014457106590270996, which exceeds threshold, where expected[i] evaluates to 0.24472847580909729, output[i] evaluates to 0.244873046875, and threshold evaluates to 9.9999997473787516e-05. i:1, provider_type: OpenVINOExecutionProvider

drux007 commented 2 years ago

Also, here is the output if I run executable with --rerun-failed --output-on-failure options as suggested from the build log: output_onnxrutnime_test_all_output_on_failure.txt

jywu-msft commented 2 years ago

myriad is expected to have accuracy issues since it can only execute fp16. i thought we had adjusted the test accuracy tolerance at some point. +@maajidkhan for any comments

chausner-audeering commented 2 years ago

https://github.com/microsoft/onnxruntime/commit/b7129305be21111ba9d503019e8b0ede9c6834e3 seems to have disabled tests when building for MYRIAD and building NuGet packages:

    # Disabling unit tests for GPU and MYRIAD on nuget creation
    if args.use_openvino != "CPU_FP32" and args.build_nuget:
        args.test = False

Is the reduced precision of MYRIAD the reason why the tests are disabled because they would otherwise fail? If yes, why are they only disabled if building NuGet packages and not always? Does it mean the tests are expected to fail when building for MYRIAD?

/cc @MaajidKhan

MaajidKhan commented 2 years ago

Yes @chausner-audeering @drux007. Yes the failures with MYRIAD_FP16 are expected due to accuracy mismatch as slight accuracy change is expected with fp32 to fp16 conversion and this the reason why we put this condition while building nuget.

And with CPU_FP32, it is general rule that the output should match with the actual reference output and there can't be any mismatch because it's only FP32 precision.

chausner-audeering commented 2 years ago

Yes the failures with MYRIAD_FP16 are expected due to accuracy mismatch as slight accuracy change is expected with fp32 to fp16 conversion and this the reason why we put this condition while building nuget.

Okay, thanks for clarifying. But shouldn't then the tests be always deactivated if building for something other than CPU_FP32, irrespective of whether NuGet builds are enabled or not? I don't quite see how it's related to NuGet.

MaajidKhan commented 2 years ago

We disabled tests for Nuget build with FP16 precision because the build fails when some of the tests fails during the nuget build for OpenVINO-EP. so this change was explicitly for Nuget.

However, we don't want to disable the tests by default to usual regular OpenVINO EP builds for the devices with FP16 precision because these tests are not genuine failures. These are just accuracy mismatches.

so, I would suggest you to use a additional flag --skip_tests in your build command when you build OpenVINO EP with MYRIAD_FP16. This will make sure the tests are not run at the end of your build and the build gets completed successfully.

chausner commented 2 years ago

However, we don't want to disable the tests by default to usual regular OpenVINO EP builds for the devices with FP16 precision because these tests are not genuine failures. These are just accuracy mismatches.

I see. Still, I think the current solution not ideal:

It is not obvious that these tests are expected to fail for MYRIAD_FP16. As long as this fact is not documented somewhere (correct me if I am wrong), people might continue to report it as an issue in the future.
Even when knowing that the accuracy tests are expected to fail, I am not aware of any easy way to check that there are no other tests failing which are not related to floating-point accuracy. And if there is no practical way to check this, why even bother running the tests then?

A better solution could be to disable only those tests that are related to floating-point accuracy/expected to fail, or to increase the test thresholds for MYRIAD_FP16 so that the tests also pass there.

skottmckay commented 2 years ago

@MaajidKhan As a long term solution can the TensorCheck helper in the unit tests be updated to take the current EP as an input and adjust tolerances instead? Disabling the tests seems like a good way to miss issues.

That would be better than the current ugly setup where we hardcode tolerances in multiple places and apply them to all tests (and not just ones where the particular EP is used):

https://github.com/microsoft/onnxruntime/blob/66ceb6926df6c7a81dc1f241c6bf59cfe58d3a72/onnxruntime/test/providers/provider_test_utils.cc#L167-L170

https://github.com/microsoft/onnxruntime/blob/66ceb6926df6c7a81dc1f241c6bf59cfe58d3a72/onnxruntime/test/providers/provider_test_utils.cc#L225-L229

https://github.com/microsoft/onnxruntime/blob/66ceb6926df6c7a81dc1f241c6bf59cfe58d3a72/onnxruntime/test/providers/provider_test_utils.cc#L291-L294

drux007 commented 2 years ago

But shouldn't then the tests be always deactivated if building for something other than CPU_FP32, irrespective of whether NuGet builds are enabled or not?

As a temporary solution, we could adjust the code to always skip tests when using "CPU_FP32" and not only on nuget creation, i.e. changing lines to:

    # Disabling unit tests for GPU and MYRIAD
    if args.use_openvino != "CPU_FP32":
        args.test = False

I adjusted it accordingly and the build finished successfully. It's not ideal, but still better than simply skipping tests with an additional flag --skip_tests which is not even documented somewhere (correct me if I am wrong).

MaajidKhan commented 2 years ago

@drux007 I will work with our team internally and see if we can enable the change temporarily from your comment above.

@skottmckay Yeah, that would be a nice change to have. Meanwhile, we have to figure out what would be the ideal tolerance rate setting to avoid accuracy mismatches w.r.t MYRIAD_FP16. once we get there, TensorCheck helper can be modified to accept EP Type and set the right threshold w.r.t the EP's.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

microsoft / onnxruntime

1 - onnxruntime_test_all fails when building with OpenVINO support #9439