Closed. BruceDai closed this issue 5 years ago.
Can I ask how the "1e-6f accuracy for the Web ML performance score" was chosen?
@halton About the 1e-6f accuracy: I synced with @huningxin earlier, and he said he chose that value himself. We need a discussion about which accuracy would be right.
The behavior of some CTS tests converted from the Android CTS differs between WebML + MPS and WebML + NNAPI.
Tests:
- conv_1_h3_w2_SAME.js
- conv_1_h3_w2_VALID.js
- conv_3_h3_w2_SAME.js
- conv_3_h3_w2_VALID.js
- depthwise_conv.js
- depthwise_conv2d_float_large_2.js
Result for conv_1_h3_w2_SAME.js:
Index | Mac | Android | Expected |
---|---|---|---|
0 | 1.8525390625 | 1.852844476699829 | 1.85284 |
1 | -0.03955078125 | -0.03936517983675003 | -0.0393656 |
2 | -0.126953125 | -0.12735417485237122 | -0.127353 |
3 | 1.4296875 | 1.431152105331421 | 1.43115 |
4 | -0.301025390625 | -0.3022947907447815 | -0.302294 |
5 | -1.0390625 | -1.040202260017395 | -1.0402 |
6 | -0.65478515625 | -0.65478515625 | 0.655023 |
Result | Fail | Pass | |
We could use 1e-5 as the tolerance to check whether this is a bug on Mac.
I tested conv_1_h3_w2_SAME.js on https://github.com/otcshare/chromium-src/commit/4dc45a2e8b0e1fe2879893b303bbbce5728296c6 (with fix https://github.com/otcshare/chromium-src/pull/17) on MacOS 10.13.6. I cannot reproduce this issue. The op3_output data is:
0: 1.852844476699829
1: -0.03936518356204033
2: -0.12735414505004883
3: 1.431152105331421
4: -0.3022948205471039
5: -1.0402021408081055
6: 0.6550236940383911
@BruceDai , it also works for me on MacBook Pro. Please take another look.
@huningxin , I think you run the test with BNNS backend. conv_1_h3_w2_SAME.js also passes for me with BNNS backend. It fails with MPS backend.
Updated the detailed results with the latest Chromium nightly build (8fcce67). Test Env:
Device: MacBook Pro (macOS 10.13.6)
CPU: i5-5257U
GPU: Intel Iris 6100(GL_VERSION: 4.1 INTEL-10.36.19)
Failed Tests for MPS: https://brucedai.github.io/nt/testm/cts-all.html?backend=mps
Failed Tests for BNNS: https://brucedai.github.io/nt/testm/cts-all.html?backend=bnns
@Christywl , you are correct. I can reproduce this issue with MPS backend now. I will look into the root cause. Thanks!
As MPS computes in FP16, we need to use a tolerance of 5 ULP of FP16; see https://android.googlesource.com/platform/hardware/interfaces/+/master/neuralnetworks/1.0/vts/functional/GeneratedTestHarness.cpp#268
```cpp
// If in relaxed mode, set the error range to be 5ULP of FP16.
float fpRange = !model.relaxComputationFloat32toFloat16 ? 1e-5f : 5.0f * 0.0009765625f;
EvaluatePreparedModel(preparedModel, is_ignored, examples, fpRange);
```
In my test, the above tests pass on MPS with the 5 ULP of FP16 tolerance.
@BruceDai , please verify and update test cases. Thanks!
Using the updated tests, all of them pass on MPS except "check result for Depthwise conv2d float large example/2" (#189).
For the failed test on BNNS, please see #191.
ULP of FP32: 1.1920928955078125e-7f = 2^-23
ULP of FP16: 0.0009765625f = 2^-10
Currently we use 1e-6f as the accuracy for the Web ML performance score, while the Android NN CTS uses 1e-5f, or 5.0f * 0.0009765625f in relaxed mode, in the executeWithCompilation function in https://android.googlesource.com/platform/frameworks/ml/+/master/nn/runtime/test/GeneratedUtils.cpp
Which would be the right accuracy for Web ML: 1e-6f, or something else to be defined?
@halton @huningxin, what are your opinions? Thanks.