intel / webml-polyfill

Deprecated: the Web Neural Network Polyfill project has moved to https://github.com/webmachinelearning/webnn-polyfill
Apache License 2.0

Which would be the right accuracy for Web ML performance score? #65

Closed: BruceDai closed this issue 5 years ago

BruceDai commented 6 years ago

Currently we use 1e-6f as the accuracy threshold for the Web ML performance score, while the Android NN CTS uses 1e-5f, or 5.0f * 0.0009765625f in relaxed mode, in the executeWithCompilation function in https://android.googlesource.com/platform/frameworks/ml/+/master/nn/runtime/test/GeneratedUtils.cpp


```cpp
// GeneratedUtils.cpp, lines 89-146 (excerpt)
void executeWithCompilation(Model* model, Compilation* compilation,
                            std::function<bool(int)> isIgnored,
                            std::vector<MixedTypedExample>& examples,
                            std::string dumpFile) {
    ...
    // If in relaxed mode, set the error range to be 5ULP of FP16.
    float fpRange = !model->isRelaxed() ? 1e-5f : 5.0f * 0.0009765625f;
    ...
    compare(filteredGolden, filteredTest, fpRange);
```
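
For comparison, here is a minimal sketch of what an element-wise tolerance check could look like on our side (the helper name `compareOutputs` and the constant names are illustrative, not the polyfill's actual utilities):

```js
// Thresholds under discussion; the names are illustrative.
const FP_RANGE_WEBML = 1e-6;                 // current Web ML accuracy
const FP_RANGE_NNAPI = 1e-5;                 // Android NN CTS, non-relaxed
const FP_RANGE_RELAXED = 5.0 * 0.0009765625; // 5 ULP of FP16, relaxed mode

// Element-wise absolute-error check, same spirit as compare() above.
function compareOutputs(actual, expected, fpRange) {
  if (actual.length !== expected.length) return false;
  for (let i = 0; i < actual.length; ++i) {
    if (Math.abs(actual[i] - expected[i]) > fpRange) {
      console.error(`index ${i}: got ${actual[i]}, expected ${expected[i]}`);
      return false;
    }
  }
  return true;
}
```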

Which accuracy would be right for Web ML: 1e-6f, or some other value to be defined?

@halton @huningxin, what are your opinions? Thanks.

halton commented 6 years ago

May I know where the "1e-6f as accuracy for Web ML performance score" comes from?

BruceDai commented 6 years ago

@halton Regarding the 1e-6f accuracy, I synced with @huningxin earlier; he said he chose that value himself. We need a discussion on which accuracy would be right.

Christywl commented 6 years ago

Some CTS tests converted from the Android CTS behave differently with WebML + MPS than with WebML + NNAPI.

Tests: conv_1_h3_w2_SAME.js, conv_1_h3_w2_VALID.js, conv_3_h3_w2_SAME.js, conv_3_h3_w2_VALID.js, depthwise_conv.js, depthwise_conv2d_float_large_2.js

Result for conv_1_h3_w2_SAME.js:

| Index | Mac | Android | Expected |
|-------|-----|---------|----------|
| 0 | 1.8525390625 | 1.852844476699829 | 1.85284 |
| 1 | -0.03955078125 | -0.03936517983675003 | -0.0393656 |
| 2 | -0.126953125 | -0.12735417485237122 | -0.127353 |
| 3 | 1.4296875 | 1.431152105331421 | 1.43115 |
| 4 | -0.301025390625 | -0.3022947907447815 | -0.302294 |
| 5 | -1.0390625 | -1.040202260017395 | -1.0402 |
| 6 | -0.65478515625 | -0.65478515625 | 0.655023 |
| **Result** | Fail | Pass | |

ibelem commented 5 years ago

Use 1e-5 to check whether there are bugs on Mac.

huningxin commented 5 years ago

I tested conv_1_h3_w2_SAME.js on https://github.com/otcshare/chromium-src/commit/4dc45a2e8b0e1fe2879893b303bbbce5728296c6 (with fix https://github.com/otcshare/chromium-src/pull/17) on MacOS 10.13.6. I cannot reproduce this issue. The op3_output data is:

```
0: 1.852844476699829
1: -0.03936518356204033
2: -0.12735414505004883
3: 1.431152105331421
4: -0.3022948205471039
5: -1.0402021408081055
6: 0.6550236940383911
```

huningxin commented 5 years ago

@BruceDai , it also works for me on MacBook Pro. Please take another look.

Christywl commented 5 years ago

@huningxin , I think you ran the test with the BNNS backend. conv_1_h3_w2_SAME.js also passes for me with the BNNS backend. It fails with the MPS backend.

Updated detailed results with the latest Chromium nightly build (8fcce67). Test env:

Device: MacBook Pro(10.13.6)
CPU: i5-5257U
GPU: Intel Iris 6100(GL_VERSION: 4.1 INTEL-10.36.19)

Failed Tests for MPS: https://brucedai.github.io/nt/testm/cts-all.html?backend=mps

Failed Tests for BNNS: https://brucedai.github.io/nt/testm/cts-all.html?backend=bnns

huningxin commented 5 years ago

@Christywl , you are correct. I can reproduce this issue with MPS backend now. I will look into the root cause. Thanks!

huningxin commented 5 years ago

Since MPS computes in FP16, we need to use an error range of 5 ULP of FP16, see https://android.googlesource.com/platform/hardware/interfaces/+/master/neuralnetworks/1.0/vts/functional/GeneratedTestHarness.cpp#268

```cpp
// If in relaxed mode, set the error range to be 5ULP of FP16.
float fpRange = !model.relaxComputationFloat32toFloat16 ? 1e-5f : 5.0f * 0.0009765625f;
EvaluatePreparedModel(preparedModel, is_ignored, examples, fpRange);
```

In my test, the above tests pass on MPS with the 5-ULP-of-FP16 range.
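
A minimal sketch of how this threshold selection could carry over to the Web ML tests (the `isFp16Backend` flag is an assumption for backends like MPS, not an existing API):

```js
const ULP_FP16 = Math.pow(2, -10);     // 0.0009765625
const FP_RANGE_FP32 = 1e-5;            // non-relaxed threshold
const FP_RANGE_FP16 = 5.0 * ULP_FP16;  // 0.0048828125, 5 ULP of FP16

// Pick the error range based on whether the backend computes in FP16.
function fpRangeFor(isFp16Backend) {
  return isFp16Backend ? FP_RANGE_FP16 : FP_RANGE_FP32;
}
```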

@BruceDai , please verify and update test cases. Thanks!

Christywl commented 5 years ago

With the updated tests, all of them pass on MPS except check result for Depthwise conv2d float large example/2 (#189). For the failed test on BNNS, please see #191.

huningxin commented 5 years ago

ULP of FP32: 1.1920928955078125e-7f = 2^-23
ULP of FP16: 0.0009765625f = 2^-10
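
These follow from the mantissa widths (23 bits for FP32, 10 bits for FP16); a quick check:

```js
console.log(Math.pow(2, -23));      // 1.1920928955078125e-7 (ULP of FP32)
console.log(Math.pow(2, -10));      // 0.0009765625 (ULP of FP16)
console.log(5 * Math.pow(2, -10));  // 0.0048828125, the 5-ULP FP16 error range
```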