intel / webml-polyfill

Deprecated: the Web Neural Network Polyfill project has moved to https://github.com/webmachinelearning/webnn-polyfill
Apache License 2.0

Which would be the right accuracy for Web ML performance score? #65

Closed: BruceDai closed this issue 5 years ago

BruceDai commented 6 years ago

Currently we use 1e-6f as the accuracy threshold for the Web ML performance score, while the Android NN CTS uses 1e-5f, or 5.0f * 0.0009765625f in relaxed mode, in the executeWithCompilation function in https://android.googlesource.com/platform/frameworks/ml/+/master/nn/runtime/test/GeneratedUtils.cpp


```cpp
// GeneratedUtils.cpp, lines 89-146 (excerpt)
void executeWithCompilation(Model* model, Compilation* compilation,
                            std::function<bool(int)> isIgnored,
                            std::vector<MixedTypedExample>& examples,
                            std::string dumpFile) {
    ...
    // If in relaxed mode, set the error range to be 5ULP of FP16.
    float fpRange = !model->isRelaxed() ? 1e-5f : 5.0f * 0.0009765625f;
    ...
    compare(filteredGolden, filteredTest, fpRange);
```
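
For comparison, here is a minimal sketch of what an element-wise tolerance check could look like on our side (the helper name `compareOutputs` and the constant names are illustrative, not the polyfill's actual utilities):

```js
// Thresholds under discussion; the names are illustrative.
const FP_RANGE_WEBML = 1e-6;                 // current Web ML accuracy
const FP_RANGE_NNAPI = 1e-5;                 // Android NN CTS, non-relaxed
const FP_RANGE_RELAXED = 5.0 * 0.0009765625; // 5 ULP of FP16, relaxed mode

// Element-wise absolute-error check, same spirit as compare() above.
function compareOutputs(actual, expected, fpRange) {
  if (actual.length !== expected.length) return false;
  for (let i = 0; i < actual.length; ++i) {
    if (Math.abs(actual[i] - expected[i]) > fpRange) {
      console.error(`index ${i}: got ${actual[i]}, expected ${expected[i]}`);
      return false;
    }
  }
  return true;
}
```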

Which accuracy would be right for Web ML: 1e-6f, or some other value to be defined?

@halton @huningxin, what are your opinions? Thanks.

halton commented 6 years ago

May I know where the "1e-6f as accuracy for Web ML performance score" comes from?

BruceDai commented 6 years ago

@halton Regarding the 1e-6f accuracy, I synced with @huningxin earlier; he said he chose that value himself. We need a discussion on which accuracy would be right.

Christywl commented 6 years ago

Some CTS tests converted from the Android CTS behave differently with WebML + MPS than with WebML + NNAPI.

Tests: conv_1_h3_w2_SAME.js, conv_1_h3_w2_VALID.js, conv_3_h3_w2_SAME.js, conv_3_h3_w2_VALID.js, depthwise_conv.js, depthwise_conv2d_float_large_2.js

Result for conv_1_h3_w2_SAME.js:

| Index | Mac | Android | Expected |
|-------|-----|---------|----------|
| 0 | 1.8525390625 | 1.852844476699829 | 1.85284 |
| 1 | -0.03955078125 | -0.03936517983675003 | -0.0393656 |
| 2 | -0.126953125 | -0.12735417485237122 | -0.127353 |
| 3 | 1.4296875 | 1.431152105331421 | 1.43115 |
| 4 | -0.301025390625 | -0.3022947907447815 | -0.302294 |
| 5 | -1.0390625 | -1.040202260017395 | -1.0402 |
| 6 | -0.65478515625 | -0.65478515625 | 0.655023 |
| **Result** | Fail | Pass | |

ibelem commented 5 years ago

Use 1e-5 to check whether there are bugs on Mac.

huningxin commented 5 years ago

I tested conv_1_h3_w2_SAME.js on https://github.com/otcshare/chromium-src/commit/4dc45a2e8b0e1fe2879893b303bbbce5728296c6 (with fix https://github.com/otcshare/chromium-src/pull/17) on MacOS 10.13.6. I cannot reproduce this issue. The op3_output data is:

```
0: 1.852844476699829
1: -0.03936518356204033
2: -0.12735414505004883
3: 1.431152105331421
4: -0.3022948205471039
5: -1.0402021408081055
6: 0.6550236940383911
```

huningxin commented 5 years ago

@BruceDai , it also works for me on MacBook Pro. Please take another look.

Christywl commented 5 years ago

@huningxin , I think you ran the test with the BNNS backend. conv_1_h3_w2_SAME.js also passes for me with the BNNS backend. It fails with the MPS backend.

Updated detailed results with the latest Chromium nightly build (8fcce67). Test env:

Device: MacBook Pro(10.13.6)
CPU: i5-5257U
GPU: Intel Iris 6100(GL_VERSION: 4.1 INTEL-10.36.19)

Failed Tests for MPS: https://brucedai.github.io/nt/testm/cts-all.html?backend=mps

Failed Tests for BNNS: https://brucedai.github.io/nt/testm/cts-all.html?backend=bnns

huningxin commented 5 years ago

@Christywl , you are correct. I can reproduce this issue with MPS backend now. I will look into the root cause. Thanks!

huningxin commented 5 years ago

Since MPS computes in FP16, we need to use an error range of 5 ULP of FP16, see https://android.googlesource.com/platform/hardware/interfaces/+/master/neuralnetworks/1.0/vts/functional/GeneratedTestHarness.cpp#268

```cpp
// If in relaxed mode, set the error range to be 5ULP of FP16.
float fpRange = !model.relaxComputationFloat32toFloat16 ? 1e-5f : 5.0f * 0.0009765625f;
EvaluatePreparedModel(preparedModel, is_ignored, examples, fpRange);
```

In my test, the above tests pass on MPS with the 5-ULP-of-FP16 range.
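
A minimal sketch of how this threshold selection could carry over to the Web ML tests (the `isFp16Backend` flag is an assumption for backends like MPS, not an existing API):

```js
const ULP_FP16 = Math.pow(2, -10);     // 0.0009765625
const FP_RANGE_FP32 = 1e-5;            // non-relaxed threshold
const FP_RANGE_FP16 = 5.0 * ULP_FP16;  // 0.0048828125, 5 ULP of FP16

// Pick the error range based on whether the backend computes in FP16.
function fpRangeFor(isFp16Backend) {
  return isFp16Backend ? FP_RANGE_FP16 : FP_RANGE_FP32;
}
```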

@BruceDai , please verify and update test cases. Thanks!

Christywl commented 5 years ago

With the updated tests, all of them pass on MPS except check result for Depthwise conv2d float large example/2 (#189). For the failed test on BNNS, please see #191.

huningxin commented 5 years ago

ULP of FP32: 1.1920928955078125e-7f = 2^-23
ULP of FP16: 0.0009765625f = 2^-10
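
These follow from the mantissa widths (23 bits for FP32, 10 bits for FP16); a quick check:

```js
console.log(Math.pow(2, -23));      // 1.1920928955078125e-7 (ULP of FP32)
console.log(Math.pow(2, -10));      // 0.0009765625 (ULP of FP16)
console.log(5 * Math.pow(2, -10));  // 0.0048828125, the 5-ULP FP16 error range
```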