intel / webml-polyfill

Deprecated: the Web Neural Network Polyfill project has moved to https://github.com/webmachinelearning/webnn-polyfill
Apache License 2.0

[MPS] Performance regression happened for some models on MPS backend #1119

Open · Christywl opened this issue 4 years ago

Christywl commented 4 years ago

Test Env:
Chromium Version: nightly build 79.0.3917.0 (https://github.com/otcshare/chromium-src/commit/270f639f9ab6be2eeeb86fb9ab82930cf94fb60a)
Platform: macOS 10.14.5

Expected Result: No performance regression.

Actual Result: A performance regression occurred for some models on the MPS backend. A commit between https://github.com/otcshare/chromium-src/commit/4e4a1a3c08ac0dc84d9d313c50f14e56c82dca8c and https://github.com/otcshare/chromium-src/commit/270f639f9ab6be2eeeb86fb9ab82930cf94fb60a caused this issue:

Inference Time (ms):

| Models | 9642fde (M75) | 4e4a1a3 (M79, before #997 fix) | 270f639 (M79, after #997 fix) |
| --- | --- | --- | --- |
| MobileNet v1 (TFLite) | 11.93+-1.49 | #997 | 16.06+-1.30 |
| MobileNet v2 (TFLite) | 13.49+-1.06 | #997 | 16.45+-0.83 |
| SqueezeNet (TFLite) | 13.49+-1.61 | #997 | 17.96+-0.98 |
| MobileNet v2 (ONNX) | 14.31+-1.31 | #997 | 18.45+-0.90 |
| MobileNet v1 (OpenVINO) | 12.04+-1.34 | #997 | 15.66+-0.95 |
| MobileNet v2 (OpenVINO) | 13.55+-1.45 | #997 | 16.90+-0.66 |
| Inception v2 (OpenVINO) | 20.41+-1.93 | 20.41+-1.99 | 30.12+-1.47 |
| Object Detection/SSD MobileNet v1 (TFLite) | 21.25+-3.09, Decode Time: 0.33+-0.34 | 20.45+-2.33, Decode Time: 0.46+-1.14 | 35.26+-0.56, Decode Time: 0.44+-1.04 |
| Object Detection/SSD MobileNet v2 (TFLite) | 35.18+-0.64, Decode Time: 0.35+-0.35 | 34.11+-0.82, Decode Time: 0.45+-1.09 | 39.77+-0.61, Decode Time: 0.44+-1.05 |
| Object Detection/SSDLite MobileNet v2 (TFLite) | 32.63+-1.10, Decode Time: 0.35+-0.33 | 31.93+-0.75, Decode Time: 0.47+-1.13 | 35.85+-0.62, Decode Time: 0.43+-1.04 |
| PoseNet | 14.78+-1.94, Decode Time: 3.04+-1.25 | 16.64+-0.84, Decode Time: 3.41+-1.58 | 18.20+-1.35, Decode Time: 3.30+-1.60 |
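
The times in these tables read as mean+-standard deviation over repeated inference runs. As a point of reference, here is a minimal TypeScript sketch of that aggregation (a hypothetical helper, not the workload page's actual code):

```ts
// Hypothetical helper illustrating the "mean+-std" format used above;
// not part of the workload page's actual code.
function summarize(timesMs: number[]): string {
  const mean = timesMs.reduce((a, b) => a + b, 0) / timesMs.length;
  const variance =
    timesMs.reduce((acc, t) => acc + (t - mean) ** 2, 0) / timesMs.length;
  return `${mean.toFixed(2)}+-${Math.sqrt(variance).toFixed(2)}`;
}

// Example: summarize([10.4, 12.1, 13.3, 11.9]) returns "11.93+-1.03".
```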

How to Reproduce:

  1. Visit https://intel.github.io/webml-polyfill/workload
  2. Select one of the models mentioned above
  3. Select WebNN in the Backend field
  4. Select SUSTAINED_SPEED in the Preference field (a sketch of where this lands in the polyfill API follows these steps)
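
For context, the Backend and Preference fields map onto the polyfill's NNAPI-style JS API roughly as sketched below (TypeScript, model-building steps elided; names follow the polyfill's examples, but treat the exact signatures as assumptions):

```ts
// Rough sketch of where the workload's "SUSTAINED_SPEED" preference enters
// the polyfill's NNAPI-style API; signatures are assumptions, not verified.
async function compileWithSustainedSpeed(): Promise<void> {
  const nn = (navigator as any).ml.getNeuralNetworkContext();
  const model = await nn.createModel();
  // ... addOperand / addOperation / identifyInputsAndOutputs calls for the
  // selected model would go here ...
  await model.finish();
  const compilation = await model.createCompilation();
  compilation.setPreference(nn.PREFER_SUSTAINED_SPEED); // the Preference field
  await compilation.finish();
}
```
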
fujunwei commented 4 years ago

Allocating a new MTLBuffer and mapping new memory from SharedBufferMapping of a given length caused the regression. Maybe we need to apply the optimization of reducing new memory mappings to the other backend implementations.
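
In other words, the backend was allocating a fresh MTLBuffer and mapping new memory on every compute call, and the fix reuses the mapping instead. A minimal TypeScript sketch of the reuse pattern (illustrative types only; the actual fix lives in the Chromium C++/Metal code):

```ts
// Illustrative stand-in for the fix: allocate and map a buffer once per
// distinct length, then reuse it across compute calls instead of re-mapping.
class BufferCache {
  private buffers = new Map<number, ArrayBuffer>();

  getOrCreate(length: number): ArrayBuffer {
    let buf = this.buffers.get(length);
    if (buf === undefined) {
      // Stands in for "allocate MTLBuffer + map SharedBufferMapping",
      // which previously happened on every inference.
      buf = new ArrayBuffer(length);
      this.buffers.set(length, buf);
    }
    return buf;
  }
}
```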

huningxin commented 4 years ago

That's a good optimization. Thanks @fujunwei!

> Maybe we need to apply the optimization of reducing new memory mappings to the other backend implementations.

Do you know which backends this optimization can be used for?

huningxin commented 4 years ago

https://github.com/otcshare/chromium-src/pull/18 has been merged. @Christywl, please help verify the performance regression issue. Thanks.

fujunwei commented 4 years ago

> Do you know which backends this optimization can be used for?

The backends that map new memory: the BNNS, CLDNN, DML, DNNL, and Inference Engine implementations.

huningxin commented 4 years ago

> The backends that map new memory: the BNNS, CLDNN, DML, DNNL, and Inference Engine implementations.

Great, could you please file an issue to track this? Thanks.

fujunwei commented 4 years ago

Done with issue https://github.com/intel/webml-polyfill/issues/1140. Thanks.

Christywl commented 4 years ago

@fujunwei, the performance of these models on the latest build https://github.com/otcshare/chromium-src/commit/775b7014eff6866bcad4921d61b41ac0d1c98d54 still has an issue. I ran the workload several times; the numbers are not stable (including the MobileNet v2 models):

Inference Time (ms) across six runs:

| Models | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 |
| --- | --- | --- | --- | --- | --- | --- |
| MobileNet v1 (TFLite) | 13.28+-2.59 | 12.18+-1.16 | 15.44+-1.31 | 11.98+-1.08 | 13.11+-1.26 | 12.38+-1.37 |
| MobileNet v2 (TFLite) | 16.77+-1.26 | 15.64+-1.05 | 12.29+-1.50 | 15.66+-0.98 | 13.04+-1.66 | 15.07+-0.93 |
| SqueezeNet (TFLite) | 12.88+-1.69 | 16.36+-1.06 | 13.33+-1.38 | 16.03+-0.59 | 15.84+-0.84 | 13.24+-1.59 |
| MobileNet v2 (ONNX) | 17.16+-1.66 | 15.87+-1.21 | 17.09+-0.71 | 15.90+-1.16 | 17.21+-0.79 | 15.81+-1.11 |
| MobileNet v1 (OpenVINO) | 14.47+-3.87 | 12.66+-1.34 | 11.34+-1.03 | 15.15+-0.91 | 12.31+-1.15 | 12.24+-0.98 |
| MobileNet v2 (OpenVINO) | 16.36+-0.52 | 15.59+-1.60 | 12.18+-1.06 | 14.39+-1.78 | 15.73+-0.45 | 12.47+-1.36 |
| Inception v2 (OpenVINO) | 20.98+-2.31 | 22.75+-4.75 | 20.69+-1.80 | 23.12+-4.44 | 20.97+-2.61 | 20.99+-2.80 |
| SSD MobileNet v1 (TFLite) | 30.72+-1.05, Decode Time: 0.46+-1.11 | 21.64+-3.19, Decode Time: 0.45+-1.13 | 23.58+-2.25, Decode Time: 0.45+-1.11 | 21.86+-2.65, Decode Time: 0.45+-1.16 | 31.39+-1.24, Decode Time: 0.44+-1.11 | 21.30+-3.30, Decode Time: 0.45+-1.08 |
| SSD MobileNet v2 (TFLite) | 34.98+-2.26, Decode Time: 0.46+-0.95 | 36.07+-1.49, Decode Time: 0.46+-1.12 | 33.88+-1.26, Decode Time: 0.46+-1.14 | 35.11+-0.75, Decode Time: 0.49+-1.10 | 33.82+-0.83, Decode Time: 0.47+-1.14 | 34.04+-1.64, Decode Time: 0.48+-1.11 |
| SSDLite MobileNet v2 (TFLite) | 33.26+-0.94, Decode Time: 0.45+-1.19 | 33.41+-1.07, Decode Time: 0.49+-1.12 | 33.84+-0.54, Decode Time: 0.44+-1.11 | 32.90+-0.53, Decode Time: 0.48+-1.13 | 33.80+-0.51, Decode Time: 0.47+-1.11 | 32.67+-0.56, Decode Time: 0.45+-1.11 |
| PoseNet | 16.23+-0.82, Decode Time: 3.43+-1.59 | 17.22+-1.88, Decode Time: 3.30+-1.62 | 16.91+-2.07, Decode Time: 3.25+-1.59 | 17.59+-1.19, Decode Time: 3.38+-1.60 | 16.36+-1.67, Decode Time: 3.25+-1.65 | 16.73+-2.08, Decode Time: 3.32+-1.63 |

Meanwhile, other models look good, for example:

Inference Time (ms) across three runs:

| Models | Run 1 | Run 2 | Run 3 |
| --- | --- | --- | --- |
| Inception v3 (TFLite) | 43.47+-2.42 | 43.67+-2.61 | 43.78+-2.70 |
| Inception v4 (TFLite) | 79.51+-3.04 | 79.86+-3.08 | 79.87+-3.14 |
| SqueezeNet (ONNX) | 12.03+-2.06 | 11.93+-0.70 | 11.83+-0.66 |
| Resnet50 v1 (ONNX) | 34.35+-2.15 | 34.36+-1.99 | 34.36+-2.07 |
| SqueezeNet (OpenVINO) | 12.05+-1.73 | 11.93+-0.66 | 11.92+-0.69 |

So I'm reopening this issue; please take a look.