intel / webml-polyfill

Deprecated: the Web Neural Network Polyfill project has moved to https://github.com/webmachinelearning/webnn-polyfill
Apache License 2.0

[MPS] Performance regression happened for some models on MPS backend #1119

Open · Christywl opened this issue 4 years ago

Christywl commented 4 years ago

Test Env:
Chromium Version: nightly build 79.0.3917.0 (https://github.com/otcshare/chromium-src/commit/270f639f9ab6be2eeeb86fb9ab82930cf94fb60a)
Platform: macOS 10.14.5

Expected Result: No performance regression.

Actual Result: A performance regression occurred for some models on the MPS backend. A commit between https://github.com/otcshare/chromium-src/commit/4e4a1a3c08ac0dc84d9d313c50f14e56c82dca8c and https://github.com/otcshare/chromium-src/commit/270f639f9ab6be2eeeb86fb9ab82930cf94fb60a caused this issue:

Inference Time (ms):

| Models | 9642fde (M75) | 4e4a1a3 (M79, before #997 fix) | 270f639 (M79, after #997 fix) |
| --- | --- | --- | --- |
| MobileNet v1 (TFLite) | 11.93+-1.49 | #997 | 16.06+-1.30 |
| MobileNet v2 (TFLite) | 13.49+-1.06 | #997 | 16.45+-0.83 |
| SqueezeNet (TFLite) | 13.49+-1.61 | #997 | 17.96+-0.98 |
| MobileNet v2 (ONNX) | 14.31+-1.31 | #997 | 18.45+-0.90 |
| MobileNet v1 (OpenVINO) | 12.04+-1.34 | #997 | 15.66+-0.95 |
| MobileNet v2 (OpenVINO) | 13.55+-1.45 | #997 | 16.90+-0.66 |
| Inception v2 (OpenVINO) | 20.41+-1.93 | 20.41+-1.99 | 30.12+-1.47 |
| Object Detection/SSD MobileNet v1 (TFLite) | 21.25+-3.09, Decode Time: 0.33+-0.34 | 20.45+-2.33, Decode Time: 0.46+-1.14 | 35.26+-0.56, Decode Time: 0.44+-1.04 |
| Object Detection/SSD MobileNet v2 (TFLite) | 35.18+-0.64, Decode Time: 0.35+-0.35 | 34.11+-0.82, Decode Time: 0.45+-1.09 | 39.77+-0.61, Decode Time: 0.44+-1.05 |
| Object Detection/SSDLite MobileNet v2 (TFLite) | 32.63+-1.10, Decode Time: 0.35+-0.33 | 31.93+-0.75, Decode Time: 0.47+-1.13 | 35.85+-0.62, Decode Time: 0.43+-1.04 |
| PoseNet | 14.78+-1.94, Decode Time: 3.04+-1.25 | 16.64+-0.84, Decode Time: 3.41+-1.58 | 18.20+-1.35, Decode Time: 3.30+-1.60 |
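
The times in these tables read as mean+-standard deviation over repeated inference runs. As a point of reference, here is a minimal TypeScript sketch of that aggregation (a hypothetical helper, not the workload page's actual code):

```ts
// Hypothetical helper illustrating the "mean+-std" format used above;
// not part of the workload page's actual code.
function summarize(timesMs: number[]): string {
  const mean = timesMs.reduce((a, b) => a + b, 0) / timesMs.length;
  const variance =
    timesMs.reduce((acc, t) => acc + (t - mean) ** 2, 0) / timesMs.length;
  return `${mean.toFixed(2)}+-${Math.sqrt(variance).toFixed(2)}`;
}

// Example: summarize([10.4, 12.1, 13.3, 11.9]) returns "11.93+-1.03".
```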

How to Reproduce:

  1. Visit https://intel.github.io/webml-polyfill/workload
  2. Select one of the models mentioned above
  3. Select WebNN in the Backend field
  4. Select SUSTAINED_SPEED in the Preference field (a sketch of where this lands in the polyfill API follows these steps)
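
For context, the Backend and Preference fields map onto the polyfill's NNAPI-style JS API roughly as sketched below (TypeScript, model-building steps elided; names follow the polyfill's examples, but treat the exact signatures as assumptions):

```ts
// Rough sketch of where the workload's "SUSTAINED_SPEED" preference enters
// the polyfill's NNAPI-style API; signatures are assumptions, not verified.
async function compileWithSustainedSpeed(): Promise<void> {
  const nn = (navigator as any).ml.getNeuralNetworkContext();
  const model = await nn.createModel();
  // ... addOperand / addOperation / identifyInputsAndOutputs calls for the
  // selected model would go here ...
  await model.finish();
  const compilation = await model.createCompilation();
  compilation.setPreference(nn.PREFER_SUSTAINED_SPEED); // the Preference field
  await compilation.finish();
}
```
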
fujunwei commented 4 years ago

Allocating a new MTLBuffer and mapping new memory from SharedBufferMapping of a given length caused the regression. Maybe we need to apply the optimization of reducing new memory mappings to the other backend implementations.
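
In other words, the backend was allocating a fresh MTLBuffer and mapping new memory on every compute call, and the fix reuses the mapping instead. A minimal TypeScript sketch of the reuse pattern (illustrative types only; the actual fix lives in the Chromium C++/Metal code):

```ts
// Illustrative stand-in for the fix: allocate and map a buffer once per
// distinct length, then reuse it across compute calls instead of re-mapping.
class BufferCache {
  private buffers = new Map<number, ArrayBuffer>();

  getOrCreate(length: number): ArrayBuffer {
    let buf = this.buffers.get(length);
    if (buf === undefined) {
      // Stands in for "allocate MTLBuffer + map SharedBufferMapping",
      // which previously happened on every inference.
      buf = new ArrayBuffer(length);
      this.buffers.set(length, buf);
    }
    return buf;
  }
}
```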

huningxin commented 4 years ago

That's a good optimization. Thanks @fujunwei!

> Maybe we need to apply the optimization of reducing new memory mappings to the other backend implementations.

Do you know which backends this optimization can be used for?

huningxin commented 4 years ago

https://github.com/otcshare/chromium-src/pull/18 has been merged. @Christywl, please help verify the performance regression issue. Thanks.

fujunwei commented 4 years ago

> Do you know which backends this optimization can be used for?

The backends that map new memory: the BNNS, CLDNN, DML, DNNL, and Inference Engine implementations.

huningxin commented 4 years ago

> The backends that map new memory: the BNNS, CLDNN, DML, DNNL, and Inference Engine implementations.

Great, could you please file an issue to track this? Thanks.

fujunwei commented 4 years ago

Done with issue https://github.com/intel/webml-polyfill/issues/1140. Thanks.

Christywl commented 4 years ago

@fujunwei, the performance of these models on the latest build https://github.com/otcshare/chromium-src/commit/775b7014eff6866bcad4921d61b41ac0d1c98d54 still has an issue. I ran the workload several times; the numbers are not stable (including the MobileNet v2 models):

Inference Time (ms) across six runs:

| Models | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 |
| --- | --- | --- | --- | --- | --- | --- |
| MobileNet v1 (TFLite) | 13.28+-2.59 | 12.18+-1.16 | 15.44+-1.31 | 11.98+-1.08 | 13.11+-1.26 | 12.38+-1.37 |
| MobileNet v2 (TFLite) | 16.77+-1.26 | 15.64+-1.05 | 12.29+-1.50 | 15.66+-0.98 | 13.04+-1.66 | 15.07+-0.93 |
| SqueezeNet (TFLite) | 12.88+-1.69 | 16.36+-1.06 | 13.33+-1.38 | 16.03+-0.59 | 15.84+-0.84 | 13.24+-1.59 |
| MobileNet v2 (ONNX) | 17.16+-1.66 | 15.87+-1.21 | 17.09+-0.71 | 15.90+-1.16 | 17.21+-0.79 | 15.81+-1.11 |
| MobileNet v1 (OpenVINO) | 14.47+-3.87 | 12.66+-1.34 | 11.34+-1.03 | 15.15+-0.91 | 12.31+-1.15 | 12.24+-0.98 |
| MobileNet v2 (OpenVINO) | 16.36+-0.52 | 15.59+-1.60 | 12.18+-1.06 | 14.39+-1.78 | 15.73+-0.45 | 12.47+-1.36 |
| Inception v2 (OpenVINO) | 20.98+-2.31 | 22.75+-4.75 | 20.69+-1.80 | 23.12+-4.44 | 20.97+-2.61 | 20.99+-2.80 |
| SSD MobileNet v1 (TFLite) | 30.72+-1.05, Decode Time: 0.46+-1.11 | 21.64+-3.19, Decode Time: 0.45+-1.13 | 23.58+-2.25, Decode Time: 0.45+-1.11 | 21.86+-2.65, Decode Time: 0.45+-1.16 | 31.39+-1.24, Decode Time: 0.44+-1.11 | 21.30+-3.30, Decode Time: 0.45+-1.08 |
| SSD MobileNet v2 (TFLite) | 34.98+-2.26, Decode Time: 0.46+-0.95 | 36.07+-1.49, Decode Time: 0.46+-1.12 | 33.88+-1.26, Decode Time: 0.46+-1.14 | 35.11+-0.75, Decode Time: 0.49+-1.10 | 33.82+-0.83, Decode Time: 0.47+-1.14 | 34.04+-1.64, Decode Time: 0.48+-1.11 |
| SSDLite MobileNet v2 (TFLite) | 33.26+-0.94, Decode Time: 0.45+-1.19 | 33.41+-1.07, Decode Time: 0.49+-1.12 | 33.84+-0.54, Decode Time: 0.44+-1.11 | 32.90+-0.53, Decode Time: 0.48+-1.13 | 33.80+-0.51, Decode Time: 0.47+-1.11 | 32.67+-0.56, Decode Time: 0.45+-1.11 |
| PoseNet | 16.23+-0.82, Decode Time: 3.43+-1.59 | 17.22+-1.88, Decode Time: 3.30+-1.62 | 16.91+-2.07, Decode Time: 3.25+-1.59 | 17.59+-1.19, Decode Time: 3.38+-1.60 | 16.36+-1.67, Decode Time: 3.25+-1.65 | 16.73+-2.08, Decode Time: 3.32+-1.63 |

Meanwhile, other models look good, for example:

Inference Time (ms) across three runs:

| Models | Run 1 | Run 2 | Run 3 |
| --- | --- | --- | --- |
| Inception v3 (TFLite) | 43.47+-2.42 | 43.67+-2.61 | 43.78+-2.70 |
| Inception v4 (TFLite) | 79.51+-3.04 | 79.86+-3.08 | 79.87+-3.14 |
| SqueezeNet (ONNX) | 12.03+-2.06 | 11.93+-0.70 | 11.83+-0.66 |
| Resnet50 v1 (ONNX) | 34.35+-2.15 | 34.36+-1.99 | 34.36+-2.07 |
| SqueezeNet (OpenVINO) | 12.05+-1.73 | 11.93+-0.66 | 11.92+-0.69 |

So I'm reopening this issue; please take a look.