intel / webml-polyfill

Deprecated: the Web Neural Network Polyfill project has moved to https://github.com/webmachinelearning/webnn-polyfill
Apache License 2.0

[Investigation] Quantized models support #650

Closed · huningxin closed this 5 years ago

huningxin commented 5 years ago

We need to investigate quantized model support for the WebNN API.

Some TODOs in my mind:

  1. Support quantized versions of operations in the WASM backend (see the sketch after this list).
  2. Modify the TFLite model loader to support quantized models.
  3. Test a quantized model (e.g. Mobilenet_V2_1.0_224_quant) from the TFLite model zoo.
  4. Investigate the gaps in the Chromium POC with various backends, starting with the Android NNAPI backend.
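
For context, TFLite quantized models represent each tensor as uint8 values plus per-tensor scale and zero-point parameters. A minimal sketch of the two conversions involved (illustrative only, not the polyfill's actual API):

```ts
// TFLite-style affine quantization: real = scale * (q - zeroPoint).
// scale and zeroPoint come from each tensor's quantization parameters.
function quantize(real: number, scale: number, zeroPoint: number): number {
  const q = Math.round(real / scale) + zeroPoint;
  return Math.min(255, Math.max(0, q)); // clamp to the uint8 range
}

function dequantize(q: number, scale: number, zeroPoint: number): number {
  return scale * (q - zeroPoint);
}
```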
Wenzhao-Xiang commented 5 years ago

I first supported quantized versions of Conv2D, DepthwiseConv2D, AveragePool, Reshape, and Softmax in the WASM backend, and modified the TFLite model loader to support quantized models. Then I tried the Mobilenet_V2_1.0_224_quant model for image classification. Here is the result:

Quantized model: [screenshot]

Float model: [screenshot]

The predicted class seems to be right, but the probabilities still need some post-processing. Another strange thing is that the quantized model runs slower than the float model. Maybe I need quantized versions of the test cases to verify that the ops work as expected; see the sketch below.
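
A possible shape for such a test case, assuming a float reference kernel is available (all names here are hypothetical, not the project's real test harness): quantize the float inputs, run the quantized kernel, dequantize its output, and compare against the float reference within one quantization step.

```ts
// Hypothetical quantized-op test sketch; floatOp and quantOp stand in for
// the real WASM kernels under test.
type FloatOp = (input: Float32Array) => Float32Array;
type QuantOp = (input: Uint8Array) => Uint8Array;

function checkQuantizedOp(
  floatOp: FloatOp, quantOp: QuantOp, input: Float32Array,
  scaleIn: number, zpIn: number, scaleOut: number, zpOut: number,
): void {
  const expected = floatOp(input);

  // Quantize the inputs and run the quantized kernel.
  const qInput = Uint8Array.from(input, (x) =>
    Math.min(255, Math.max(0, Math.round(x / scaleIn) + zpIn)));
  const qOutput = quantOp(qInput);

  for (let i = 0; i < expected.length; ++i) {
    const actual = scaleOut * (qOutput[i] - zpOut);
    // Allow one quantization step of error.
    if (Math.abs(actual - expected[i]) > scaleOut) {
      throw new Error(`mismatch at ${i}: got ${actual}, want ${expected[i]}`);
    }
  }
}
```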

huningxin commented 5 years ago

Good progress! Please refer to the TFLite demo code for post-processing. We need to make sure the result is correct as the first step. Thanks!

Wenzhao-Xiang commented 5 years ago

I found that the Mobilenet_V2_1.0_224_quant model has no softmax, so I add one automatically. After post-processing, I get the following result:

[screenshot]
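
For anyone following along, the post-processing here amounts to dequantizing the uint8 logits and applying a numerically stable softmax. A minimal sketch (scale and zeroPoint are the output tensor's quantization parameters):

```ts
// Dequantize uint8 logits (real = scale * (q - zeroPoint)) and apply a
// numerically stable softmax to turn them into probabilities.
function softmaxFromQuantized(
  logits: Uint8Array, scale: number, zeroPoint: number,
): Float32Array {
  const real = Float32Array.from(logits, (q) => scale * (q - zeroPoint));
  const max = real.reduce((a, b) => Math.max(a, b), -Infinity);
  const exps = real.map((x) => Math.exp(x - max)); // subtract max for stability
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}
```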

huningxin commented 5 years ago

That's great, thanks @Wenzhao-Xiang!

We need to verify two aspects:

  1. What's the result when running on WebNN/NNAPI? @BruceDai, please help with this.
  2. Why does the quantized model run slower than the float one in the WASM ops?
Wenzhao-Xiang commented 5 years ago

@huningxin Test env:
Chromium version: nightly build 70.0.3503.0 (a7c5589)
Platform: Android 9.0 (Google Pixel 2 XL)

Quantized model (Mobilenet_v2): [screenshot]

Float model (Mobilenet_v2): [screenshot]

We get about a 2x speedup on WebNN/NNAPI.

Next, I will investigate why the quantized model runs slower than the float one in the WASM ops.

huningxin commented 5 years ago

The float model numbers seem slower than what we collected previously. @Wenzhao-Xiang, could you please double-check with @BruceDai?

And according to AI Benchmark, the Pixel 3 shows a significant speedup on int8 quantized models. Please help test on that device. Thanks!

Wenzhao-Xiang commented 5 years ago

@huningxin Tested with the benchmark:

Chromium version: nightly build 70.0.3503.0 (a7c5589)

| Platform | Quantized Mobilenet_V2 | Float Mobilenet_V2 | Speedup |
| --- | --- | --- | --- |
| Android 9.0 (Google Pixel 2 XL) | 79.18±25.28 ms | 109.41±21.64 ms | ~1.38x |
| Android 9.0 (Google Pixel 3) | 8.89±1.25 ms | 105.37±20.90 ms | ~10x |
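
For reference, mean-and-deviation timings like the ones above can be collected with a small loop along these lines (a sketch; `runInference` stands in for the actual model execution):

```ts
// Measure inference time as mean ± sample standard deviation over N runs,
// after a few warm-up iterations.
async function benchmark(
  runInference: () => Promise<void>,
  warmupRuns = 5, timedRuns = 100,
): Promise<{ mean: number; std: number }> {
  for (let i = 0; i < warmupRuns; ++i) await runInference();

  const times: number[] = [];
  for (let i = 0; i < timedRuns; ++i) {
    const start = performance.now();
    await runInference();
    times.push(performance.now() - start);
  }

  const mean = times.reduce((a, b) => a + b, 0) / times.length;
  const variance =
    times.reduce((a, t) => a + (t - mean) ** 2, 0) / (times.length - 1);
  return { mean, std: Math.sqrt(variance) };
}
```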

Wenzhao-Xiang commented 5 years ago

I also supported ssd_mobilenet_v1_quant for object detection. Here are the tests with the examples (Google Pixel 3):

Image Classification: [screenshot]

Object Detection: [screenshot]

The speedup on the Google Pixel 3 is really significant; it's even faster than the WebGL backend with a 1080 Ti, which amazes me!

huningxin commented 5 years ago

Impressed! Great job @Wenzhao-Xiang !

Please follow up as we discussed:

  1. Publish a demo URL for testing (hosting both the f32 and int8 models).
  2. Investigate the WASM polyfill support.
  3. Make a PR to support int8 models and the WASM polyfill.

Thanks!

Wenzhao-Xiang commented 5 years ago

Please go here for the image classification demo (both f32 and uint8 models, but not including Inception V3/V4 and Inception ResNet V2).

Please go here for the object detection demo (both f32 and uint8 models).

For now, the quantized models are only supported on the WASM and NNAPI backends.
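
A sketch of how a demo might pick a backend under that constraint; the type and the preference order are illustrative, not the examples' actual logic:

```ts
// Hypothetical backend picker: quantized (uint8) models currently run only
// on the WASM and NNAPI backends, so WebGL is skipped for them.
type Backend = 'NNAPI' | 'WebGL' | 'WASM';

function pickBackend(modelIsQuantized: boolean, available: Backend[]): Backend {
  const preference: Backend[] = modelIsQuantized
    ? ['NNAPI', 'WASM']           // quantized: no WebGL support yet
    : ['NNAPI', 'WebGL', 'WASM']; // float: any backend works
  const choice = preference.find((b) => available.includes(b));
  if (choice === undefined) throw new Error('no supported backend available');
  return choice;
}
```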

Wenzhao-Xiang commented 5 years ago

Done. Closing it.