intel / webml-polyfill

Deprecated, the Web Neural Network Polyfill project has been moved to https://github.com/webmachinelearning/webnn-polyfill
Apache License 2.0

[WASM] Inference time is longer than WASM(TFLite) for speech models #1262

Closed Christywl closed 4 years ago

Christywl commented 4 years ago

Test Env:
- webml-polyfill commit: https://github.com/intel/webml-polyfill/commit/a58f9f5340ca87eca4cda18db019f42752317f9e
- Platform: Windows (Dell XPS 13, Intel i5-8250U)

Actual Result: Inference time with the WASM(TF.js) backend is longer than with WASM(TFLite) for speech models:

| Model | WASM(TFLite) | WASM(TF.js) |
| --- | --- | --- |
| Speech Command / KWS DNN(OpenVINO) | 0.45 ms | 1.48 ms |
| Speech Recognition / wsj_dnn5b | 7924.39 ms | 19457.51 ms |

How to Reproduce:

  1. Set up the server with commit https://github.com/intel/webml-polyfill/commit/a58f9f5340ca87eca4cda18db019f42752317f9e
  2. Launch Chrome or Chromium (with WebML disabled)
  3. Visit http://localhost:8080/examples/
  4. Select Speech Command --> KWS DNN(OpenVINO) or Speech Recognition --> wsj_dnn5b
  5. Check the inference time
akineeic commented 4 years ago

I checked the two models and found that both are mainly composed of FULLY_CONNECTED ops. In the tfjs backend, FULLY_CONNECTED is executed via tf.matMul, which is not a binding to native C++ code but is implemented in JavaScript inside tfjs. So it is slower than the previous TFLite WASM backend.
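For context, a FULLY_CONNECTED layer is essentially a matrix multiply plus a bias add. The following is an illustrative plain-JavaScript sketch (not the actual tf.matMul kernel) of the per-inference work involved, which helps show why a pure-JS inner loop loses to a native or WASM-compiled kernel for these matMul-heavy speech models:

```javascript
// Naive fully-connected (dense) layer: output = input x weights^T + bias.
// Hypothetical sketch for illustration only -- not the tfjs implementation.
function fullyConnected(input, weights, bias) {
  // input: [batch][inDim], weights: [outDim][inDim], bias: [outDim]
  const batch = input.length;
  const outDim = weights.length;
  const inDim = weights[0].length;
  const output = [];
  for (let b = 0; b < batch; b++) {
    const row = new Array(outDim);
    for (let o = 0; o < outDim; o++) {
      // Accumulate the dot product of one input row with one weight row.
      let acc = bias[o];
      for (let i = 0; i < inDim; i++) {
        acc += input[b][i] * weights[o][i];
      }
      row[o] = acc;
    }
    output.push(row);
  }
  return output;
}

// Example: a 1x2 input through a layer with two output units.
const y = fullyConnected([[1, 2]], [[1, 0], [0, 1]], [0.5, 0.5]);
console.log(y); // [[1.5, 2.5]]
```

A model like wsj_dnn5b stacks many such layers with large dimensions, so the triple loop above runs billions of multiply-adds per inference; a compiled TFLite WASM kernel performs the same arithmetic with far less per-element overhead, which matches the measured gap.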