google / lyra

A Very Low-Bitrate Codec for Speech Compression
Apache License 2.0

v1.3.2 is much slower than v1.3.1 when built for WebAssembly #112

Open · sile opened this issue 1 year ago

sile commented 1 year ago

[NOTE] This is just an FYI issue as I know this project doesn't officially support WebAssembly.

As I mentioned in https://github.com/google/lyra/issues/49, shiguredo/lyra-wasm maintains a no-patch WebAssembly build of Lyra. Today, I updated the Lyra version to 1.3.2 (https://github.com/shiguredo/lyra-wasm/pull/10). However, it turned out that encoding and decoding performance degraded after the update.

The following table shows benchmark results from https://shiguredo.github.io/lyra-wasm/lyra-benchmark.html (elapsed time taken to encode / decode 10 seconds of audio data); a rough sketch of what is being timed follows the table.

| Browser | Lyra Version | Encode Time | Decode Time |
|---|---|---|---|
| Chrome (M1 Mac) | 1.3.1 | 550.230 ms | 804.230 ms |
| Chrome (M1 Mac) | 1.3.2 | 898.375 ms | 1144.754 ms |
| Safari (M1 Mac) | 1.3.1 | 596.880 ms | 866.779 ms |
| Safari (M1 Mac) | 1.3.2 | 905.639 ms | 1168.120 ms |
| Firefox (M1 Mac) | 1.3.1 | 540.199 ms | 800.540 ms |
| Firefox (M1 Mac) | 1.3.2 | 609.940 ms | 1064.080 ms |
| Chrome (Android) | 1.3.1 | 1002.769 ms | 1040.140 ms |
| Chrome (Android) | 1.3.2 | 1398.920 ms | 1621.900 ms |
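
For context, the benchmark simply measures wall-clock time around the encode and decode loops. The C++ sketch below shows the same measurement structure; the `EncodeFrame` / `DecodeFrame` stubs are placeholders rather than the real lyra-wasm API (the actual page drives the WebAssembly build from TypeScript), and the 16 kHz input rate is an assumption.

```cpp
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <vector>

// Placeholder stubs: the real benchmark calls the lyra-wasm encoder and
// decoder from the browser; these only stand in for one 20 ms frame of work
// so the timing structure is clear.
std::vector<uint8_t> EncodeFrame(const std::vector<int16_t>& pcm) {
  return std::vector<uint8_t>(20);   // dummy packet
}
std::vector<int16_t> DecodeFrame(const std::vector<uint8_t>& packet) {
  return std::vector<int16_t>(320);  // dummy PCM frame
}

int main() {
  constexpr int kFrames = 500;           // 500 x 20 ms frames = 10 s of audio
  constexpr int kSamplesPerFrame = 320;  // 20 ms at 16 kHz (assumed input rate)
  const std::vector<int16_t> frame(kSamplesPerFrame, 0);

  const auto t0 = std::chrono::steady_clock::now();
  std::vector<std::vector<uint8_t>> packets;
  for (int i = 0; i < kFrames; ++i) packets.push_back(EncodeFrame(frame));
  const auto t1 = std::chrono::steady_clock::now();
  for (const auto& p : packets) DecodeFrame(p);
  const auto t2 = std::chrono::steady_clock::now();

  const auto ms = [](auto d) {
    return std::chrono::duration<double, std::milli>(d).count();
  };
  // These two numbers correspond to the Encode Time / Decode Time columns.
  std::printf("encode: %.3f ms, decode: %.3f ms\n", ms(t1 - t0), ms(t2 - t1));
  return 0;
}
```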

I don't know the reason for this performance drop. Any information that helps alleviate this problem is more than welcome.

mchinen commented 1 year ago

Hi @sile, thank you for the benchmarking (and the nice benchmarking tool) on WASM. Indeed, this is unexpected.
There are some possible causes of the drop in performance on WASM:

We will continue to look into this, but we first need to get set up to benchmark WASM ourselves. Feel free to play with the above settings in the meantime. Let me know if you have any questions or want to chat about this.

sile commented 1 year ago

Thank you for your reply! I am off for a while, so I will look at the details when I am back at work.

sile commented 1 year ago

Hi @mchinen, thank you again for the detailed advice. I tried some of your suggestions, so let me share the results.

> There is a possibility that the TFLITE_XNNPACK_DELEGATE_FLAG_QU8 flag in tflite_model_wrapper.cc caused an issue.

This diff seems to have a huge impact on the performance degradation. I tried reverting this change, and the encode / decode times of the patched v1.3.2 became comparable to v1.3.1 (see the table below, and the sketch of the delegate setup after it).

| Browser | Lyra Version | Encode Time | Decode Time |
|---|---|---|---|
| Chrome (M1 Mac) | 1.3.1 | 570.260 ms | 821.094 ms |
| Chrome (M1 Mac) | 1.3.2 | 918.710 ms | 1160.914 ms |
| Chrome (M1 Mac) | 1.3.2 (patched) | 569.240 ms | 829.614 ms |
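
For anyone following along, the flag is one of the options Lyra passes when it creates its XNNPack delegate. A minimal sketch of enabling or reverting it with the TensorFlow Lite C++ API (the `ApplyXnnpackDelegate` helper is illustrative, not the actual tflite_model_wrapper.cc code):

```cpp
#include "tensorflow/lite/delegates/xnnpack/xnnpack_delegate.h"
#include "tensorflow/lite/interpreter.h"

// Sketch: attach an XNNPack delegate to an existing tflite::Interpreter.
// enable_qu8 = true mirrors the v1.3.2 change; false mirrors v1.3.1 and the
// patched build above. (Delegate ownership/cleanup is omitted for brevity;
// real code must eventually call TfLiteXNNPackDelegateDelete.)
TfLiteStatus ApplyXnnpackDelegate(tflite::Interpreter* interpreter,
                                  bool enable_qu8) {
  TfLiteXNNPackDelegateOptions options = TfLiteXNNPackDelegateOptionsDefault();
  if (enable_qu8) {
    // Lets XNNPack also take over asymmetrically quantized (QU8) operators.
    options.flags |= TFLITE_XNNPACK_DELEGATE_FLAG_QU8;
  }
  TfLiteDelegate* delegate = TfLiteXNNPackDelegateCreate(&options);
  return interpreter->ModifyGraphWithDelegate(delegate);
}
```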

> I see that you benchmark on 500 frames. We test on a longer test sample now, encoding 10000 20ms frames.

I ran the benchmark with ITERATIONS set to 10000. The performance degradation was still present, just as with ITERATIONS=500, as shown in the following table.

| Browser | Lyra Version | Encode Time | Decode Time |
|---|---|---|---|
| Chrome (M1 Mac) | 1.3.1 | 11276.710 ms | 16442.069 ms |
| Chrome (M1 Mac) | 1.3.2 | 18254.064 ms | 23232.560 ms |

> We saw a speed increase when running on Android natively

Let me confirm: is https://github.com/google/lyra/commit/47698dadf0010abff6a848e02642f55f806d4842#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5 the benchmark result you mentioned in the above comment? If so, it seems the performance on Android has dropped slightly in v1.3.2. The following diff, quoted from the full diff between v1.3.1 and v1.3.2, says that decoding one frame takes 0.473 ms in v1.3.1 and 0.525 ms in v1.3.2 (I might be misunderstanding something, though).

```diff
  This shows that decoding a 50Hz frame (each frame is 20 milliseconds) takes
- 0.473 milliseconds on average. So decoding is performed at around 42 (20/0.473)
+ 0.525 milliseconds on average. So decoding is performed at around 38 (20/0.525)
  times faster than realtime.
```
mchinen commented 1 year ago

Thanks @sile! I'm leaning toward reverting the flag change while we continue to look into it. It seems the effect of the flag depends largely on which version of TF we are using, and it differs for each platform.

Regarding the benchmark, I think it's a red herring. The new version's benchmark on native Android doesn't actually reflect a drop in speed due to that flag on Android. Rather, the new benchmark is slower because we did the earlier benchmarking on our internal version, which uses a different toolchain and a newer version of TF that isn't appropriate for our open source users (the internal version happens to be slightly faster). Hope that clears that up!

sile commented 1 year ago

Makes sense, thanks! I'm looking forward to a newer version that fixes the performance issue on WebAssembly.

y-71 commented 1 year ago

How does it compare to Opus, though?