Open sile opened 1 year ago
Hi @sile, thank you for the benchmarking (and the nice benchmarking tool) on WASM. Indeed, this is unexpected.
There are some possible causes of the drop in performance on WASM:
VERBOSE: Replacing 126 node(s) with delegate (TfLiteXNNPackDelegate) node, yielding 33 partitions.
with the TF 2.11 upgrade, which was more than with TF 2.9 (94 nodes/15 partitions). It would be good to check the logs to see whether WASM sees similar numbers of nodes/partitions, to make sure the accelerated XNNPACK path is being used.
We will continue to look into this, but we first need to get set up to benchmark WASM ourselves. Feel free to play with the above settings in the meantime. Let me know if you have any questions or want to chat about this.
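One way to compare the node/partition counts across builds is to pull them out of a captured log. The following is only a sketch: the pattern is modeled on the single log line quoted above, so other TensorFlow Lite versions may word the message differently, and `parse_delegate_log` is a hypothetical helper name, not part of Lyra or TensorFlow Lite.

```python
import re

# Pattern modeled on the one log line quoted in this thread (an assumption:
# other TensorFlow Lite versions may format this message differently).
DELEGATE_LOG = re.compile(
    r"Replacing (\d+) node\(s\) with delegate \((\w+)\) node, "
    r"yielding (\d+) partitions"
)


def parse_delegate_log(text):
    """Return (delegate_name, nodes, partitions) for each matching log line."""
    return [
        (m.group(2), int(m.group(1)), int(m.group(3)))
        for m in DELEGATE_LOG.finditer(text)
    ]


log = (
    "VERBOSE: Replacing 126 node(s) with delegate "
    "(TfLiteXNNPackDelegate) node, yielding 33 partitions."
)
for name, nodes, partitions in parse_delegate_log(log):
    print(f"{name}: {nodes} nodes, {partitions} partitions")
```

If the WASM build's log produces no match at all, that would suggest the delegate message is absent or formatted differently, which is worth checking by hand before drawing conclusions.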
Thank you for your reply! I am off for a while, so I will look at the details when I am back at work.
Hi @mchinen, thank you again for the detailed advice. I tried some of the suggestions, so let me share the results.
There is a possibility that the TFLITE_XNNPACK_DELEGATE_FLAG_QU8 flag in tflite_model_wrapper.cc caused an issue.
This diff seems to have a huge impact on the performance degradation. I tried reverting this change, and the encode / decode time of the patched v1.3.2 became comparable to v1.3.1 (see the table below).
Browser | Lyra Version | Encode Time | Decode Time |
---|---|---|---|
Chrome (m1 mac) | 1.3.1 | 570.260 ms | 821.094 ms |
Chrome (m1 mac) | 1.3.2 | 918.710 ms | 1160.914 ms |
Chrome (m1 mac) | 1.3.2 (patched) | 569.240 ms | 829.614 ms |
I see that you benchmark on 500 frames. We test on a longer test sample now, encoding 10000 20ms frames,
I ran the benchmark with `ITERATIONS` set to 10000. The performance degradation was still present, just as with `ITERATIONS=500`, as shown in the following table.
Browser | Lyra Version | Encode Time | Decode Time |
---|---|---|---|
Chrome (m1 mac) | 1.3.1 | 11276.710 ms | 16442.069 ms |
Chrome (m1 mac) | 1.3.2 | 18254.064 ms | 23232.560 ms |
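To check whether the regression scales with the number of frames, the slowdown ratio can be computed for both runs. This is a small sketch using only the timings from the tables in this thread; it assumes the first table (with the patched build) was produced with `ITERATIONS=500`, and `slowdown` is a hypothetical helper name.

```python
# Timings in milliseconds, copied from the tables in this thread.
# Keys are (iterations, Lyra version); values are (encode_ms, decode_ms).
# Assumption: the first table was measured with ITERATIONS=500.
RESULTS = {
    (500, "1.3.1"): (570.260, 821.094),
    (500, "1.3.2"): (918.710, 1160.914),
    (10000, "1.3.1"): (11276.710, 16442.069),
    (10000, "1.3.2"): (18254.064, 23232.560),
}


def slowdown(iterations):
    """Return (encode, decode) ratios of v1.3.2 time to v1.3.1 time."""
    old = RESULTS[(iterations, "1.3.1")]
    new = RESULTS[(iterations, "1.3.2")]
    return tuple(n / o for n, o in zip(new, old))


for iterations in (500, 10000):
    enc, dec = slowdown(iterations)
    print(f"ITERATIONS={iterations}: encode x{enc:.2f}, decode x{dec:.2f}")
```

In both runs the ratio comes out at roughly 1.6x for encoding and 1.4x for decoding, which is consistent with a per-frame cost increase rather than a one-time startup cost.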
We saw a speed increase when running on Android natively
Let me confirm: is https://github.com/google/lyra/commit/47698dadf0010abff6a848e02642f55f806d4842#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5 the benchmark result you mentioned in the above comment? If so, it seems the performance on Android has dropped slightly in v1.3.2. The following diff, quoted from the full diff between v1.3.1 and v1.3.2, says "to decode one frame, v1.3.1 takes 0.473 ms and v1.3.2 takes 0.525 ms" (I might be misunderstanding something, though).
This shows that decoding a 50Hz frame (each frame is 20 milliseconds) takes
- 0.473 milliseconds on average. So decoding is performed at around 42 (20/0.473)
+ 0.525 milliseconds on average. So decoding is performed at around 38 (20/0.525)
times faster than realtime.
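The realtime factors in that diff follow directly from dividing the frame duration by the per-frame decode time. A minimal sketch of the arithmetic, using only the numbers quoted above (`realtime_factor` is a hypothetical helper name):

```python
FRAME_MS = 20.0  # each frame covers 20 ms of audio (50 Hz frame rate)


def realtime_factor(ms_per_frame, frame_ms=FRAME_MS):
    """How many times faster than realtime one frame is processed."""
    return frame_ms / ms_per_frame


# Per-frame decode times quoted in the diff between v1.3.1 and v1.3.2.
for version, ms in (("v1.3.1", 0.473), ("v1.3.2", 0.525)):
    print(f"{version}: about {realtime_factor(ms):.0f}x realtime")
```

The two results round to 42 and 38, matching the figures in the diff.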
Thanks @sile! I'm leaning toward reverting the flag change while we continue to look into it. It seems the effect of the flag depends largely on which version of TF we are using, and it is different for each platform.
Regarding the benchmark, I think it's a red herring. The new version's benchmark on native Android actually doesn't reflect a drop in speed due to that flag on Android. Rather, the new benchmark is slower because we did the earlier benchmarking on our internal version, which uses a different toolchain and a newer version of TF, which is not appropriate for our open source users (since the internal version happens to be slightly faster). Hope that clears that up!
Makes sense, thanks! I'm looking forward to seeing a newer version that fixes the performance issue on WebAssembly.
How does it compare to Opus, though?
[NOTE] This is just an FYI issue as I know this project doesn't officially support WebAssembly.
As I mentioned in https://github.com/google/lyra/issues/49, shiguredo/lyra-wasm maintains a no-patch WebAssembly build of Lyra. Today, I updated the Lyra version to 1.3.2 (https://github.com/shiguredo/lyra-wasm/pull/10). However, it turned out that the encoding and decoding performance degraded after the update.
The following table is a benchmark result from https://shiguredo.github.io/lyra-wasm/lyra-benchmark.html (elapsed time taken to encode / decode 10 seconds of audio data).
I don't know the reason for this performance drop. Any information that helps alleviate this problem is more than welcome.