modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
https://www.funasr.com

optimize ComputeDecibel in fsmn-vad model by using numpy #2174

Status: Closed (hongfanmeng closed this 3 weeks ago)

hongfanmeng commented 3 weeks ago

The original ComputeDecibel computed per-frame energy in a Python for loop, which was slow. Replacing the loop with vectorized numpy operations speeds it up.
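To illustrate the kind of change described above, here is a minimal sketch that contrasts a per-frame Python loop with a vectorized numpy version. It is not the code from this PR: the function names, the frame length and shift, the 16 kHz sample rate, and the `eps` floor are all assumptions chosen for the example.

```python
import math

import numpy as np


def decibel_loop(waveform, frame_len, frame_shift, eps=1e-6):
    """Baseline: one Python-level iteration per frame (hypothetical helper)."""
    out = []
    for offset in range(0, len(waveform) - frame_len + 1, frame_shift):
        frame = waveform[offset:offset + frame_len]
        out.append(10 * math.log10(float(np.sum(frame ** 2)) + eps))
    return out


def decibel_vectorized(waveform, frame_len, frame_shift, eps=1e-6):
    """Vectorized: view all frames at once, then reduce in numpy (hypothetical helper)."""
    if len(waveform) < frame_len:
        return np.empty(0)
    # Every window of length frame_len, as a strided view (no copy);
    # keeping every frame_shift-th window gives the same frames as the loop.
    windows = np.lib.stride_tricks.sliding_window_view(waveform, frame_len)
    frames = windows[::frame_shift]
    energy = (frames ** 2).sum(axis=1)
    return 10 * np.log10(energy + eps)


# Assumed settings: 1 s of noise at 16 kHz, 25 ms frames, 10 ms shift.
rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)
loop_db = decibel_loop(audio, frame_len=400, frame_shift=160)
vec_db = decibel_vectorized(audio, frame_len=400, frame_shift=160)
assert np.allclose(loop_db, vec_db)
```

Both functions produce the same values; the vectorized one replaces the per-frame interpreter overhead with a single reduction over a 2-D view. `sliding_window_view` requires numpy 1.20 or newer.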

In experiments on a Xeon Gold 5118, total VAD time for a 28-minute audio file dropped from 38 seconds to 26 seconds.

cProfile results (cumulative time of ComputeDecibel dropped from about 15.5 s to about 0.4 s):

Before:

 ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
      1    0.002    0.002   38.459   38.459  model.py:648(inference)
     29    0.002    0.000   35.193    1.214  model.py:548(forward)
     28    0.740    0.026   17.866    0.638  model.py:755(DetectCommonFrames)
     29   10.151    0.350   15.522    0.535  model.py:325(ComputeDecibel)
 170405    4.810    0.000   13.324    0.000  model.py:500(GetFrameState)
 170405    0.884    0.000    7.574    0.000  model.py:782(DetectOneFrame)
  67/54    4.156    0.062    7.370    0.136  socket.py:621(send)
 141982    0.178    0.000    5.533    0.000  model.py:446(OnVoiceDetected)
 142496    3.472    0.000    5.391    0.000  model.py:373(PopDataToOutputBuf)

After:

 ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
      1    0.002    0.002   26.103   26.103  model.py:648(inference)
     29    0.002    0.000   22.248    0.767  model.py:548(forward)
     28    0.846    0.030   20.363    0.727  model.py:755(DetectCommonFrames)
 170405    4.811    0.000   13.298    0.000  model.py:500(GetFrameState)
 170405    0.885    0.000    7.501    0.000  model.py:782(DetectOneFrame)
  67/60    4.031    0.060    7.181    0.120  socket.py:621(send)
 141982    0.177    0.000    5.471    0.000  model.py:446(OnVoiceDetected)
 142496    3.455    0.000    5.329    0.000  model.py:373(PopDataToOutputBuf)
    ...      ...      ...      ...      ...  ...
     29    0.338    0.012    0.397    0.014  model.py:325(ComputeDecibel)
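
For anyone who wants to reproduce this kind of measurement, a listing sorted by cumulative time can be collected with the standard-library `cProfile` and `pstats` modules. The sketch below is only an illustration: `workload` is a hypothetical stand-in for the VAD inference call, and `profile_call` is a helper name invented for this example.

```python
import cProfile
import io
import pstats


def workload(n=200_000):
    """Hypothetical stand-in for the call being profiled."""
    return sum(i * i for i in range(n))


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # Sort by cumulative time, as in the listings above, and keep the top rows.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()


result, report = profile_call(workload, 50_000)
print(report)
```

Running the same call before and after a change and comparing the `cumtime` column for the function of interest gives the kind of before/after comparison reported here.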