k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, speaker diarization, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter, Object Pascal, Lazarus, Rust
https://k2-fsa.github.io/sherpa/onnx/index.html
Apache License 2.0

Problem with Flush function in voice-activity-detector.cc #1314

Closed laochen closed 2 months ago

laochen commented 2 months ago

When flushing, it would be better to keep all of the original data, so that the flushed segment can be spliced with the rest of the buffer for the final recognition. The current code trims the tail of the buffer, which reduces recognition accuracy for the last sentence. My suggested change, at voice-activity-detector.cc line 126:

// int32_t end = buffer_.Tail() - model_->MinSilenceDurationSamples();
int32_t end = buffer_.Tail();
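To make the effect of that one-line change concrete, here is a minimal sketch. It does not reproduce the real class: `FlushState` and `FlushedSamples` are hypothetical names, and the assumption that the flushed segment runs from a start index up to `end` is mine, not taken from the source file. Only the two ways of computing `end` come from the report above.

```cpp
#include <cstdint>

// Hypothetical stand-in for the detector state at flush time.
// Field names are illustrative, not the library's.
struct FlushState {
  int32_t start;        // index of the first speech sample, -1 if none
  int32_t tail;         // index one past the last buffered sample
  int32_t min_silence;  // minimum silence duration, in samples
};

// Returns how many samples the flushed segment would contain.
// keep_tail == false mirrors the existing computation of `end`;
// keep_tail == true mirrors the proposed one.
int32_t FlushedSamples(const FlushState &s, bool keep_tail) {
  if (s.start == -1 || s.tail <= s.start) {
    return 0;  // no speech in progress, or nothing buffered after start
  }

  int32_t end = keep_tail ? s.tail : s.tail - s.min_silence;
  if (end <= s.start) {
    return 0;  // trimming removed the whole segment
  }

  return end - s.start;
}
```

With a speech start at sample 1600, a tail at 16000, and a minimum silence of 4000 samples, the trimmed variant yields 10400 samples and the untrimmed variant 14400. If the remaining audio is shorter than the minimum silence duration, the trimmed variant yields nothing at all.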

yuyun2000 commented 2 months ago

Happy man has something going on recently, so he might reply to you late.

csukuangfj commented 2 months ago

> When flushing, it would be better to keep all of the original data, so that the flushed segment can be spliced with the rest of the buffer for the final recognition. The current code trims the tail of the buffer, which reduces recognition accuracy for the last sentence. My suggested change, at voice-activity-detector.cc line 126:
>
> // int32_t end = buffer_.Tail() - model_->MinSilenceDurationSamples();
> int32_t end = buffer_.Tail();

Yes, I agree with you.

Would you mind creating a pull request to fix it?