Closed Jeronymous closed 12 months ago
Hi, thanks for the feedback. Yes, I know that VAD is available in whisper_timestamped. I put NotImplemented there because I primarily use and focus on the faster-whisper backend. Feel free to implement it. It should be easy: just pass a parameter to a function, analogously to https://github.com/ufal/whisper_streaming/blob/23c2d568d8262a910a83b01025faa12244255756/whisper_online.py#L136
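For illustration, here is a minimal sketch of what "passing a parameter to a function" could look like. The class name, `transcribe_kwargs`, and `fake_transcribe` are hypothetical stand-ins so the snippet runs on its own, and the `vad` keyword is an assumption about what the whisper-timestamped transcribe function accepts; check it against the installed version before relying on it.

```python
class TimestampedBackendSketch:
    """Hypothetical backend wrapper; not the actual whisper_online.py class."""

    def __init__(self, transcribe_fn):
        # transcribe_fn stands in for the real transcription function.
        self.transcribe_fn = transcribe_fn
        # Extra keyword arguments forwarded on every transcribe call.
        self.transcribe_kwargs = {}

    def use_vad(self):
        # Assumption: the underlying function takes a boolean `vad` keyword.
        self.transcribe_kwargs["vad"] = True

    def transcribe(self, audio, init_prompt=""):
        return self.transcribe_fn(
            audio, initial_prompt=init_prompt, **self.transcribe_kwargs
        )


def fake_transcribe(audio, initial_prompt="", vad=False):
    """Stand-in that only reports which options it received."""
    return {"vad_used": vad, "n_samples": len(audio)}
```

With this shape, enabling VAD is a single call to `use_vad()` before streaming starts, and the rest of the processing loop stays unchanged.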
But I realized that VAD is currently used inefficiently: on every update it is run over the whole buffer. It could instead be used to cut silence out of the buffer, so that the next update is faster. This could be improved.
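A rough sketch of the idea, under stated assumptions: a simple per-frame energy threshold stands in for a real VAD (Silero or Auditok would replace `speech_segments`), and all function names and default values here are hypothetical. The kept segments are returned so that positions in the trimmed buffer can still be mapped back to the original audio.

```python
def speech_segments(samples, sample_rate, frame_ms=30, energy_threshold=1e-3):
    """Return merged (start, end) sample ranges whose mean energy exceeds the threshold."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    segments = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        if energy > energy_threshold:
            end = start + len(frame)
            if segments and segments[-1][1] == start:
                # Extend the previous segment when frames are adjacent.
                segments[-1] = (segments[-1][0], end)
            else:
                segments.append((start, end))
    return segments


def drop_silence(samples, sample_rate, **vad_kwargs):
    """Return the buffer without silent frames, plus the segments that were kept."""
    segments = speech_segments(samples, sample_rate, **vad_kwargs)
    trimmed = []
    for start, end in segments:
        trimmed.extend(samples[start:end])
    return trimmed, segments


def to_original_index(trimmed_index, segments):
    """Map a sample index in the trimmed buffer back to the original buffer."""
    consumed = 0
    for start, end in segments:
        length = end - start
        if trimmed_index < consumed + length:
            return start + (trimmed_index - consumed)
        consumed += length
    raise IndexError("index lies beyond the trimmed buffer")
```

The mapping step matters because timestamps produced on the trimmed buffer would otherwise drift by the total duration of the silence that was removed.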
SILERO vs AUDITOK is a topic for another issue. I don't have feedback.
@Jeronymous, please open an issue about this if you have test results to share.
First, thank you. I am super happy to see whisper-timestamped used in such a good project. Having Whisper streamed in real time is a super feature!
I see here that VAD is not available when using the whisper-timestamped backend: https://github.com/ufal/whisper_streaming/blob/23c2d568d8262a910a83b01025faa12244255756/whisper_online.py#L79-L80
But VAD IS implemented in whisper-timestamped (it was there even before faster-whisper integrated it). It's currently based on SILERO (the same as in faster-whisper). Am I missing a sticking point? (Maybe the fact that the packages required for VAD are not in the default requirements?) I can contribute if help is needed on this.
(VAD is important to prevent some hallucinations of Whisper models, and make timestamps more accurate)
Also, I want to mention: after being disappointed with weird results on some files, I opened a branch to replace SILERO with AUDITOK: https://github.com/linto-ai/whisper-timestamped/pull/78 (see the linked issue for an illustration of possible "hallucinations" of Silero). I have had a good experience with Auditok. I was hoping for some user feedback to confirm before merging into master. But as it's not coming, maybe we just need to establish a benchmark to confirm the improvement.