Open · fire17 opened this issue 5 months ago
This is a well-known issue with Whisper, which was trained on YouTube videos; it's not related to this project. When no speech is detected, Whisper tends to "hallucinate" phrases that appear often in the video transcripts it was trained on.
You can use a VAD (Voice Activity Detection) tool in your code to help mitigate this, but it won't eliminate the issue entirely.
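As a rough illustration of the VAD idea, here is a minimal energy-based gate. It is not a trained VAD model. It assumes 16 kHz, 16-bit mono PCM samples (the format Whisper usually expects), and the names `has_voice`, `threshold_dbfs` and `min_voiced_frames` are hypothetical, chosen here for illustration. The idea is to skip calling Whisper on a chunk when no frame is loud enough to plausibly contain speech.

```python
import math
from array import array


def rms_dbfs(samples):
    """Root-mean-square level of 16-bit PCM samples, in dBFS (0 dBFS = full scale)."""
    if len(samples) == 0:
        return float("-inf")
    mean_sq = sum(s * s for s in samples) / len(samples)
    if mean_sq == 0:
        return float("-inf")  # digital silence
    return 20 * math.log10(math.sqrt(mean_sq) / 32768)


def has_voice(samples, sample_rate=16000, frame_ms=30,
              threshold_dbfs=-40.0, min_voiced_frames=3):
    """Hypothetical energy gate: True if enough frames exceed the loudness threshold.

    Assumption: loudness is a usable proxy for speech in a fairly quiet room.
    A trained VAD handles background noise much better than this.
    """
    frame_len = sample_rate * frame_ms // 1000
    voiced = 0
    # Walk the chunk in fixed-size, non-overlapping frames.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        if rms_dbfs(samples[start:start + frame_len]) > threshold_dbfs:
            voiced += 1
            if voiced >= min_voiced_frames:
                return True
    return False
```

In a real-time mic loop you would call `has_voice(chunk)` first and only transcribe chunks that pass. `threshold_dbfs` needs tuning per microphone: set it too high and quiet speech gets dropped, set it too low and room noise gets through.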
Any good, lightweight VAD you can recommend? Thanks!
Hi there! First of all, thanks a lot for this repo. It works wonderfully, and I even found a real-time mic example in another issue's comments.
The thing is, recording silence sometimes produces hallucinations; the most frequent one is "Thanks for watching!". This is a common Whisper issue.
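One crude stopgap, separate from any Whisper parameter, is to discard a transcript when it consists *only* of a phrase known to be a silence hallucination. The sketch below shows that approach. The phrase set is an illustrative assumption: only "Thanks for watching!" comes from this report, and `filter_hallucination` is a hypothetical helper, not part of lightning-whisper-mlx.

```python
import re

# Illustrative list (assumption): extend it with phrases you actually observe.
KNOWN_HALLUCINATIONS = {
    "thanks for watching",
    "thank you for watching",
}


def normalize(text):
    """Lowercase, drop punctuation and collapse whitespace for comparison."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def filter_hallucination(text):
    """Return "" if the whole transcript is a known hallucination, else the text unchanged.

    Only exact whole-transcript matches are dropped, so real speech that merely
    contains one of these phrases is kept.
    """
    return "" if normalize(text) in KNOWN_HALLUCINATIONS else text
```

Because it only matches the entire transcript, this avoids deleting genuine speech, but it will also miss hallucinations that are worded slightly differently.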
Reading online (for example, https://community.openai.com/t/how-to-avoid-hallucinations-in-whisper-transcriptions/125300?page=2), I found suggestions for mitigating this. I tried searching for the suggested parameter in lightning-whisper-mlx but couldn't find it exposed. It does appear in the code, but it isn't used.
I was hoping you could expose it properly as a parameter, so we can avoid these hallucinations.
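For reference, openai-whisper's `transcribe()` accepts `no_speech_threshold` (default 0.6) and `logprob_threshold` (default -1.0). It treats a window as silence when `no_speech_prob` exceeds the first threshold, unless `avg_logprob` is above the second. The sketch below applies that rule as a post-filter over segment dicts. It assumes the segments carry `no_speech_prob` and `avg_logprob` keys, as openai-whisper's segments do. Whether lightning-whisper-mlx returns these fields is an assumption, and `drop_silent_segments` is a hypothetical helper.

```python
def drop_silent_segments(segments, no_speech_threshold=0.6, logprob_threshold=-1.0):
    """Drop segments that look like silence, mirroring openai-whisper's skip rule.

    Assumption: each segment is a dict with "no_speech_prob" and "avg_logprob".
    Missing keys default to values that keep the segment.
    """
    kept = []
    for seg in segments:
        silent = seg.get("no_speech_prob", 0.0) > no_speech_threshold
        # A confident decode overrides the no-speech signal.
        if silent and seg.get("avg_logprob", 0.0) > logprob_threshold:
            silent = False
        if not silent:
            kept.append(seg)
    return kept
```

Keeping the two thresholds as keyword arguments makes it easy to tighten or relax the filter without touching the model call. That is roughly what exposing the parameter in the library would provide.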
Thanks a lot and all the best!