(Hallucinations) "Thanks for watching!"

fire17 commented 5 months ago

Hi there! First of all, thanks a lot for this repo. It works wonderfully, and I even found a realtime mic example in another issue's comments.

The thing is that recording silence sometimes produces hallucinations. The most frequent one is "Thanks for watching!" This is a common Whisper issue.

From reading online (for example, https://community.openai.com/t/how-to-avoid-hallucinations-in-whisper-transcriptions/125300?page=2), it seems that setting the temperature to 0 could help, since Whisper then adjusts it automatically.

I tried searching for this as a param in lightning-whisper-mlx but couldn't find it. It does appear in the code, but it isn't used.

I was hoping you could properly expose this as a param, so we might be able to avoid these hallucinations. For example:

LightningWhisperMLX(model="medium", batch_size=240, quant=None, temperature=0)

Thanks a lot and all the best!

SneakerFreaker64 commented 3 months ago

This is a well-known Whisper issue, which was trained on YouTube videos. It's not related to this project. When no sound is detected, Whisper tends to "hallucinate" words that are said often in the video transcriptions it was trained on.

You can use a VAD (Voice Activity Detection) tool in your code to help mitigate this issue, but it will still be there.
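To illustrate the idea, here is a minimal sketch of a speech gate, not a real VAD library: it uses a plain RMS energy threshold, which is much cruder than a trained VAD model. The function names (`frame_rms`, `has_speech`) and the default threshold values are assumptions made up for this example, and it expects mono float samples in the range [-1.0, 1.0].

```python
import math
from typing import Sequence


def frame_rms(samples: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of float samples in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def has_speech(
    samples: Sequence[float],
    sample_rate: int = 16000,
    frame_ms: int = 30,
    rms_threshold: float = 0.01,  # assumed value; tune for your microphone
    min_voiced_frames: int = 3,   # assumed value; ignores isolated clicks
) -> bool:
    """Return True if enough frames are louder than the energy threshold.

    Frames are counted in total, not consecutively, so this is only a
    rough filter for "is this chunk basically silent?".
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    voiced = 0
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        if frame_rms(frame) >= rms_threshold:
            voiced += 1
            if voiced >= min_voiced_frames:
                return True
    return False
```

In a realtime mic loop, you would call `has_speech(chunk)` on each recorded chunk and skip transcription when it returns `False`, so silent chunks never reach Whisper. Background noise can still pass an energy gate like this, which is one reason hallucinations may not disappear completely.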

fire17 commented 3 months ago

Can you recommend a good, lightweight VAD? Thanks

mustafaaljadery / lightning-whisper-mlx

(Hallucinations) "Thanks for watching!" #15