mustafaaljadery / lightning-whisper-mlx

An extremely fast implementation of whisper optimized for Apple Silicon using MLX.
https://mustafaaljadery.github.io/lightning-whisper-mlx/
588 stars, 30 forks

(Hallucinations) "Thanks for watching!" #15

Open fire17 opened 5 months ago

fire17 commented 5 months ago

Hi there! First of all, thanks a lot for this repo. It works wonderfully, and I even found a realtime mic example in another issue's comments.

The thing is, recording silence sometimes produces hallucinations. The most frequent one is "Thanks for watching!" This is a common Whisper issue.

From reading online (for example) https://community.openai.com/t/how-to-avoid-hallucinations-in-whisper-transcriptions/125300?page=2

I tried searching for this as a param in lightning-whisper-mlx but couldn't find it. You do have this appearing in the code, but it is not used.

I was hoping you could properly expose this as a param so that we might be able to avoid these hallucinations.

LightningWhisperMLX(model="medium", batch_size=240, quant=None, temperature=0)

Thanks a lot and all the best!

SneakerFreaker64 commented 3 months ago

This is a well-known Whisper issue that stems from the model having been trained on YouTube videos. It's not related to this project. When no sound is detected, Whisper tends to "hallucinate" words that are said often in the video transcriptions it was trained on.

You can use a VAD (Voice Activity Detection) tool in your code to skip silent audio before it reaches Whisper. This helps mitigate the issue, but it will not eliminate it entirely.
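
To illustrate the idea of gating audio before transcription, here is a minimal sketch of a crude energy-based silence gate. It is not a real VAD model and not part of lightning-whisper-mlx. The function names (`rms`, `is_probably_speech`, `filter_chunks`) and the `0.01` threshold are hypothetical and chosen only for illustration. It assumes each chunk is a sequence of float samples in the range [-1.0, 1.0].

```python
import math
from typing import List, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a chunk of float samples in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_probably_speech(samples: Sequence[float], threshold: float = 0.01) -> bool:
    """Treat a chunk as non-silent when its RMS level reaches the threshold.

    The threshold is an assumed starting value; tune it for your microphone.
    """
    return rms(samples) >= threshold


def filter_chunks(
    chunks: Sequence[Sequence[float]], threshold: float = 0.01
) -> List[Sequence[float]]:
    """Keep only the chunks that pass the energy gate."""
    return [chunk for chunk in chunks if is_probably_speech(chunk, threshold)]


if __name__ == "__main__":
    sample_rate = 16000
    silence = [0.0] * sample_rate
    # One second of a 440 Hz tone stands in for "something audible".
    tone = [
        0.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(sample_rate)
    ]
    kept = filter_chunks([silence, tone])
    print(f"kept {len(kept)} of 2 chunks")
```

Only the chunks that pass the gate would then be handed to the transcription call. A pure energy threshold cannot tell speech from other loud sounds, so a trained VAD is likely to do better on noisy input.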

fire17 commented 3 months ago

Is there a good, lightweight VAD you can recommend? Thanks