savbell / whisper-writer

💬📝 A small dictation app using OpenAI's Whisper speech recognition model.
GNU General Public License v3.0

Issue with Inconsistent Transcription Quality in Whisper Writer #46

Open Jaykurb opened 1 month ago

Jaykurb commented 1 month ago

I have recently set up Whisper Writer, and it appears very promising. However, I am facing an issue with inconsistent transcription quality.

In some recordings, Whisper Writer transcribes everything I say perfectly. In others, it produces nonsensical output. For example:

Transcription:  Mae'n gweithio, mae'n gweithio, mae'n gweithio, mae'n gweithio, mae'n gweithio, mae'n gweithio, mae'n gweithio.
Post-processed transcription: Mae'n gweithio, mae'n gweithio, mae'n gweithio, mae'n gweithio, mae'n gweithio, mae'n gweithio, mae'n gweithio.

In the above example, I was speaking in normal, clear sentences and was not repeating the same words.

savbell commented 1 month ago

Hi, thank you for reporting your issue!

Would you mind sharing more information on what model and settings you're using? For example, are you using the API or a local model? Is it the base model or another one?

Unfortunately, hallucinations are a known and perhaps unavoidable side effect of models like Whisper, but we can try to minimize them by adjusting the settings. For example, the large-v3 model hallucinates more than the large-v2 model (here's an article on it), so I'd suggest trying out different models if you're running it locally. Also try setting the "condition on previous text" setting to False and see if that helps.
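
For anyone who wants to test these two suggestions outside the app, here is a rough sketch. It assumes the local model is run through the faster-whisper Python package. The helper names `build_transcribe_options` and `transcribe_file` are made up for illustration and are not part of Whisper Writer.

```python
from typing import Any


def build_transcribe_options(condition_on_previous_text: bool = False) -> dict[str, Any]:
    """Keyword arguments for a Whisper transcribe() call (hypothetical helper)."""
    return {
        # When False, the text decoded for the previous window is not fed back
        # in as a prompt, which can stop a repeated phrase from carrying over
        # into later windows.
        "condition_on_previous_text": condition_on_previous_text,
    }


def transcribe_file(audio_path: str, model_name: str = "large-v2") -> str:
    """Transcribe one audio file with a local model (hypothetical helper)."""
    # Imported lazily so the options helper works without the package installed.
    # Assumption: faster-whisper is the local backend in use.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_name)
    segments, _info = model.transcribe(audio_path, **build_transcribe_options())
    # segments is a generator; joining it is what actually runs the decoding.
    return "".join(segment.text for segment in segments).strip()
```

Calling `transcribe_file("recording.wav")` on a recording that previously looped, once with `"large-v2"` and once with `"large-v3"`, should show whether the model choice makes a difference for you.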

Thanks, Sav