Hallucination causes failure to align - uncleaned input in whisper dataset

m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

BSD 2-Clause "Simplified" License

Hey all!

I'm transcribing a 90-minute German audio file, and Whisper hallucinates the following phrases: "Untertitel der Amara.org-Community" ("Subtitles by the Amara.org community") and "Untertitel im Auftrag des ZDF für funk, 2017" ("Subtitles commissioned by ZDF for funk, 2017").

These segments then cause the following error during alignment: Failed to align segment (" Untertitel der Amara.org-Community"): backtrack failed, resorting to original...

Here's a GitHub issue on Whisper that identifies these and other patterns.

Any idea how to fix this? I tried a couple of initial prompts and looked into token suppression, but have found no fix so far.

It would be awesome to make the alignment more robust, so that it simply skips segments it cannot align.
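In the meantime, one possible workaround is to drop the known hallucinated phrases before the segments reach the aligner. This is only a minimal sketch: it assumes the transcription result is a list of segment dicts with a "text" key, and `drop_hallucinated_segments` and `HALLUCINATION_PATTERNS` are hypothetical names made up for illustration, not part of whisperX.

```python
import re

# Phrases reported in this issue. This list is an assumption and is not
# exhaustive; extend it with any other patterns you run into.
HALLUCINATION_PATTERNS = [
    r"Untertitel der Amara\.org-Community",
    r"Untertitel im Auftrag des ZDF",
]
_HALLUCINATION_RE = re.compile("|".join(HALLUCINATION_PATTERNS), re.IGNORECASE)


def drop_hallucinated_segments(segments):
    """Return only the segments whose text matches none of the known patterns.

    Hypothetical helper: assumes each segment is a dict with a "text" key.
    """
    return [
        seg for seg in segments
        if not _HALLUCINATION_RE.search(seg.get("text", ""))
    ]


# Made-up example segments in the assumed shape.
segments = [
    {"start": 0.0, "end": 2.5, "text": " Guten Tag und herzlich willkommen."},
    {"start": 2.5, "end": 4.0, "text": " Untertitel der Amara.org-Community"},
    {"start": 4.0, "end": 6.0, "text": " Untertitel im Auftrag des ZDF für funk, 2017"},
]

cleaned = drop_hallucinated_segments(segments)
print([seg["text"] for seg in cleaned])
```

The filtered list would then be passed to the alignment step instead of the raw segments. Note that this drops a whole segment whenever a pattern matches, so real speech that ends up in the same segment as a hallucinated phrase is lost as well.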

Thanks for putting this awesome library together 🙏

Best,

Nick

Hallucination causes failure to align - uncleaned input in whisper dataset #230