m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
BSD 2-Clause "Simplified" License

Default config of `without_timestamps=True` affects whisper transcript quality. #932

Open Artaches opened 3 days ago

Artaches commented 3 days ago

WhisperX defaults to `without_timestamps=True`, whereas faster-whisper defaults to `without_timestamps=False`. This affects transcript quality: WhisperX output can drop long (5-15s) continuous stretches of the transcript. Attached is an example of a short audio clip (~10s, so the VADs in WhisperX and faster-whisper are off) that produces a worse transcript when `without_timestamps=True`.
whisperxWithoutTimestepsExample.zip
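
For anyone who wants to reproduce the comparison, here is a minimal sketch of overriding the default. It assumes that `whisperx.load_model` accepts an `asr_options` dict and merges it over its built-in defaults, which may differ between WhisperX versions. The helper names, the model name `large-v2`, and the device/compute settings are placeholders for illustration, not taken from the report.

```python
def asr_options_with_timestamps():
    """Return an options dict that overrides WhisperX's default
    of without_timestamps=True (hypothetical helper)."""
    return {"without_timestamps": False}


def load_whisperx_model(model_name="large-v2", device="cuda", compute_type="float16"):
    # Imported lazily so the helper above works without whisperx installed.
    import whisperx

    # Assumption: load_model merges asr_options into its default ASR options.
    return whisperx.load_model(
        model_name,
        device,
        compute_type=compute_type,
        asr_options=asr_options_with_timestamps(),
    )


def transcribe_clip(path, batch_size=16):
    # Load the audio and transcribe it with timestamps enabled.
    import whisperx

    model = load_whisperx_model()
    audio = whisperx.load_audio(path)
    return model.transcribe(audio, batch_size=batch_size)
```

Running `transcribe_clip` once with this override and once with the stock defaults on the attached clip should show whether the dropped stretches come from the `without_timestamps` setting.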