m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
BSD 2-Clause "Simplified" License

No timestamps for numbers, email addresses, etc #443

Open: mirix opened this issue 1 year ago

mirix commented 1 year ago

Hello,

I have noticed that WhisperX outputs no timestamps for certain non-dictionary words, including numbers, email addresses and others. For instance:

{'word': 'I', 'start': 113.492, 'end': 113.572, 'score': 0.854}
{'word': 'did,', 'start': 113.612, 'end': 113.832, 'score': 0.973}
{'word': 'he', 'start': 113.892, 'end': 113.952, 'score': 0.786}
{'word': 'is', 'start': 114.012, 'end': 114.092, 'score': 0.712}
{'word': 'in', 'start': 114.152, 'end': 114.493, 'score': 0.572}
{'word': '2016,'}

This does not happen in Whisper or FasterWhisper.
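One possible post-processing workaround, sketched here under assumptions: take the per-word dictionaries in the format shown above and give each unaligned word (one lacking `start` or `end`) times borrowed from its nearest aligned neighbours. The helper `fill_missing_timestamps` is hypothetical and not part of WhisperX. The filled-in times are only an estimate of the gap, so the sketch marks them with an illustrative `interpolated` key.

```python
from typing import Any, Optional


def _nearest(words: list[dict[str, Any]], indices: range, key: str) -> Optional[float]:
    """Return `key` from the first word (in `indices` order) that has it, else None."""
    for j in indices:
        if key in words[j]:
            return words[j][key]
    return None


def fill_missing_timestamps(words: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical helper (not a WhisperX API): estimate times for unaligned words.

    A word without 'start'/'end' gets the end of the previous aligned word as its
    start and the start of the next aligned word as its end. If only one neighbour
    exists, both times come from that neighbour (a zero-length span).
    """
    filled = []
    for i, word in enumerate(words):
        w = dict(word)  # copy so the caller's list is not modified
        if "start" not in w or "end" not in w:
            # Only genuinely aligned words in the original list are used as anchors.
            prev_end = _nearest(words, range(i - 1, -1, -1), "end")
            next_start = _nearest(words, range(i + 1, len(words)), "start")
            start = w.get("start", prev_end if prev_end is not None else next_start)
            end = w.get("end", next_start if next_start is not None else prev_end)
            if start is not None and end is not None:
                w["start"], w["end"] = start, end
                w["interpolated"] = True  # flag estimated times for downstream code
        filled.append(w)
    return filled
```

For the trailing `'2016,'` entry above there is no later aligned word, so it would get a zero-length span at 114.493 s; a word sitting between two aligned words instead spans the whole gap between them.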

mirix commented 1 year ago

In fact, in some cases the tokens are missing entirely, not just their timestamps.
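When whole tokens go missing, one way to spot them is to diff a segment's transcript text against its aligned word list. The sketch below assumes the transcript is available as a plain string and the words as dictionaries with a `word` key, as in the output above. `find_dropped_words` is a hypothetical helper built on Python's standard `difflib.SequenceMatcher`, not a WhisperX function.

```python
import difflib
from typing import Any


def find_dropped_words(segment_text: str, words: list[dict[str, Any]]) -> list[str]:
    """Hypothetical diagnostic: transcript tokens with no matching aligned word.

    Tokens are split on whitespace, so punctuation stays attached (e.g. '2016,').
    """
    expected = segment_text.split()
    aligned = [w["word"] for w in words]
    # Compare token sequences; autojunk=False disables the popularity heuristic.
    matcher = difflib.SequenceMatcher(a=expected, b=aligned, autojunk=False)
    dropped = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # 'delete': token absent from the word list; 'replace': token mismatched.
        if tag in ("delete", "replace"):
            dropped.extend(expected[i1:i2])
    return dropped
```

This only gives meaningful results if the aligned `word` strings are split the same way as the transcript text; mismatched tokens (`replace`) are reported alongside genuinely missing ones.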

awerks commented 1 year ago

See the Limitations section of the README: https://github.com/m-bain/whisperX#limitations-%EF%B8%8F