Status: Open. mirix opened this issue 1 year ago.
Hello,
I have noticed that WhisperX outputs no timestamps for certain non-dictionary words, including numbers, email addresses and similar tokens. For instance:
{'word': 'I', 'start': 113.492, 'end': 113.572, 'score': 0.854}
{'word': 'did,', 'start': 113.612, 'end': 113.832, 'score': 0.973}
{'word': 'he', 'start': 113.892, 'end': 113.952, 'score': 0.786}
{'word': 'is', 'start': 114.012, 'end': 114.092, 'score': 0.712}
{'word': 'in', 'start': 114.152, 'end': 114.493, 'score': 0.572}
{'word': '2016,'}
This does not happen in Whisper or FasterWhisper.
In fact, in some cases the tokens themselves are completely absent from the output, not just their timestamps.
https://github.com/m-bain/whisperX#limitations-%EF%B8%8F
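If, as the linked limitations section suggests, words that the alignment model cannot handle simply come back without timing, one possible workaround is to estimate the missing values from the neighbouring words after alignment. The snippet below is only a rough sketch under that assumption: `fill_missing_timestamps`, its `segment_end` parameter and the `interpolated` flag are hypothetical names of my own, not part of WhisperX, and the only thing it relies on is the word-dict shape shown in the output above. It does nothing for tokens that are missing from the output entirely.

```python
from typing import Any, Dict, List, Optional

Word = Dict[str, Any]


def _is_timed(word: Word) -> bool:
    """A word counts as timed only if it has both 'start' and 'end'."""
    return "start" in word and "end" in word


def fill_missing_timestamps(
    words: List[Word], segment_end: Optional[float] = None
) -> List[Word]:
    """Return a copy of `words` where untimed words get estimated times.

    Hypothetical helper, not a WhisperX API. Each run of consecutive
    untimed words is spread evenly between the end of the previous timed
    word and the start of the next timed word. At the edges, `segment_end`
    (if given) or the single available neighbour is used instead.
    """
    filled = [dict(w) for w in words]  # shallow copies; input is not mutated
    n = len(filled)
    i = 0
    while i < n:
        if _is_timed(filled[i]):
            i += 1
            continue

        # Find the run of untimed words [i, j).
        j = i
        while j < n and not _is_timed(filled[j]):
            j += 1

        left = filled[i - 1]["end"] if i > 0 else None
        right = filled[j]["start"] if j < n else segment_end
        if left is None:
            left = right
        if right is None:
            right = left

        # If there is no reference point at all, leave the run untouched.
        if left is not None:
            step = (right - left) / (j - i)
            for k in range(i, j):
                filled[k]["start"] = round(left + step * (k - i), 3)
                filled[k]["end"] = round(left + step * (k - i + 1), 3)
                filled[k]["interpolated"] = True  # mark as an estimate
        i = j
    return filled
```

Applied to the example above, `'2016,'` would get a start equal to the end of `'in'` (114.493) and an end equal to `segment_end` if one is passed, or a zero-length span at 114.493 otherwise. These are estimates, not real alignments, which is why the sketch flags them.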