Missing Timestamps for Number Speech

m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

BSD 2-Clause "Simplified" License


Missing Timestamps for Number Speech #559

Open yejinc00 opened 11 months ago

yejinc00 commented 11 months ago

While creating a transcript, I came across some segments that lacked start and end times. I noticed that this happens whenever there are numbers in the speech, as shown in the example below. Is this a bug?

{ "word": "with", "start": 41.119, "end": 41.26, "score": 0.862, "speaker": "1" },
{ "word": "2266" },
{ "word": "okay,", "start": 42.76, "end": 44.781, "score": 0.869 },

RaulKite commented 11 months ago

From the Readme:

Transcript words which do not contain characters in the alignment models dictionary e.g. "2014." or "£13.60" cannot be aligned and therefore are not given a timing.

So this is a known limitation of whisperX, not a bug.
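To see which words were affected in a given transcript, something like the following could work. It is only a sketch: it assumes the word list is a list of dicts with the "word", "start" and "end" keys shown in the example above, and `untimed_words` is a hypothetical helper name, not part of whisperX.

```python
from typing import Any


def untimed_words(words: list[dict[str, Any]]) -> list[tuple[int, str]]:
    """Return (index, text) for every word entry that has no start or end time."""
    return [
        (i, w["word"])
        for i, w in enumerate(words)
        if "start" not in w or "end" not in w
    ]


# Word entries copied from the example in the question.
example_words = [
    {"word": "with", "start": 41.119, "end": 41.26, "score": 0.862, "speaker": "1"},
    {"word": "2266"},
    {"word": "okay,", "start": 42.76, "end": 44.781, "score": 0.869},
]

print(untimed_words(example_words))
```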

dguerizec commented 1 month ago

Couldn't there be a way to kind of fake the alignment timings? Maybe by taking the end of the previous word as the start of the number, and the start of the next word as its end?
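The idea suggested above can be tried as a post-processing step on the word list. The sketch below is not part of whisperX: `fill_missing_timings` and the `interpolated` flag are hypothetical names, and the input is assumed to be a list of dicts shaped like the example in the question. When several untimed words appear in a row, the gap between the neighbouring timed words is split evenly between them, which is only a rough guess at the real timing.

```python
from typing import Any


def _is_timed(word: dict[str, Any]) -> bool:
    return "start" in word and "end" in word


def fill_missing_timings(words: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a copy of `words` where untimed entries borrow times from neighbours."""
    filled = [dict(w) for w in words]  # shallow copies, the input is left untouched
    n = len(filled)
    i = 0
    while i < n:
        if _is_timed(filled[i]):
            i += 1
            continue
        # Find the run of consecutive untimed words [i, j).
        j = i
        while j < n and not _is_timed(filled[j]):
            j += 1
        left = filled[i - 1]["end"] if i > 0 else None
        right = filled[j]["start"] if j < n else None
        if left is None and right is None:
            break  # no timed word at all, nothing to anchor on
        if left is None:
            left = right  # run at the very beginning: zero-length placeholder
        if right is None:
            right = left  # run at the very end: zero-length placeholder
        # Split the gap evenly across the untimed words in the run.
        step = (right - left) / (j - i)
        for k in range(i, j):
            filled[k]["start"] = round(left + (k - i) * step, 3)
            filled[k]["end"] = round(left + (k - i + 1) * step, 3)
            filled[k]["interpolated"] = True  # hypothetical marker for guessed timings
        i = j
    return filled


# Word entries copied from the example in the question.
example_words = [
    {"word": "with", "start": 41.119, "end": 41.26, "score": 0.862, "speaker": "1"},
    {"word": "2266"},
    {"word": "okay,", "start": 42.76, "end": 44.781, "score": 0.869},
]

for w in fill_missing_timings(example_words):
    print(w)
```

Marking the guessed entries keeps them distinguishable from real alignments, since the borrowed interval also covers any silence around the number and there is no alignment score to go with it.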