Open yejinc00 opened 11 months ago
From the Readme:
Transcript words which do not contain characters in the alignment model's dictionary, e.g. "2014." or "£13.60", cannot be aligned and therefore are not given a timing.
It is a known limitation of whisperX.
Couldn't there be a way to approximate the alignment timings? Maybe by taking the end of the previous word and the start of the next one as, respectively, the start and end of the number?
While creating a transcript, I came across some segments that lacked start and end times, and I noticed that this always occurs when there are numbers in the speech, as shown in the example. Is this a bug?
{ "word": "with", "start": 41.119, "end": 41.26, "score": 0.862, "speaker": "1" },
{ "word": "2266" },
{ "word": "okay,", "start": 42.76, "end": 44.781, "score": 0.869 },
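For anyone who wants to try the neighbour-based workaround suggested above, here is a minimal post-processing sketch. It is not part of whisperX: the function name `fill_missing_word_times` and the `interpolated` flag are made up for illustration, and it assumes the words arrive as a list of dicts shaped like the example above. Each untimed word borrows the end of the previous timed word as its start and the start of the next timed word as its end; a run of several untimed words splits that gap evenly.

```python
from typing import Any


def _is_timed(word: dict[str, Any]) -> bool:
    """A word counts as aligned only if it has both a start and an end."""
    return "start" in word and "end" in word


def fill_missing_word_times(words: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a copy of `words` where untimed words get approximate timings.

    Hypothetical helper, not a whisperX API. The timings are guesses taken
    from the neighbouring words, so filled entries are marked with
    "interpolated": True and receive no "score".
    """
    filled = [dict(w) for w in words]  # shallow copies; the input is left untouched
    n = len(filled)
    i = 0
    while i < n:
        if _is_timed(filled[i]):
            i += 1
            continue
        # Find the run of consecutive untimed words, indices i .. j-1.
        j = i
        while j < n and not _is_timed(filled[j]):
            j += 1
        prev_end = filled[i - 1]["end"] if i > 0 else None
        next_start = filled[j]["start"] if j < n else None
        if prev_end is None and next_start is None:
            break  # no timed word anywhere, nothing to anchor on
        # At the edges of the list, collapse the run onto the one known boundary.
        if prev_end is None:
            prev_end = next_start
        if next_start is None:
            next_start = prev_end
        step = (next_start - prev_end) / (j - i)
        for k in range(i, j):
            filled[k]["start"] = round(prev_end + step * (k - i), 3)
            filled[k]["end"] = round(prev_end + step * (k - i + 1), 3)
            filled[k]["interpolated"] = True
        i = j
    return filled


example = [
    {"word": "with", "start": 41.119, "end": 41.26, "score": 0.862, "speaker": "1"},
    {"word": "2266"},
    {"word": "okay,", "start": 42.76, "end": 44.781, "score": 0.869},
]
patched = fill_missing_word_times(example)
```

With the example from this issue, "2266" would span the whole gap between "with" and "okay,", which includes any silence around the number, so the result is only a rough estimate. Keeping the `interpolated` flag lets downstream code (subtitles, diarization) tell estimated timings apart from real alignments.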