Open Atefeh197 opened 1 year ago
Yes, this is a limitation of whisperX: it is unable to provide word timestamps for numerals. You can avoid this by transcribing with the --suppress_numerals flag, which transcribes numbers as words, e.g. "780" -> "seven hundred and eighty". You could then use a text normalizer to convert these back to digits.
Regarding the normalization afterwards, there are libraries like text2num, but they don't support many languages. Maybe yours is supported.
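To make the normalization step concrete, here is a minimal pure-Python sketch that turns spelled-out English numbers back into integers. The function name `words_to_int` and the lookup tables are made up for this illustration and are not part of whisperX or text2num; it handles plain cardinal numbers only, so a dedicated library is likely the better choice for real transcripts.

```python
# Hypothetical helper for illustration only: English cardinal words -> int.
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def words_to_int(phrase):
    """Convert e.g. 'seven hundred and eighty' to 780."""
    total = 0    # value of the groups already closed by a scale word
    current = 0  # value of the group being read (below one thousand)
    for token in phrase.lower().replace("-", " ").split():
        if token == "and":
            continue  # filler word, carries no value
        if token in UNITS:
            current += UNITS[token]
        elif token in TENS:
            current += TENS[token]
        elif token == "hundred":
            current = max(current, 1) * 100  # bare "hundred" counts as 100
        elif token in SCALES:
            total += max(current, 1) * SCALES[token]
            current = 0
        else:
            raise ValueError(f"not a number word: {token!r}")
    return total + current


print(words_to_int("seven hundred and eighty"))
print(words_to_int("eight hundred and two"))
```

Applied to a transcript, you would still need to find which runs of words form a number before calling a helper like this, which is the part the existing libraries do for you.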
@robvanson I think the issue is related to #717
@m-bain How can I pass suppress_numerals=True when using the Python interface?
I found the method in issue #629 and in the code linked below: https://github.com/m-bain/whisperX/blob/78dcfaab51005aa703ee21375f81ed31bc248560/whisperx/asr.py#L259-L332
Thanks!
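For anyone else looking for this, a sketch of what that looks like from Python, assuming the `load_model` in the linked asr.py, which takes an `asr_options` dict and merges it into its defaults. The wrapper function, model name, device and compute type here are placeholders of mine, and the keyword may differ in other whisperX versions.

```python
# Options merged into whisperX's default ASR options by load_model
# (assumption: behaviour of the asr.py revision linked above).
ASR_OPTIONS = {"suppress_numerals": True}


def load_model_suppressing_numerals(model_name="large-v2",
                                    device="cpu",
                                    compute_type="int8"):
    """Load a whisperX model that spells numbers out as words.

    Hypothetical convenience wrapper, not part of whisperX itself.
    """
    import whisperx  # imported lazily so this file loads without whisperx

    return whisperx.load_model(
        model_name,
        device,
        compute_type=compute_type,
        asr_options=ASR_OPTIONS,
    )


# Typical use (needs whisperx installed and an audio file on disk):
#   model = load_model_suppressing_numerals()
#   audio = whisperx.load_audio("audio.wav")
#   result = model.transcribe(audio, batch_size=8)
```

The transcript should then contain "seven hundred and eighty" instead of "780", which the alignment step can timestamp word by word.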
@snoop2head "--suppress_numerals" works for me, thanks a lot.
Would it be possible to implement something that converts numerals to text where needed and reverts them to numerals once the timestamps are set? That way we would keep the original numeric values with timestamps in place.
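A rough sketch of that round trip, outside whisperX: spell each numeral out before alignment, remember which spoken words belong to which numeral, and afterwards merge their timings back onto the original token. All names here (`int_to_words`, `expand_numerals`, `collapse_timings`) are hypothetical, the speller covers 0 to 999 in English only, and it assumes the aligner returns exactly one timed entry per spoken word in order, which a real aligner may not do.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def int_to_words(n):
    """Spell out an integer in the range 0..999."""
    if not 0 <= n < 1000:
        raise ValueError("only 0..999 is supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    head = ONES[hundreds] + " hundred"
    return head + (" and " + int_to_words(rest) if rest else "")


def expand_numerals(text):
    """Return (spoken_words, original_token) pairs for each token."""
    groups = []
    for token in text.split():
        if re.fullmatch(r"[0-9]{1,3}", token):
            groups.append((int_to_words(int(token)).split(), token))
        else:
            groups.append(([token], token))
    return groups


def collapse_timings(groups, aligned_words):
    """Merge per-word timings back onto the original tokens."""
    merged, pos = [], 0
    for spoken, original in groups:
        chunk = aligned_words[pos:pos + len(spoken)]
        pos += len(spoken)
        merged.append({"word": original,
                       "start": chunk[0]["start"],
                       "end": chunk[-1]["end"]})
    return merged


groups = expand_numerals("780 802")
spoken = [w for words, _ in groups for w in words]
# Stand-in for the aligner: pretend every spoken word lasts 0.25 s.
fake_aligned = [{"word": w, "start": i * 0.25, "end": (i + 1) * 0.25}
                for i, w in enumerate(spoken)]
print(collapse_timings(groups, fake_aligned))
```

The fragile part is the one-to-one assumption in `collapse_timings`; matching on the aligned word text rather than on position would be sturdier.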
Hi everyone
I just started using whisperX. It has more accurate timestamps than Whisper, but it is very inaccurate in the case of numbers.
For example, the text of the first row is "Good day." and it extracts accurate timestamps for each word. But the second row is "780 802": the start and end times are very close to each other, and furthermore there are no separate times for "780" and "802".
{'start': 2.861, 'end': 4.081, 'text': 'Good day.', 'words': [{'word': 'Good', 'start': 2.861, 'end': 3.281, 'score': 0.587}, {'word': 'day.', 'start': 3.301, 'end': 4.001, 'score': 0.48}]},
{'start': 20.025, 'end': 20.045, 'text': '780 802', 'words': [{'word': '780'}, {'word': '802'}]},
How can I get better timestamps for numbers?
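To check how often this happens in a full transcript, a small sketch that lists every word lacking a start or end time. It assumes segments shaped like the two rows above; the function name `untimed_words` is just something made up for this example.

```python
def untimed_words(segments):
    """Return (segment_index, word) for every word without timestamps."""
    missing = []
    for seg_index, segment in enumerate(segments):
        for word in segment.get("words", []):
            if "start" not in word or "end" not in word:
                missing.append((seg_index, word["word"]))
    return missing


# The two rows from the output above.
segments = [
    {"start": 2.861, "end": 4.081, "text": "Good day.",
     "words": [
         {"word": "Good", "start": 2.861, "end": 3.281, "score": 0.587},
         {"word": "day.", "start": 3.301, "end": 4.001, "score": 0.48},
     ]},
    {"start": 20.025, "end": 20.045, "text": "780 802",
     "words": [{"word": "780"}, {"word": "802"}]},
]

print(untimed_words(segments))
```

On these two rows it reports only the words of the second segment, "780" and "802".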