Open NielsVandenEynde opened 1 year ago
I have the same issue for all numeric values in a transcript. This includes punctuation that is stored with the value. Examples: 1
, 2,
, 3?
, 4.
, and 2,000
.
This is covered in the readme under the Limitations heading, "Transcript words which do not contain characters in the alignment models dictionary e.g. "2014." or "£13.60" cannot be aligned and therefore are not given a timing."
Would it be possible to include an option for giving these words some sort of timestamp, even if approximate or "neighboring"? Like the OP, I am parsing individual words into sentences, and this is aided by standardization of the format.
I see, yes it's tricky because for some use cases non-aligned word timestamps can be interpolated from its neighbors. For other use cases, dropping them is more suitable - or merging into neighboring words. And I wanted to make some distinction between alignable words rather than make some heuristics for ones that aren't.
At the moment I left it up to the user to do the post-processing. Feel free to push some helper functions which do any of the above seems like there is a need for it
Are there any hope of getting a solution in place that can highlight numeric words (e.g. via tags in .srt files, as is done for non-numeric words)?
I have spent a large amount of time trying to write a Python program for post-processing of .srt files as aforementioned, but the complexities involved in sentence and word management is beyond my ability, and it would probably make more sense for everyone to have WhisperX provide the ability to approximate the closest timecode for numeric words?
This is something that changed with the latest version. Some short words such as numbers just don't have timestamps. This is annoying because I'm parsing those words into sentences myself.