Open · slifty opened this issue 3 years ago
We should also not emit an SRT if it holds no content.
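One way to read this suggestion: drop cues whose text is empty or whitespace-only, and emit nothing at all if no cues remain. The sketch below is written in Python purely for illustration; the `Cue` type and the `build_srt` / `format_timestamp` helpers are hypothetical and are not taken from the SRT appliance. It returns `None` instead of an empty SRT.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cue:
    # Hypothetical cue shape: start/end offsets in milliseconds plus caption text.
    start_ms: int
    end_ms: int
    text: str


def format_timestamp(ms: int) -> str:
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def build_srt(cues: List[Cue]) -> Optional[str]:
    """Hypothetical helper: build SRT text, or return None when there is no content."""
    # Skip cues whose text is empty or only whitespace.
    non_empty = [cue for cue in cues if cue.text.strip()]
    if not non_empty:
        return None  # Nothing worth emitting, so emit no SRT at all.
    blocks = []
    for index, cue in enumerate(non_empty, start=1):
        # Each SRT block: sequence number, time range, caption text.
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(cue.start_ms)} --> {format_timestamp(cue.end_ms)}\n"
            f"{cue.text}\n"
        )
    # Blocks are separated by a blank line.
    return "\n".join(blocks)
```

Cues are renumbered after filtering so the sequence numbers in the emitted SRT stay contiguous.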
The whitespace issue could also be handled by the caption extractor, since that is where it is introduced.
I believe that whitespace may be a byproduct of position estimation / screen rendering. The caption extractor has really been framed as a way to extract ASCII text / transcripts from a stream.
The reason to make the fix in the SRT appliance is that SRT output definitely doesn't want the whitespace, whereas a downstream consumer of the caption extraction appliance's data might want the raw caption data exactly as it was originally encoded.
Task
Description
SRTs sometimes contain lots of whitespace. That whitespace is part of the caption data, but we can clean it up in our SRT payloads.
I think it would be reasonable to (A) trim the whitespace at the start and end, and (B) collapse each run of whitespace (\s+) into a single space. Note that this is not about changing the captions themselves; it applies only to the SRT appliance.
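Steps (A) and (B) can be sketched as a single text-normalization function. This is a minimal Python sketch; the function name `normalize_caption_text` is hypothetical and the appliance may be written in a different language. It deliberately uses `\s+` rather than `\s*`: `\s*` also matches the empty string, so substituting it with a space would insert a space between every character instead of collapsing runs.

```python
import re

# One or more whitespace characters (spaces, tabs, newlines, and other Unicode whitespace).
_WHITESPACE_RUN = re.compile(r"\s+")


def normalize_caption_text(text: str) -> str:
    """Hypothetical helper: collapse whitespace runs (B), then trim the ends (A)."""
    collapsed = _WHITESPACE_RUN.sub(" ", text)  # (B) each run becomes one space
    return collapsed.strip()                    # (A) drop leading/trailing whitespace
```

One design consequence worth checking: because `\s` also matches newlines, this flattens intentional line breaks within a cue into spaces. If multi-line cues should keep their layout, collapsing only `[ \t]+` on each line would be the more conservative choice.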