Improve caption extraction to handle corrections

I bought the closed captioning handbook and read through it. There is a section about the control characters defined in the CC spec. These allow a broadcaster to move the cursor back within a given rollup line, or to wipe out all characters after a certain point in the buffer.
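To make the effect of those control characters concrete, here is a minimal sketch of a single caption row being edited by a stream of events. It is not CCExtractor's code and it does not parse real caption bytes: the event names (`char`, `backspace`, `move_to`, `delete_to_end`) and the `apply_events` function are hypothetical stand-ins for the spec's cursor and erase commands.

```python
def chars(text):
    """Hypothetical helper: turn a string into a list of character events."""
    return [("char", c) for c in text]


def apply_events(events):
    """Apply a stream of (kind, value) events to one caption row.

    The event kinds are illustrative only; they model the behaviors
    described above rather than actual control codes from the CC spec.
    """
    row = []      # characters currently in the row
    cursor = 0    # index where the next character will be written
    for kind, value in events:
        if kind == "char":
            # Overwrite if the cursor is inside the row, otherwise append.
            if cursor < len(row):
                row[cursor] = value
            else:
                row.append(value)
            cursor += 1
        elif kind == "backspace":
            # Move back one cell and erase what was there.
            if cursor > 0:
                cursor -= 1
                if cursor == len(row) - 1:
                    row.pop()
                else:
                    row[cursor] = " "
        elif kind == "move_to":
            # Reposition the cursor, clamped to the current row length.
            cursor = max(0, min(value, len(row)))
        elif kind == "delete_to_end":
            # Wipe everything from the cursor to the end of the row.
            del row[cursor:]
    return "".join(row)


# A typo corrected with two backspaces.
typo_fixed = apply_events(chars("teh") + [("backspace", None)] * 2 + chars("he"))

# Everything after column 5 wiped out.
truncated = apply_events(chars("HELLO WORLD") + [("move_to", 5), ("delete_to_end", None)])
```

A consumer that only sees the raw characters, without these events, would keep the typo and the wiped text, which is the kind of correction this issue is about.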

CCExtractor knows how to process these characters, but so far I don't see any indication that it will emit them. The real issue is that CCExtractor doesn't directly handle the "stream of individual characters" use case. It does that processing behind the scenes, and what TV Kitchen has done is take the processed output and then go back to simulate a rollup.

This is all a long way of saying that I believe we need to ditch the simulated rollup and instead have the caption extractor emit payloads one line at a time. I think it can still break the lines into ATOM payloads, which should keep us backwards compatible down the line if we decide to ditch CCExtractor and parse caption streams directly.
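A rough sketch of the line-at-a-time idea, under stated assumptions: the `Payload` class, its `data` / `type` / `position` fields, the `"TEXT.ATOM"` type string, and `line_to_atoms` are all hypothetical names for illustration, not the actual TV Kitchen payload API. It also assumes one atom per whitespace-delimited word and gives every atom the timestamp of the line it came from.

```python
from dataclasses import dataclass


@dataclass
class Payload:
    """Hypothetical stand-in for a TV Kitchen payload."""
    data: str       # the text carried by this atom
    type: str       # assumed type label for atom payloads
    position: int   # assumed: milliseconds from the start of the stream


def line_to_atoms(line, position_ms):
    """Split one finished caption line into atom payloads.

    The line is assumed to be final, meaning any corrections were
    already applied upstream before the line was emitted.
    """
    return [
        Payload(data=word, type="TEXT.ATOM", position=position_ms)
        for word in line.split()
    ]


atoms = line_to_atoms("GOOD EVENING  EVERYONE", 1500)
```

Because downstream consumers would still receive atoms, swapping CCExtractor for a direct caption parser later would only change where the finished lines come from.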

By doing this I believe we will fix this bug, and possibly #139 as well.

tvkitchen / appliances

Improve caption extraction to handle corrections #133
