tvkitchen / appliances

A one stop shop for official TV Kitchen Appliances
GNU Lesser General Public License v3.0

Caption Extractor releasing negative durations #109

Closed: slifty closed 3 years ago

slifty commented 3 years ago

Bug

Current Behavior

In debugging #107 I started looking at the stream of data coming out of Caption Extractor.

Emissions are in the right order, but inspecting the payloads I'm seeing some odd positions / durations at the point of the SRT bug:

AvroPayload {
  data: '\n',
  type: 'TEXT.ATOM',
  createdAt: '2021-04-24T01:42:55.674Z',
  origin: '2021-04-24T01:42:41.457Z',
  duration: 0,
  position: 12445
}
AvroPayload {
  data: '>>',
  type: 'TEXT.ATOM',
  createdAt: '2021-04-24T01:42:55.674Z',
  origin: '2021-04-24T01:42:41.457Z',
  duration: -12445,
  position: 12445
}
AvroPayload {
  data: ' I',
  type: 'TEXT.ATOM',
  createdAt: '2021-04-24T01:42:55.832Z',
  origin: '2021-04-24T01:42:41.457Z',
  duration: 12545,
  position: 0
}
AvroPayload {
  data: ' M',
  type: 'TEXT.ATOM',
  createdAt: '2021-04-24T01:42:55.832Z',
  origin: '2021-04-24T01:42:41.457Z',
  duration: 34,
  position: 12545
}
AvroPayload {
  data: 'EA',
  type: 'TEXT.ATOM',
  createdAt: '2021-04-24T01:42:55.832Z',
  origin: '2021-04-24T01:42:41.457Z',
  duration: 33,
  position: 12579
}

Notice the negative duration on '>>' (-12445) and the position of 0 on the ' I' that follows it, whose duration (12545) spans the whole stream up to that point.

PayloadArray guarantees packet order, so a payload whose position jumps back to 0 would explain why those payloads are swapped in the SRT generator.
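A quick way to spot these in a stream is to scan for negative durations and for positions that move backwards. This is only a rough sketch: `findTimingAnomalies` is a hypothetical helper, not part of the appliance, and it assumes each payload exposes numeric `position` and `duration` fields in milliseconds, as the dumps above suggest.

```javascript
// Hypothetical helper (not part of the appliance): flags payloads whose
// timing looks inconsistent. Assumes numeric `position` / `duration` in ms.
function findTimingAnomalies(payloads) {
  const anomalies = [];
  let previousPosition = -Infinity;
  payloads.forEach((payload, index) => {
    if (payload.duration < 0) {
      anomalies.push({ index, reason: 'negative duration' });
    }
    if (payload.position < previousPosition) {
      anomalies.push({ index, reason: 'position moved backwards' });
    }
    previousPosition = payload.position;
  });
  return anomalies;
}

// Positions and durations copied from the first four payloads above.
const samplePayloads = [
  { data: '\n', duration: 0, position: 12445 },
  { data: '>>', duration: -12445, position: 12445 },
  { data: ' I', duration: 12545, position: 0 },
  { data: ' M', duration: 34, position: 12545 },
];

// Flags index 1 (negative duration) and index 2 (position moved backwards).
console.log(findTimingAnomalies(samplePayloads));
```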

slifty commented 3 years ago

Looking at convertCcExtractorLineToPayloads at the point of an offending line break (note that line breaks sometimes work fine!):

----------------
line: CCExtractorLine { start: 43276, end: 43292, text: 'AB' }
previousLine: CCExtractorLine { start: 39005, end: 43209, text: '>> OF COURSE' }
----------------

And the relevant payloads:

AvroPayload {
  data: ' O',
  type: 'TEXT.ATOM',
  createdAt: '2021-04-24T02:00:26.741Z',
  origin: '2021-04-24T01:59:46.222Z',
  duration: 39039,
  position: 0
}
AvroPayload {
  data: 'F',
  type: 'TEXT.ATOM',
  createdAt: '2021-04-24T02:00:26.741Z',
  origin: '2021-04-24T01:59:46.222Z',
  duration: 33,
  position: 39039
}
AvroPayload {
  data: ' CO',
  type: 'TEXT.ATOM',
  createdAt: '2021-04-24T02:00:26.741Z',
  origin: '2021-04-24T01:59:46.222Z',
  duration: 33,
  position: 39072
}
AvroPayload {
  data: 'UR',
  type: 'TEXT.ATOM',
  createdAt: '2021-04-24T02:00:26.881Z',
  origin: '2021-04-24T01:59:46.222Z',
  duration: 34,
  position: 39105
}
AvroPayload {
  data: 'SE',
  type: 'TEXT.ATOM',
  createdAt: '2021-04-24T02:00:26.881Z',
  origin: '2021-04-24T01:59:46.222Z',
  duration: 33,
  position: 39139
}
AvroPayload {
  data: '\n',
  type: 'TEXT.ATOM',
  createdAt: '2021-04-24T02:00:31.042Z',
  origin: '2021-04-24T01:59:46.222Z',
  duration: 0,
  position: 43209
}
AvroPayload {
  data: 'AB',
  type: 'TEXT.ATOM',
  createdAt: '2021-04-24T02:00:31.042Z',
  origin: '2021-04-24T01:59:46.222Z',
  duration: 83,
  position: 43209
}

slifty commented 3 years ago

It looks like the CCExtractor output itself is weird:

00:02:02,972|00:02:03,706|ROCKEFELLER CENTER IN THE HEART
00:00:00,000|00:00:00,000|OF
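To make the zeroed timestamps concrete, here is a rough sketch of turning lines in this format into millisecond offsets. The `start|end|text` layout is inferred only from the two lines above, and `timestampToMs` and `parseRawLine` are hypothetical names, not the appliance's actual parser.

```javascript
// Hypothetical parser for the "HH:MM:SS,mmm|HH:MM:SS,mmm|TEXT" layout seen
// above. The layout is an assumption inferred from the two sample lines.
function timestampToMs(timestamp) {
  const match = /^(\d+):(\d{2}):(\d{2}),(\d{3})$/.exec(timestamp);
  if (match === null) {
    throw new Error(`Unrecognized timestamp: ${timestamp}`);
  }
  const [, hours, minutes, seconds, millis] = match.map(Number);
  return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis;
}

function parseRawLine(rawLine) {
  // Only the first two pipes are delimiters; the text may contain more.
  const [start, end, ...rest] = rawLine.split('|');
  return {
    start: timestampToMs(start),
    end: timestampToMs(end),
    text: rest.join('|'),
  };
}

const parsedLines = [
  '00:02:02,972|00:02:03,706|ROCKEFELLER CENTER IN THE HEART',
  '00:00:00,000|00:00:00,000|OF',
].map(parseRawLine);

// The first line parses to start 122972 / end 123706; the second to 0 / 0.
console.log(parsedLines);
```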

I'm going to see what's up with CCExtractor. At the appliance level, one solution might be to never allow the start position to move backwards when parsing CCExtractor lines.
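A minimal sketch of that appliance-level guard, under some assumptions: lines are plain objects with `start`, `end`, and `text` (as in the CCExtractorLine dumps earlier), and the floor is the previous line's `end`, since the payload positions above appear to continue from there. `clampLineTiming` and `clampAllLines` are hypothetical names, not existing appliance functions, and the third sample line is invented for illustration.

```javascript
// Hypothetical guard: never let a line start before the previous line ended,
// and never let a line end before it starts.
function clampLineTiming(line, previousLine) {
  if (!previousLine) {
    return { ...line };
  }
  const start = Math.max(line.start, previousLine.end);
  const end = Math.max(line.end, start);
  return { ...line, start, end };
}

// Each line is clamped against the already-clamped previous line, so one
// zeroed line cannot drag the following line back to position 0.
function clampAllLines(lines) {
  const clamped = [];
  for (const line of lines) {
    clamped.push(clampLineTiming(line, clamped[clamped.length - 1]));
  }
  return clamped;
}

const clampedLines = clampAllLines([
  { start: 122972, end: 123706, text: 'ROCKEFELLER CENTER IN THE HEART' },
  { start: 0, end: 0, text: 'OF' },
  { start: 123800, end: 124000, text: 'NEW YORK' }, // invented follow-up line
]);

// The zeroed 'OF' line becomes start 123706 / end 123706 (zero length, but
// never negative); the other two lines are unchanged.
console.log(clampedLines);
```

The trade-off is that a clamped line ends up with zero length, so its text gets no time of its own; that seems preferable to a negative duration, but it hides the upstream problem rather than fixing it.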

slifty commented 3 years ago

I've filed an issue in the CCExtractor repo: https://github.com/CCExtractor/ccextractor/issues/1327

It's also always possible that this is user error and I have a flag set incorrectly.