pbs / pycaption

Python module to read/write popular video caption formats
Apache License 2.0

SCC timing query #302

Closed ksn-systems closed 9 months ago

ksn-systems commented 1 year ago

Hi

I have been trying to convert an SCC file to a VTT file. The file is 2 hours long, and by the end there seems to be a drift in the caption timing of around 6-8 seconds.

As a test, I am just running this:

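from pycaption import SCCReader, SCCWriter, SRTReader, SRTWriter, DFXPWriter, WebVTTWriter, detect_format

filename = "/mnt/storage/captions/OT197518.scc"

# Read the whole SCC file; the name scc_text avoids shadowing the built-in str
with open(filename, "r") as f:
    scc_text = f.read()

# Parse the SCC content and print it as WebVTT
print(WebVTTWriter().write(SCCReader().read(scc_text)))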

The last 2 captions in the SCC look like this

[image: the last 2 captions in the SCC file]

and the output looks like this:

[image: the corresponding VTT output]

I have seen the exact same drift on a commercial product (Telestream) with the same 11-second stretch, but loading the file into subedit yields what I would expect to see as a result.

I am not a fan of scc but that is the hand I have been dealt!

Everything else on the VTT writer looks great.

Thank you for a great package.

Darren Breeze

ianShifrin commented 9 months ago

Hi Darren,

I believe the timing in PyCaption (and in Telestream) is correct (it accounts for a buffering time of one frame per character). A drift like this is usually caused by a different frame rate (not 29.97) or by the time code being presented as non-drop frame instead of drop frame (your captions' time code is represented as non-drop frame, but if the file comes from a broadcast format it is probably drop frame).

This article may be helpful: https://support.telestream.net/s/article/SCC-time-codes-compared-to-caption-time-codes
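To put a number on the drop frame versus non-drop frame point, here is a minimal sketch of the arithmetic. It is not PyCaption's or Telestream's code. It assumes 29.97 fps NTSC video and the usual SCC convention that a semicolon before the frame field marks drop frame; the helper name timecode_to_seconds is made up for illustration.

```python
from fractions import Fraction

FPS_NOMINAL = 30                        # frame numbers run 00..29
FRAME_DURATION = Fraction(1001, 30000)  # seconds per frame at 29.97 fps (assumed)


def timecode_to_seconds(timecode: str) -> float:
    """Convert HH:MM:SS:FF (non-drop) or HH:MM:SS;FF (drop frame) to real seconds.

    Hypothetical helper for illustration only.
    """
    drop_frame = ";" in timecode
    hours, minutes, seconds, frames = (
        int(part) for part in timecode.replace(";", ":").split(":")
    )
    total_minutes = hours * 60 + minutes
    frame_count = (total_minutes * 60 + seconds) * FPS_NOMINAL + frames
    if drop_frame:
        # Drop frame skips frame numbers 00 and 01 at the start of every
        # minute, except minutes divisible by 10.
        frame_count -= 2 * (total_minutes - total_minutes // 10)
    return float(frame_count * FRAME_DURATION)


ndf = timecode_to_seconds("02:00:00:00")  # read as non-drop frame
df = timecode_to_seconds("02:00:00;00")   # read as drop frame
print(f"non-drop: {ndf:.3f} s, drop frame: {df:.3f} s, difference: {ndf - df:.3f} s")
```

Under these assumptions the same nominal two-hour time code lands about 7.2 seconds apart depending on which interpretation is used, which is in the range of the drift reported above.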

Take care, Ian

ksn-systems commented 9 months ago

Thanks Ian