pbs / pycaption

Python module to read/write popular video caption formats
Apache License 2.0

DFXP times can be expressed in offset format #59

Closed: rasod closed this issue 8 years ago

rasod commented 9 years ago

The DFXP spec allows times to be expressed as offset time in addition to clock time. The DFXP converter raises an error on input files that use offset time.

Example body fragment from the spec (http://www.w3.org/TR/ttml1/#ttml-example-body):

<p xml:id="subtitle3" begin="10.0s" end="16.0s" style="s2">

zmwangx commented 8 years ago

Yes, that's a big problem. The full syntax of time value expressions is given here: http://www.w3.org/TR/ttml1/#timing-value-timeExpression. There's also an official test suite, http://www.w3.org/2008/10/dfxp-testsuite.zip, on which pycaption's DFXPReader fails miserably (due to this and possibly other problems).

zmwangx commented 8 years ago

Here's a quick and dirty patch for partial offset time support (with minimal error checking):

diff --git a/pycaption/dfxp/base.py b/pycaption/dfxp/base.py
index f981cea..b6efff9 100644
--- a/pycaption/dfxp/base.py
+++ b/pycaption/dfxp/base.py
@@ -124,6 +124,19 @@ class DFXPReader(BaseReader):
         return start, end

     def _translate_time(self, stamp):
+        # offset time
+        if stamp.endswith('ms'):
+            return int(float(stamp[:-2]) * 1000)
+        elif stamp.endswith('s'):
+            return int(float(stamp[:-1]) * 1000000)
+        elif stamp.endswith('m'):
+            return int(float(stamp[:-1]) * 60000000)
+        elif stamp.endswith('h'):
+            return int(float(stamp[:-1]) * 3600000000)
+        elif stamp.endswith(('f', 't')):
+            raise NotImplementedError("Frames or ticks in time specifications not supported")
+
+        # clock time
         timesplit = stamp.split(u':')
         if u'.' not in timesplit[2]:
             timesplit[2] = timesplit[2] + u'.000'

Or as a gist: https://gist.github.com/anonymous/857f31269d8477fe3b91
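
For anyone who wants to try the offset-time conversion outside of pycaption, here is a minimal standalone sketch of the same idea. It is not part of the patch or of pycaption: the function name `offset_time_to_microseconds` and the two module-level constants are hypothetical, and it assumes, as the multipliers in the patch do, that times are returned in microseconds. It matches the TTML1 offset-time form (a count, an optional fraction, then one of the metrics h, m, s, ms, f, t) with a regular expression, and uses `Decimal` rather than `float` to avoid truncation surprises from binary floating point. As in the patch, frames and ticks are rejected.

```python
import re
from decimal import Decimal

# Microseconds per unit for the offset-time metrics handled in this sketch.
_MICROSECONDS_PER_UNIT = {
    "h": 3600000000,
    "m": 60000000,
    "s": 1000000,
    "ms": 1000,
}

# Offset time: digits, an optional fraction, then a metric.
# "ms" is listed before "m" only for readability; the "$" anchor
# already forces the whole suffix to match.
_OFFSET_TIME = re.compile(r"^(\d+(?:\.\d+)?)(h|ms|m|s|f|t)$")


def offset_time_to_microseconds(stamp):
    """Convert an offset-time string such as "10.0s" to microseconds.

    Returns None when the string is not in offset-time form, so a caller
    can fall back to clock-time parsing. Raises NotImplementedError for
    frames ("f") and ticks ("t"), which need frame-rate or tick-rate
    information that this sketch does not have.
    """
    match = _OFFSET_TIME.match(stamp.strip())
    if match is None:
        return None
    count, metric = match.groups()
    if metric in ("f", "t"):
        raise NotImplementedError(
            "Frames or ticks in time specifications not supported")
    # Decimal keeps the fraction exact; int() then drops any remainder
    # smaller than one microsecond.
    return int(Decimal(count) * _MICROSECONDS_PER_UNIT[metric])
```

With the example from the spec, `offset_time_to_microseconds("10.0s")` gives 10000000 and `offset_time_to_microseconds("16.0s")` gives 16000000, while a clock-time value such as `"00:00:10.000"` returns None and can be handed to the existing clock-time code path.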