rbouqueau / TTML_in_MP4_DASH_statement

TTML in MP4 and MPEG-DASH guidelines

Root temporal extent issues #11

Open fcartegnie opened 7 years ago

fcartegnie commented 7 years ago

Looking at the overlapping examples (ex 2), it seems to me there's an issue with timebases.

As the time base is set to media, the Root Temporal Extent is defined by the document instance/body, which here is the sample presentation time. Timings then can't be identical for the same subtitle across different document instances sent at different times.

That's what I also find in unified-streaming's DASH ttml samples: http://demo.unified-streaming.com/video/elephantsdream/elephantsdream.ism/QualityLevels(1000)/Fragments(textstream_deu=4200000000)

nigelmegitt commented 7 years ago

I disagree - in my view the unified-streaming's DASH ttml samples are incorrect. The spec is that the media timebase is equivalent to the track time, rather than having a media time of zero be equivalent to the sample presentation time.

fcartegnie commented 7 years ago

6.2.11 ttp:timeBase "may be the content of a Document Instance itself in a case where the timed text content is intended to establish an independent time line."

A full document with multiple timings carried in a single sample can match that "independent time line" case.

Root Temporal Extent: "The temporal extent (interval) defined by the temporal beginning and ending of a Document Instance in relationship with some external application or presentation context."

Paragraph/span timings can then be resolved relative to the document instance's body timings, to the document instance taken as the sample PTS, or to media time itself.

Appendix L, "Streaming TTML Content (Non-Normative)", is the only part in favor of just "slicing" the document, but it is not normative...

No surprise that everyone does it their own way. Decoder issues ahead.

nigelmegitt commented 7 years ago

Those are TTML1 references - you also need to check ISO 14496-30 for the definitive interpretation of TTML timings in the context of ISO BMFF/MP4.

The goal of this (TTML in MP4 DASH) document is exactly to clarify this kind of issue to minimise decoder issues and misunderstandings.

rbouqueau commented 7 years ago

Hi François, happy new year!

nigelmegitt wrote: "I disagree - in my view the unified-streaming's DASH ttml samples are incorrect."

I agree. I noticed that too; it's probably because USP's internal format seems to be based on Smooth Streaming (where the sample TTML timings are relative to the beginning of the segment).

That being said, dash.js sometimes requires the absolute time relative to the DASH MPD availability start time (instead of using the media timeline). I haven't found the pattern yet, so I didn't report it.

@fcartegnie The exact reason I wanted to write this document was to have a single interpretation. So your input and discussions are warmly welcome :)

Also note that MPEG-4 part 30 has precedence over TTML for the timeline. And MPEG-4 part 30 states (section 5.3):

"The top-level internal timing values in the timed text samples based on TTML express times on the track presentation timeline – that is, the track media time as optionally modified by the edit list. For example, the begin and end attributes of the <body> element, if used, are relative to the start of the track, not relative to the start of the sample."
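
To make the difference concrete, here is a minimal Python sketch of the two readings discussed in this thread. The function name `resolve_begin`, the `relative_to` flag and the 420 s sample time are illustrative assumptions, not taken from any spec or library; times are in seconds.

```python
from fractions import Fraction

def resolve_begin(begin, sample_pts, relative_to="track"):
    """Map a top-level TTML begin time onto the track presentation timeline.

    begin: value of the begin attribute, in seconds
    sample_pts: presentation time of the sample carrying the document
    relative_to: "track" (the reading in the quoted spec text) or
                 "sample" (the segment-relative reading); hypothetical flag
    """
    begin = Fraction(begin)
    if relative_to == "track":
        # Document time zero is track time zero; the sample time plays no role.
        return begin
    if relative_to == "sample":
        # Document time zero is the sample presentation time.
        return Fraction(sample_pts) + begin
    raise ValueError(f"unknown reference: {relative_to!r}")

# A cue meant to appear 422 s into the track, carried in a sample at 420 s,
# has to be authored differently under each reading.
track_reading = resolve_begin(422, sample_pts=420, relative_to="track")
sample_reading = resolve_begin(2, sample_pts=420, relative_to="sample")
```

Both calls resolve to the same track time, but only because the authored `begin` values differ. A decoder applying the wrong reading to the same document would place the cue 420 s away from where it was intended.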

fcartegnie commented 7 years ago

Defining the time reference through the container instead of tagging the document itself seems pretty awkward :/

The major issue I see with an absolute timebase reference for TTML samples, in a context where content can be reused/remuxed and/or passed to a container-agnostic decoder, is that you cannot edit data without editing samples. The USP smooth->dash case is the best example.
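
As a rough illustration of what such a smooth->dash remux would have to do, here is a sketch that rewrites sample-relative timings into track-relative ones. It assumes timing appears only as `begin`/`end` on `<p>` elements and only in `HH:MM:SS.mmm` clock-time form; the helper names are hypothetical, and real documents may also carry timing on `body`, `div` or `span`, or use other time expressions.

```python
import xml.etree.ElementTree as ET

TTML_NS = "http://www.w3.org/ns/ttml"
ET.register_namespace("", TTML_NS)  # serialize TTML as the default namespace

def parse_clock(value):
    """Parse an HH:MM:SS.mmm clock time into seconds (assumed format)."""
    h, m, s = value.split(":")
    return int(h) * 3600 + int(m) * 60 + float(s)

def format_clock(seconds):
    """Format seconds as HH:MM:SS.mmm, rounding to the nearest millisecond."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"

def shift_to_track_time(ttml_text, sample_pts):
    """Add the sample presentation time (seconds) to every <p> begin/end."""
    root = ET.fromstring(ttml_text)
    for p in root.iter(f"{{{TTML_NS}}}p"):
        for attr in ("begin", "end"):
            if attr in p.attrib:
                shifted = parse_clock(p.get(attr)) + sample_pts
                p.set(attr, format_clock(shifted))
    return ET.tostring(root, encoding="unicode")
```

For example, a paragraph with `begin="00:00:02.000"` in a sample presented at 420 s comes out with `begin="00:07:02.000"`. The point is that every sample payload has to be parsed and rewritten; the container timing alone cannot express the change.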

nigelmegitt commented 7 years ago

Actually both the document and the container impact how the time references are understood, and I don't think that's avoidable.

There are interesting issues however the timebase reference is arranged, but at this stage there is a specification and it is clear, so the most helpful thing for interoperability would be if everyone used it.

Bear in mind that, unlike audio or video, where there is a linear series of contiguous samples of known size that can be concatenated or split, text-based subtitle and caption formats include time expressions within them. Any kind of transformation, resampling, reuse etc. that affects the timing will require the time expressions in the document to be modified, regardless of the basis of the time expressions (e.g. consider resampling to join multiple TTML samples together or split them apart). I'm fairly confident that all object-based encoding schemes have this "feature".

nigelmegitt commented 7 years ago

Slight correction to the above: resampling to join TTML samples together or split them apart currently does not require time expressions to be modified, since they are constant relative to the track; other transformations may, and if the time expressions were relative to the sample then those modifications certainly would require the time expressions to be reprocessed.
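
A small sketch of that point, using a simplified cue model instead of real TTML parsing (the `Cue` type and both helper names are hypothetical): with track-relative times, splitting is just a selection of cues and joining is a merge, and no time value is ever rewritten.

```python
from typing import NamedTuple

class Cue(NamedTuple):
    begin: float  # seconds on the track timeline
    end: float    # seconds on the track timeline
    text: str

def cues_for_sample(cues, sample_start, sample_end):
    """Select the cues active during [sample_start, sample_end), unchanged."""
    return [c for c in cues if c.begin < sample_end and c.end > sample_start]

def join_samples(*samples):
    """Merge cue lists from several samples, dropping repeated cues."""
    merged = []
    for sample in samples:
        for cue in sample:
            if cue not in merged:
                merged.append(cue)
    return sorted(merged)

cues = [
    Cue(1.0, 3.0, "first"),
    Cue(3.5, 6.5, "spans the sample boundary"),
    Cue(7.0, 9.0, "last"),
]

# Split one document into two 5-second samples: times are copied verbatim.
first = cues_for_sample(cues, 0.0, 5.0)
second = cues_for_sample(cues, 5.0, 10.0)

# Joining the two samples gives back the original cue list.
rejoined = join_samples(first, second)
```

The cue that spans the boundary appears in both samples with identical `begin` and `end` values, which is the overlapping case from the start of this thread. With sample-relative times, each copy would need different values.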