w3c / dapt

Dubbing and Audio description Profiles of TTML2
https://w3c.github.io/dapt/

Required metadata field for earliest SMPTE time code to allow conversion between DAPT and ESEF #232

Open ewanrsmith opened 3 months ago

ewanrsmith commented 3 months ago

Descriptions in ESEF files are timed via SMPTE timecodes, so to convert to and from DAPT a key timecode must be included in the metadata, from which the timecodes of all other descriptions can be extrapolated. An existing EBU-TT metadata field, documentStartOfProgramme, would be an obvious candidate for this, but that value isn't typically recorded in an ESEF file.

@nigelmegitt proposes a new DAPT metadata value, equivalent to the ESEF format's first_content_tc, against which all other relative values can be synchronised.
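
To make the dependency concrete, here is a minimal sketch (assuming an integer, non-drop frame rate; the function names are illustrative, not from any spec) of deriving DAPT media times from ESEF timecodes once an anchor such as first_content_tc is known:

```python
# Sketch: anchoring ESEF SMPTE timecodes to the DAPT media timeline.
# Assumes an integer, non-drop frame rate (e.g. 25 fps); drop-frame
# rates need more careful handling, discussed later in this thread.
FRAME_RATE = 25

def timecode_to_frames(tc: str) -> int:
    """Convert an HH:MM:SS:FF SMPTE timecode to a frame count."""
    hh, mm, ss, ff = (int(part) for part in tc.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * FRAME_RATE + ff

def media_time_seconds(description_tc: str, first_content_tc: str) -> float:
    """Media time of a description, treating the anchor timecode as
    media time zero on the DAPT document's timeline."""
    delta = timecode_to_frames(description_tc) - timecode_to_frames(first_content_tc)
    return delta / FRAME_RATE

# With first_content_tc = "10:00:30:00", a description timed at
# "10:01:00:12" maps to media time 30.48 s.
```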

nigelmegitt commented 3 months ago

As discussed with @ewanrsmith here's a diagram representing the different timelines, showing where some alignment point is likely needed:

[Diagram of the different timelines and the likely alignment point between them]

For audio description workflows that need to generate a WAV file to be played out co-synchronously with the programme media, some process, often known as "compilation", is needed to take the source DAPT and any referenced audio recordings and generate the audio output file. In some workflows a "handle" is prepended to the WAV file as a kind of pre-roll.

Two key pieces of data are needed for the compilation process, both of which are external to the DAPT document itself:

  1. The begin time, which may be the start of programme content, or may be earlier than that by the duration of the handle
  2. The end time

Depending on how the DAPT was authored, the begin time, referenced to the DAPT document's media timeline, could be earlier than (less than) zero. In some cases the begin time is expressed in SMPTE timecode rather than media time.

In order to support compilation processes that express the begin time in SMPTE timecode, some alignment between the DAPT document's media timeline and the relevant SMPTE timeline is needed.

The proposal is to add optional document-level metadata that defines the SMPTE timecode that corresponds to media time zero in the DAPT document. For this metadata to be interpreted correctly, a frame rate, frame rate multiplier and dropMode are likely needed.
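
As a rough sketch of what such metadata could carry and how a compilation process might consume it (the class and attribute names below are hypothetical, not part of DAPT or TTML2, and an integer, non-drop frame rate is assumed):

```python
from dataclasses import dataclass

@dataclass
class MediaTimeZeroAnchor:
    """Hypothetical document-level metadata: the SMPTE timecode that
    corresponds to media time zero on the DAPT document's timeline."""
    timecode: str    # e.g. "10:00:00:00"
    frame_rate: int  # assumes an integer rate; a frameRateMultiplier and
                     # dropMode would also be needed for non-integer rates

def to_frames(tc: str, rate: int) -> int:
    """Convert an HH:MM:SS:FF timecode to a frame count."""
    hh, mm, ss, ff = (int(p) for p in tc.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * rate + ff

def begin_media_time(begin_tc: str, anchor: MediaTimeZeroAnchor) -> float:
    """Place a compilation begin time, given as SMPTE timecode, on the
    DAPT media timeline; negative when a pre-roll handle starts before
    media time zero."""
    delta = to_frames(begin_tc, anchor.frame_rate) - to_frames(
        anchor.timecode, anchor.frame_rate)
    return delta / anchor.frame_rate

# e.g. with the anchor at "10:00:00:00" and a 30 s handle:
# begin_media_time("09:59:30:00", MediaTimeZeroAnchor("10:00:00:00", 25))
# == -30.0
```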

An alternative solution to this problem would be to permit time expressions to use SMPTE timecode: I'm reluctant to go down that route because:

  1. The output audio must be a single continuous and contiguous resource, so it is not useful to allow time expressions that support potential discontinuities.
  2. Implementation experience, e.g. in the domain of subtitles and captions, has shown that conversion of SMPTE timecode using non-integer frame rates is a significant source of interoperability problems in implementations; the sketch below illustrates why.
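
To illustrate point 2, here is a sketch of the commonly cited 29.97 drop-frame conversion (not taken from any spec); the details of which minutes skip frame numbers, and the 1000/1001 rate adjustment, are exactly where implementations tend to diverge:

```python
def drop_frame_tc_to_frames(tc: str) -> int:
    """Convert an HH:MM:SS;FF drop-frame timecode to a frame count.
    Frame numbers 00 and 01 are skipped at the start of every minute,
    except minutes divisible by 10."""
    hh, mm, ss, ff = (int(p) for p in tc.replace(";", ":").split(":"))
    total_minutes = hh * 60 + mm
    dropped = 2 * (total_minutes - total_minutes // 10)
    return (total_minutes * 60 + ss) * 30 + ff - dropped

def frames_to_seconds(frames: int) -> float:
    """Elapsed media time at 29.97 (30000/1001) frames per second."""
    return frames * 1001 / 30000

# "00:10:00;00" is 17982 frames, i.e. about 599.9994 s: close to,
# but not exactly, ten minutes.
```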

Note that, as per the diagram, in some cases the ebuttm:documentStartOfProgramme metadata from the EBU's Tech 3390 (EBU-TT Part M, Metadata Definitions) may be present; however, it does not represent the same thing, and forcing it to do so would constitute a misuse.

css-meeting-bot commented 3 months ago

The Timed Text Working Group just discussed Required metadata field for earliest SMPTE time code to allow conversion between DAPT and ESEF w3c/dapt#232, and agreed to the following:

The full IRC log of that discussion:
<nigel> Subtopic: Required metadata field for earliest SMPTE time code to allow conversion between DAPT and ESEF w3c/dapt#232
<nigel> github: https://github.com/w3c/dapt/issues/232
<nigel> Ewan: The issue was around conversion between legacy formats and DAPT.
<nigel> .. DAPT is related to media time. The issue I had is, when converting from ESEF with explicit SMPTE timecodes,
<nigel> .. I needed to decode the media time in the DAPT relative to the SMPTE timecode.
<nigel> .. But there isn't a metadata field helping to do the mapping in DAPT or TTML2.
<nigel> .. For some authoring platforms you can't specify the timecode of the first frame of the programme,
<nigel> .. just the timecode of the first description.
<nigel> .. That's likely the best metadata value to reflect in DAPT to use for this conversion activity.
<atai> q+
<nigel> Nigel: I see this as a problem of relating the DAPT media time to the SMPTE timecode of the source media,
<nigel> .. which you need to do so that, during a thing called "compilation", which takes as input a programme timecode
<nigel> .. for the beginning, and generates a single WAV file the duration of the programme,
<nigel> .. that compilation process can know when, relative to the begin time passed in, the DAPT media time begins.
<nigel> ack atai
<nigel> Andreas: Thanks Ewan for bringing up this case.
<nigel> .. I understand this as being about the mapping between two different time coordinate systems.
<nigel> .. A source file and maybe a target file in the SMPTE timecode space but DAPT is media time,
<nigel> .. and to get proper conversion you want some metadata either telling what the source timecode is,
<nigel> .. or, for converting to a target file, that uses SMPTE timecode. Is that correct?
<nigel> Ewan: yes, it's trying to find an anchor that's common between both.
<nigel> Andreas: Yes, that could be a common use case, but I'm not sure if it falls in scope of DAPT, because
<nigel> .. it is information outside the DAPT document itself.
<nigel> .. It is tricky because the information is specific to the DAPT document.
<nigel> .. I wonder if it is useful to define some metadata in W3C or DAPT, or you could always use any kind
<nigel> .. of metadata to enable this use case.
<nigel> .. The overall question is: is this a workflow problem, or do we expect it to be a common use case
<nigel> .. that needs to be supported in different workflows?
<nigel> Ewan: Difficult to say, easy for me to speak to my own workflows.
<nigel> .. The most common workflow involving compilation requires the presence of the media.
<nigel> .. You wouldn't have single WAVs and an exchange document. That maybe makes it more unusual
<nigel> .. and less general for me. It's possible that others have the same difficulty with the same process.
<nigel> .. Or they could capture the information externally to the exchange document. It's a good question,
<nigel> .. I'm not sure I can answer it.
<nigel> .. It's also specific to this conversion use case, between ESEF and DAPT.
<nigel> .. Without this metadata I don't think DAPT can accommodate that use case.
<nigel> .. It feels like that is a use case we can consider - is DAPT a clean slate for new workflows?
<atai> q+
<nigel> Nigel: I'm sympathetic to this use case, partly because I think ESEF is one of the de facto "standards"
<nigel> .. used by a lot of people, and I think creating a pathway for migration to DAPT would be very helpful.
<nigel> ack at
<nigel> Andreas: It's sort of tunnelling some metadata through the DAPT document. I think that it's a use case
<nigel> .. that's not specific to DAPT, right. It could be for conversion against legacy STL files and TTML too.
<nigel> .. As Nigel said, we have defined this metadata field in the EBU, startOfProgramme, which is kind
<nigel> .. of the data we're looking for, but I'm not sure if you could use this and find a semantic for this use case.
<nigel> Nigel: I thought about this and decided it is close, but not the right thing.
<nigel> .. The trouble is that during the conversion process you may not know the start of programme timecode.
<nigel> .. All you definitely know is the timecode of the first description and the media time you're giving that.
<nigel> .. If you happen to know the start of programme timecode you could use it for this, but it would have to
<nigel> .. come from some other source.
<nigel> Ewan: The legacy format does have some theoretical provision for start of programme, but it wasn't ever implemented,
<nigel> .. which is frustrating.
<nigel> Andreas: In EBU STL we have two different metadata fields, one is startOfProgramme, and the other,
<nigel> .. that relates to the timecode of the first description, would be the first Timecode In cue (TCI).
<nigel> .. I think we have not defined that in EBU-TT, I'm not sure.
<nigel> .. Ewan, you're saying you want both, the start of programme timecode and the timecode of the first description,
<nigel> .. because which one is available depends on the platform.
<nigel> Ewan: Yes I think so.
<nigel> Nigel: My experience of conversion workflows is that the more standalone they are the better,
<nigel> .. because if you have to start introducing other data from other sources that introduces the potential for errors.
<nigel> .. the other thing I thought about with this is if this use case is an argument for allowing SMPTE timecodes
<nigel> .. in DAPT, but I think it's better to have a cross-reference data point in the document that relates
<nigel> .. the media time to some other external SMPTE timecode than to suffer the implementation difficulties
<nigel> .. that others have had with SMPTE timecode in TTML, in terms of getting correct implementation of
<atai> q+
<nigel> .. dropMode, frameRate etc, where interoperability has been a problem.
<nigel> ack at
<nigel> Andreas: Maybe you can explain a bit what you mean about a cross reference data point?
<nigel> .. Still I'm wondering if I have understood the full impact - why should this be specific to DAPT and could
<nigel> .. not be something that also relates to other data in TTML?
<nigel> Nigel: What I mean is some metadata that relates a fixed point in the media timeline with a fixed point
<nigel> .. in the alternate SMPTE timecode timeline, e.g. media time zero = 10:00:00:00 SMPTE.
<nigel> .. Exactly one value would be present if the mapping is needed.
<nigel> .. Or no values if not.
<nigel> .. It may well be a problem for other uses of TTML, but it hasn't been presented as one,
<nigel> .. with such a clear use case, so far.
<nigel> Andreas: OK, what you propose is to map a DAPT media time to a SMPTE timecode timeline?
<nigel> Nigel: Yes
<nigel> Andreas: That's quite a generic solution, definitely, but if we have two data points, like in STL,
<nigel> .. one for start of programme, the other the timecode of the first description or first frame of content,
<nigel> .. would that also work? What would be a blocker for that solution?
<nigel> Nigel: You could do that, I just think if everyone is using the same media time reference point, that's
<nigel> .. halved the likelihood of different implementations.
<nigel> Andreas: I see your point, maybe we need to see what others think.
<nigel> .. What Ewan described seems to be well understood in the world where this solution is being used.
<nigel> .. It could be that people building these solutions have something already known and implemented.
<nigel> .. I agree your solution is the cleaner way, and from a structure point of view, maybe a better fit.
<nigel> .. I'm not sure if it's easier.
<nigel> Nigel: Interesting question.
<nigel> .. Thinking about next steps, I wonder if it's worth opening a pull request for this.
<nigel> .. It could even be an "at risk" feature. I think the idea of this solution is that it allows flexibility but
<nigel> .. is pragmatic for anyone coming from usage of this legacy format.
<nigel> .. Andreas, were you hinting that the best place to define this metadata is not DAPT, but somewhere else?
<nigel> Andreas: Yes, I was wondering where it should go instead. I saw that this kind of metadata could apply
<nigel> .. to other use cases. Then you open it up for a different kind of content.
<nigel> Nigel: I'm taking this as implementation feedback, because it came about during Ewan's attempts to implement DAPT.
<nigel> .. That's very strong.
<nigel> .. I'd be happy to open a pull request to propose something, even if we say it's "at risk", and then
<nigel> .. get more review on that.
<nigel> .. Any other thoughts on this?
<nigel> SUMMARY: @nigelmegitt to open a pull request to propose a DAPT-specific solution

css-meeting-bot commented 3 months ago

The Timed Text Working Group just discussed Required metadata field for earliest SMPTE time code to allow conversion between DAPT and ESEF w3c/dapt#232, and agreed to the following:

The full IRC log of that discussion:
<nigel> Subtopic: Required metadata field for earliest SMPTE time code to allow conversion between DAPT and ESEF w3c/dapt#232
<nigel> github: https://github.com/w3c/dapt/issues/232
<cpn> Nigel: We discussed this last time, I had an action to create a PR, but not had time yet
<cpn> ... It's worth discussing again
<cpn> Ewan: A problem I found converting between ESEF and DAPT is with timeline references, you need at least one shared timecode value in the DAPT
<cpn> ... the time codes are all relative to the media in DAPT, so without the value it's impossible to accurately convert between both formats
<cpn> ... so looked for a value we could share, but didn't find one
<cpn> ... EBU-TT has a first frame in programme
<cpn> ... the ESEF format does have a field, but it's not implemented in a common authoring platform, so files won't have it
<cpn> ... So add a new metadata field for the first frame of the content, which would be common to any exchange format
<cpn> ... Not clear if it should be in DAPT or drawn from another spec like TTML2
<cpn> Nigel: There's a compilation process that happens, where the input to it, in broadcast workflows, is expressed in SMPTE time code
<cpn> ... used for synchronisation in playout
<cpn> ... so although we don't have SMPTE timecode in DAPT, if you're generating a file with the AD content in it, you need to associate the timeline with the SMPTE timecode
<cpn> ... In this common example, where you don't know all the info and there's one piece of data missing, the proposal is to add DAPT metadata to say where time 0 matches some SMPTE time code
<cpn> ... So rather than expressing all times in DAPT in SMPTE timecode, have one point in time as a cross-reference
<cpn> Mike: I wonder if using a timecode that has gaps is a harder problem to solve, and if it would be more productive to do in a DASH context, where it's broadly understood
<nigel> q?
<cpn> ... For timed text we don't permit there to be gaps, e.g., a track such as a wave file with DAPT in it, it's not OK to have it start/stop, put in a null segment
<cpn> Nigel: I'm not sure I understand how that would work. How would you generate DASH that knows this? This is before decoding and packaging
<cpn> ... Agree that the audio file has to be continuous
<cpn> ... It needs to have the same play rate in the media as the resulting compiled audio file so you can play them in sync
<cpn> ... DASH doesn't have SMPTE time code?
<cpn> Mike: No, time of day in UTC
<cpn> Nigel: If you had an external wrapper for DAPT, you could put additional info in it
<cpn> Cyril: It should be possible to do a lossless round trip, at least
<cpn> ... even if with external vocabulary
<cpn> Nigel: That's the key question, should it be external or natively supported in DAPT, as it'll be a common issue and we could solve it in a common way
<cpn> Cyril: How much vocabulary would it pull in, can we add just that one attribute?
<cpn> Nigel: You can
<cpn> ... The value in the EBU-TT metadata spec isn't exactly what we need, there isn't one that relates exactly to this
<MattS> q+
<cpn> ... Document start of programme in #1 but that info isn't available in ESEF, so you can't map it, but also can't rely on the DAPT media timeline being the start of the programme
<atai> q+
<cpn> ... You don't know where on the programme content timeline that is, as the start of programme timecode is missing
<cpn> ... so becomes a circular dependency
<cpn> Matt: Makes sense to me, there's an offset value in BWAV where you can calculate start of programme
<cpn> ... Hard to have a series of timed events, they always refer to another audio file or audio track in another media file, so borrowing document start of programme makes sense
<cpn> Nigel: That feels like an interesting misuse, as time of first description may be a minute into the programme
<cpn> ... So if you use document start of programme as start of first AD...
<cpn> Cyril: Introduce an empty DAPT event before the first, then use start of programme for that. If we were to use this as a hack, the first description in the DAPT document would have the correct semantics for start of document?
<cpn> Nigel: Don't think you would
<cpn> ... You don't know how the AD in the file relates to the start of the programme in the original media. We lose the relationship with the timeline, so need a way to recreate it
<cpn> ... My goal was to propose some data or metadata to say that time 0 in the media timeline corresponds to some SMPTE timecode, to rebuild the relationship between the timeline
<cpn> Matt: Works nicely with how our BWAVs work
<nigel> q?
<cpn> Nigel: Look at how those two concepts coincide
<cpn> Matt: We can use to synchronise without a sidecar XML file
<cpn> Pierre: I've seen people get in trouble doing that, the value is meaningless outside the context of the timed text file
<cpn> Matt: It's in the compiled WAV file, agree it goes wrong if you mix and match
<cpn> Pierre: Use a playlist, don't hard-code into individual components of a playlist, from my experience
<cpn> Andreas: How would this be resolved with playlists?
<cpn> Pierre: If you have two separate components in a media playback, the way to relate their relative offsets is through a third object like a playlist
<cpn> ... Alternative is to have multiplexes, to tightly bind the essence components
<cpn> ... But as soon as they're not tightly bound they get separated, reused, so binding by inserting info individually stops working IME
<cpn> Matt: Challenge here is they come from different suppliers and different processes
<cpn> ... Those suppliers need some way to have the relationship between the timelines
<cpn> Pierre: The playlist would do that. Doesn't have to be an external file, could be an API
<cpn> ack M
<cpn> Nigel: Interesting point, unless they're tightly bound. The AD script and the original media are tightly bound, it's a 1:1 relationship
<cpn> ... The scenario is more specific, and reliably specific, than the general case where you see those problems
<nigel> ack at
<cpn> Andreas: I understand both positions. The metadata can be meaningless or out of control of how you exchange the AD. So it's at the risk of the user to interpret the metadata and restrict the workflow
<nigel> q+ to ask if the "compilation" timecode could be provided as an input into the conversion from ESEF to DAPT
<cpn> ... I commented on this last time, the timecode of the first content in the AD isn't new, it's in EBU STL or EBU subtitles, time of first cue
<cpn> ... If it makes sense to add metadata to refer to the zero timecode, it could also be used for other things, and DAPT could be used for other TTML profiles
<cpn> ... If we use this kind of metadata, good to define in a way that refers not only to DAPT
<nigel> q?
<nigel> ack n
<Zakim> nigel, you wanted to ask if the "compilation" timecode could be provided as an input into the conversion from ESEF to DAPT
<cpn> Nigel: I want to make another suggestion, don't know how feasible it is
<cpn> ... At the moment, the compilation gives a single continuous output media, with a timepoint expressed in timecode. Could that be provided as an input in the conversion from ESEF to DAPT, provided earlier, so that defines time 0. Then you don't need anything in DAPT as that defines the time of the output
<cpn> Cyril: This question does seem applicable to more than DAPT, should discuss in context of TTML2
<cpn> Nigel: We can do that, but I'm trying to reframe it to make the problem disappear
<cpn> Matt: Unless you're producing a BWAV, the WAV has no concept of timecode, so descriptions are offset from 0
<cpn> ... When you want to consume that file downstream, the challenge is how does the consumer know how it relates to the asset it belongs to?
<cpn> Pierre: In my mind that's a workflow issue. Whoever is producing the wave file needs source material. Can be done in different ways, text with a playlist, a web player. There's some context that the wave file is part of
<cpn> ... So the workflow is in charge of making sure those things stay synchronised
<cpn> Matt: For us that's a proxy file, which must have the same timeline as the target
<cpn> Pierre: One way to achieve that is to send the proxy and require whoever creates the wave file to route it back into the proxy, so there's no ambiguity on the relationship between the two. Reingest the created audio essence back into their asset management system
<cpn> ... Then the playlist makes the relationship between them unambiguous
<cpn> ... Or use a web based application that includes the original content, the proxy, then it's all done behind the scenes and the relationship is preserved by the system
<cpn> Nigel: There's a legacy problem here: there's a large number of ESEF AD files that exist independently of any workflow or asset management system
<cpn> ... If you have the original media, you can relate them, but you may not have access to that when you want to convert to a different format
<cpn> ... That's part of the challenge here
<cpn> ... So going back to my original question of providing the data upfront, you can't because you don't have it
<cpn> ... If you want to avoid having additional metadata, the conversion task has to look it up from somewhere else, and that may not be easily accessible
<cpn> Ewan: Yes. My feeling is, in the absence of the data to convert a script to the DAPT file, you'd have an archive of ESEF files, so it would extend the life of the ESEF standard. As a service provider trading in scripts from other providers not using DAPT, you may not be able to exchange without that additional context
<cpn> Nigel: We could express it as metadata, and deferred processing, rather than making it a fixed offset. Does that tie it too closely to a specific process, not generic enough?
<cpn> Matt: Our files have a content start time and content end time, in the ESEF header
<cpn> ... Relies on the describer putting the data in
<cpn> ... But if we have a wav file that doesn't match the duration of the content, things go wrong
<cpn> Ewan: That's #230
<cpn> ... The compiled wav may extend beyond the content end time
<cpn> ... Not always possible to populate the value
<nigel> SUMMARY: Issue discussed, alternative workflows considered, potentially frame as "deferred conversion data" or similar.