w3c / media-source

Media Source Extensions
https://w3c.github.io/media-source/

ISO BMFF bytestream: how can CEA 608 / 708 embedding be supported? #58

Open wolenetz opened 8 years ago

wolenetz commented 8 years ago

If I understand correctly, CEA 608 / 708 embedding of text track data is one option for sourcing text track data described within https://dev.w3.org/html5/html-sourcing-inband-tracks/#mpeg4. However, such embedding, and the signalling of it, occurs after the ISO BMFF initialization segment. Since track types and counts must remain consistent across initialization segments for a SourceBuffer, and the initialization segment received algorithm is the sole place in MSE where various track attributes (like SourceBuffer.textTracks and HTMLMediaElement.textTracks) are populated, is it impossible to support CEA 608 / 708 embedding of text track data in ISO BMFF in any compliant MSE implementation?

@silviapfeiffer : Am I missing some part of signalling for CEA 608 / 708 in ISO BMFF that actually occurs within an MSE initialization segment?

@jdsmith3000 / @mwatson2 / other MSE user agent implementors: Do you have this working somehow? Does the MSE ISO-BMFF bytestream spec and/or https://dev.w3.org/html5/html-sourcing-inband-tracks/#mpeg4 need some update, or am I indeed missing some simple signalling within the defined MSE ISO BMFF initialization segment which allows such embedded text tracks to be known at the time of executing the initialization segment received algorithm?

boblund commented 8 years ago

The 608/708 caption data is contained within the ISOBMFF video track; there is no separate ISOBMFF track. There isn't a change in track type/count wrt the init segment. The user agent creates an HTML Text Track for captions it finds embedded in the ISOBMFF video track. So, the track composition of the corresponding HTMLMediaElement doesn't change either. It is true that there is an HTMLMediaElement text track containing the captions for which there is no corresponding unique ISOBMFF track.
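For illustration, a minimal sketch of how an app might notice such a late-appearing track. It assumes the user agent fires the standard `addtrack` event on `HTMLMediaElement.textTracks` when it discovers embedded captions, and that it exposes them with kind "captions"; neither is confirmed in this thread. `watchForCaptionTracks` and `onTrack` are illustrative names, not part of any spec.

```javascript
// Illustrative helper (not part of MSE or HTML): report caption tracks that
// appear on a TextTrackList at any time, e.g. after the init segment.
function watchForCaptionTracks(textTrackList, onTrack) {
  const listener = (event) => {
    const track = event.track;
    // Assumption: embedded 608/708 data would be exposed with kind "captions".
    if (track && track.kind === "captions") {
      onTrack(track);
    }
  };
  textTrackList.addEventListener("addtrack", listener);
  // Return a function that stops watching.
  return () => textTrackList.removeEventListener("addtrack", listener);
}

// In a page (not run here):
// watchForCaptionTracks(video.textTracks, (track) => { track.mode = "showing"; });
```

The helper only observes the track list, so it works the same whether the track comes from an initialization segment or is created later by the user agent.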

wolenetz commented 8 years ago

@boblund That sounds like a potentially workable approach that meets the needed flexibility to adapt dynamically to the presence of 608/708 caption data. However, I think, for interoperability, the bytestream spec and perhaps even the MSE spec might need to call out the precise sequencing / logic around supporting SourceBuffer.textTracks (and HTMLMediaElement.textTracks) changes when the user agent encounters 608/708 caption data:

1) Should some portion of the initialization segment received algorithm be executed (such as application of trackDefaults, conditional addition of the new track(s)'s SourceBuffer to activeSourceBuffers, and queueing tasks for events and so forth as currently contained within the subsection of the initialization segment received algorithm at "For each text track in the initialization segment, run following steps:")?

2) For apps to detect UA support for 608/708 caption data (e.g. through isTypeSupported(...) and addSourceBuffer(...)) and to engage this 608/708 detection in ISO BMFF bytestreams, is there an RFC 6381 compliant "video/mp4" mimetype that specifically describes 608/708 content? If not, should there be one? (And should it describe any details of the track(s)?)

cconcolato commented 8 years ago

FYI, there is currently a proposal within MPEG CMF/CMAF to mandate the use of a subtitle track declaring 608/708. That track is proposed to contain no sample data but to be linked by means of a track reference to the video track. That could help here. This proposal has not been accepted yet. It might be the right time to comment on it.

foolip commented 8 years ago

> HTML Text Track

Do you mean the HTMLCue proposal? That might be able to represent paint-on by having a new cue for every change (each almost like the previous) but not with roll-up.
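The "new cue for every change" idea for paint-on captions can be sketched as follows. The input shape (`{ time, text }` snapshots of the displayed caption) and the name `snapshotsToCues` are hypothetical; in a browser each result could be passed to `new VTTCue(start, end, text)`.

```javascript
// Illustrative only: turn successive snapshots of the displayed caption text
// into back-to-back cues, one per change.
function snapshotsToCues(snapshots, endTime) {
  const cues = [];
  for (let i = 0; i < snapshots.length; i++) {
    const current = snapshots[i];
    const next = snapshots[i + 1];
    // Each cue lasts until the next change, so cues never overlap.
    const end = next ? next.time : endTime;
    // An empty snapshot means the display was cleared: emit no cue for it.
    if (current.text !== "" && end > current.time) {
      cues.push({ start: current.time, end: end, text: current.text });
    }
  }
  return cues;
}

const paintOn = [
  { time: 0.0, text: "H" },
  { time: 0.1, text: "HE" },
  { time: 0.2, text: "HEL" },
  { time: 0.3, text: "" },
];
console.log(snapshotsToCues(paintOn, 1.0));
```

This models text replacement only. It does not model roll-up, where existing lines scroll while new text arrives.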

wolenetz commented 8 years ago

Triaging to VNext per our process and HTML Media Extensions WG editors conf call today.

boblund commented 8 years ago

Regarding 1) by @wolenetz above, I don't think any change is required in the MSE spec as it is already clear on applying track defaults. What I can clarify in https://dev.w3.org/html5/html-sourcing-inband-tracks/#mpegdash is that when the presence of 608/708 in the video stream is signaled in the MPD, and MSE is being used, the web app needs to create a track defaults object and source the object attributes as defined. I think doing this resolves this issue.

Regarding 2), AFAICT the MIME type in isTypeSupported (and canPlayType) refers to the media container format and optionally video/audio codecs. There doesn't appear to be a way to query support for text track formats. This may be desirable but seems to be a broader issue than MSE or 608/708 closed captions.
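A minimal sketch of what such a query carries. `parseMimeCodecs` is an illustrative helper, not a platform API; the codec strings are ordinary AVC and AAC examples. Only the container and the audio/video codecs appear in the string, and nothing in it identifies embedded caption data.

```javascript
// Illustrative helper: split a MIME type string into its container type and
// the entries of its RFC 6381 "codecs" parameter.
function parseMimeCodecs(mimeType) {
  const [container, ...params] = mimeType.split(";").map((part) => part.trim());
  let codecs = [];
  for (const param of params) {
    const match = /^codecs\s*=\s*"?([^"]*)"?$/i.exec(param);
    if (match) {
      codecs = match[1]
        .split(",")
        .map((codec) => codec.trim())
        .filter((codec) => codec !== "");
    }
  }
  return { container: container.toLowerCase(), codecs: codecs };
}

const type = 'video/mp4; codecs="avc1.42E01E, mp4a.40.2"';
const parsed = parseMimeCodecs(type);
// Guarded so the snippet also runs outside a browser, where MediaSource is absent.
const supported =
  typeof MediaSource !== "undefined" ? MediaSource.isTypeSupported(type) : null;
console.log(parsed, supported);
```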

wolenetz commented 8 years ago

@boblund, MSE applies trackDefaults currently only in the initialization segment received algorithm. If there is no text track known at that time (since it's embedded somehow in the video track), then how can MSE's init segment received algorithm apply those trackDefaults to the as-yet-unknown text track(s)? Further, trackDefaults applied by the MSE initialization segment received algorithm are applied to tracks added to SourceBuffer.{audio,video,text}Tracks and HTMLMediaElement.{audio,video,text}Tracks. Hence, I'm having difficulty understanding both (a) how trackDefaults are applied to non-init-segment-received tracks, and (b) how "the track composition of the corresponding HTMLMediaElement doesn't change", but "there is an HTMLMediaElement text track containing the captions for which there is no corresponding unique ISO BMFF track." Does the app out-of-band-and-independent-of-MSE do all the necessary trackDefault application to an app-generated track of some kind?
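The sequencing problem can be illustrated with a toy model. This is not the MSE algorithm itself, only a simplified stand-in: `initSegmentReceived`, `appendMediaSegment`, and the object shapes are invented for illustration.

```javascript
// Toy model only: tracks are registered, and defaults applied, solely when
// an initialization segment is processed.
function initSegmentReceived(sourceBuffer, initSegment) {
  for (const track of initSegment.tracks) {
    const defaults =
      sourceBuffer.trackDefaults.find((d) => d.type === track.type) || {};
    sourceBuffer[track.type + "Tracks"].push({
      id: track.id,
      language: track.language || defaults.language || "",
      label: track.label || defaults.label || "",
    });
  }
}

// Media segments carry samples only; in this model they never add tracks,
// even if the video samples contain embedded caption data.
function appendMediaSegment(sourceBuffer, mediaSegment) {
  sourceBuffer.sampleCount += mediaSegment.samples.length;
}

const sourceBuffer = {
  trackDefaults: [{ type: "text", language: "en", label: "CC1" }],
  audioTracks: [],
  videoTracks: [],
  textTracks: [],
  sampleCount: 0,
};
initSegmentReceived(sourceBuffer, { tracks: [{ type: "video", id: 1 }] });
appendMediaSegment(sourceBuffer, {
  samples: [{ trackId: 1, hasEmbeddedCaptions: true }],
});
// The "text" default was never consulted: no text track was declared.
console.log(sourceBuffer.videoTracks.length, sourceBuffer.textTracks.length);
```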

paulbrucecotton commented 8 years ago

@boblund: At our Editors' meeting on Mar 29 you agreed to provide a further explanation of how this issue should be handled. Can you please do this ASAP?

boblund commented 8 years ago

In response to @wolenetz. Yes, trackDefaults won't work for the reason you note. A better alternative would be for the application to use HTMLMediaElement.addTextTrack() to create a TextTrack object for the 608/708 captions based on information in the MPD (or other manifest file type), before media segments are added to source buffers. The TextTrack attributes would be set as specified in https://dev.w3.org/html5/html-sourcing-inband-tracks/#mpegdash. The user agent then sources 608/708 cues from the video as defined in the sourcing spec. There needs to be a better description for how this will be done in the ISOBMFF case, which I will do.
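A minimal sketch of the app-side step, assuming the manifest has already been parsed into a hypothetical `captionServices` array. `createCaptionTracks` is an illustrative name. Whether a user agent would then fill such a track with 608/708 cues is the proposal under discussion here, not established behaviour; tracks created with addTextTrack() are normally filled by the app through addCue().

```javascript
// Illustrative helper: create one TextTrack per caption service announced in
// the manifest, before any media segment is appended.
// `captionServices` is a hypothetical, already-parsed manifest structure.
function createCaptionTracks(mediaElement, captionServices) {
  return captionServices.map((service) =>
    // addTextTrack(kind, label, language) is the standard HTMLMediaElement method.
    mediaElement.addTextTrack("captions", service.label, service.language)
  );
}

// In a page (not run here):
// const tracks = createCaptionTracks(video, [{ label: "CC1", language: "en" }]);
// sourceBuffer.appendBuffer(initSegment); // media is appended only afterwards
```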

foolip commented 8 years ago

That spec says "Browsers that can render the CEA 708 format should expose them in as yet to be specified CEA708Cue objects. Alternatively, browsers can also map the CEA 708 features to VTTCue objects. Finally, browsers that cannot render CEA 708 captions should expose them as DataCue objects."

Much of the difficulty in this is in those decisions, I suspect.

paulbrucecotton commented 8 years ago

> There needs to be a better description for how this will be done in the ISOBMFF case, which I will do.

@boblund - What is the status of this work?

boblund commented 8 years ago

I propose adding the following text at the end of the "Mapping Text Track content into text track cues for MPEG-4 ISOBMFF" section in the HTML Sourcing spec.

ISO BMFF captions in the CEA 708 format [CEA708] are carried in the video stream in SEI messages [DASHIFIOP]. Browsers that can render the CEA 708 format should expose the caption data to the web application by mapping the CEA 708 features to VTTCue objects [VTT708].

I will make this change in HTML Sourcing spec if there is no other discussion on this topic by May 18.

boblund commented 8 years ago

PR to clarify how CEA708 cues are sourced in ISO BMFF and MPEG-2 transport streams.

boblund commented 8 years ago

I've merged the PR, given no comments here or on the github repo page.

estobbart commented 6 years ago

Hoping for some clarifications on CEA 608/708. I understand the bit about mapping them to VTTCues, but this seems incomplete. CEA 608/708 contains display data (PACs) that seems to be lost in the conversion to VTTCues. It's also unclear when that track would become available to the application. If the data is in SEI NAL's, then when "HAVE_METADATA" is reported from an Initialization Segment it would lack the text track.

silviapfeiffer commented 6 years ago

@estobbart there are a lot more thoughts about how to map CEA 608/708 to WebVTT, including PACs, in this specification: https://dvcs.w3.org/hg/text-tracks/raw-file/default/608toVTT/608toVTT.html
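As a rough illustration of carrying PAC positioning into cue settings: the sketch below maps a decoded row and indent onto VTTCue-style properties. The mapping itself is an assumption made for this example and is not taken from the 608toVTT draft; `pacToCueSettings` and the `{ row, indent }` input shape are hypothetical. The 15-row by 32-column grid is the CEA 608 caption grid.

```javascript
const CEA608_ROWS = 15;
const CEA608_COLUMNS = 32;

// Illustrative mapping (an assumption, not the 608toVTT algorithm): place the
// cue by percentage so that row and indent survive the conversion.
function pacToCueSettings(pac) {
  if (pac.row < 1 || pac.row > CEA608_ROWS) {
    throw new RangeError("row must be between 1 and 15");
  }
  if (pac.indent < 0 || pac.indent >= CEA608_COLUMNS) {
    throw new RangeError("indent must be between 0 and 31");
  }
  return {
    snapToLines: false, // interpret `line` as a percentage of the video height
    line: ((pac.row - 1) * 100) / CEA608_ROWS,
    position: (pac.indent * 100) / CEA608_COLUMNS,
    positionAlign: "line-left",
    align: "start",
  };
}

// In a page (not run here):
// const cue = Object.assign(new VTTCue(0, 2, "HELLO"), pacToCueSettings({ row: 15, indent: 4 }));
console.log(pacToCueSettings({ row: 4, indent: 8 }));
```

Colour, underline, and other PAC style attributes are not covered by this sketch.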

estobbart commented 6 years ago

@silviapfeiffer This looks like it's done out of band, and provided as a WebVTT file? Is there somewhere in the middle where we don't have to provide a WebVTT file, but still have the browser control presentation from the SEI NAL's?

If that gap exists it'd be nice to see something like..

let mseBlob = URL.createObjectURL(new MediaSource());
video.src = mseBlob;
// Hypothetical: <canvas> has no src attribute today; this is the wished-for API.
canvas.src = mseBlob;

silviapfeiffer commented 6 years ago

Are you asking for browsers to do the conversion from CEA608/708 to WebVTT ?

estobbart commented 6 years ago

Not necessarily a conversion, but a direct presentation (keeping the display as close as possible to the decoder PTS). Or if VTTCue in a textTrack contained display information. I'd prefer the former. I'd be happy to contribute here also, if there's an opportunity for that.

wolenetz commented 4 years ago

It still seems highly unclear to me how, or whether, the user agent should parse and convert this data, versus the app directly doing the parsing, conversion, and population of an out-of-band track. There is a separate MSE issue discussing more precise DataCue event creation from inband MSE data (https://github.com/w3c/media-source/issues/189); do these two cover all use cases such that this particular issue can be closed?
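For the app-side route, a minimal sketch of extracting caption byte pairs from one SEI payload. It assumes the ATSC A/53 `cc_data` layout inside a `user_data_registered_itu_t_t35` SEI payload, with NAL unit framing and emulation prevention bytes already removed. `parseCcData` is a hypothetical helper; locating SEI NAL units in ISO BMFF samples and decoding the byte pairs into characters and commands are out of scope.

```javascript
// Hypothetical helper: return the valid caption triplets found in one
// user_data_registered_itu_t_t35 SEI payload (assumed ATSC A/53 layout).
function parseCcData(payload) {
  if (payload.length < 10) return [];
  const isA53CcData =
    payload[0] === 0xb5 && // itu_t_t35_country_code
    payload[1] === 0x00 && payload[2] === 0x31 && // provider code
    payload[3] === 0x47 && payload[4] === 0x41 && // "GA"
    payload[5] === 0x39 && payload[6] === 0x34 && // "94"
    payload[7] === 0x03; // user_data_type_code: cc_data
  if (!isA53CcData) return [];

  const flags = payload[8];
  const processCcData = (flags & 0x40) !== 0;
  const ccCount = flags & 0x1f;
  if (!processCcData) return [];

  const triplets = [];
  let offset = 10; // skip the flags byte and the em_data byte
  for (let i = 0; i < ccCount && offset + 2 < payload.length; i++, offset += 3) {
    const header = payload[offset];
    const valid = (header & 0x04) !== 0;
    // type 0 and 1: 608 field 1 and field 2; type 2 and 3: 708 packet data and start
    const type = header & 0x03;
    if (valid) {
      triplets.push({ type: type, data: [payload[offset + 1], payload[offset + 2]] });
    }
  }
  return triplets;
}

const example = new Uint8Array([
  0xb5, 0x00, 0x31, 0x47, 0x41, 0x39, 0x34, 0x03, // A/53 header
  0xc1, 0xff, // flags with cc_count = 1, then em_data
  0xfc, 0x94, 0x2c, // one valid field-1 608 byte pair
  0xff, // marker bits
]);
console.log(parseCcData(example));
```

An app taking this route would still have to time the extracted data against sample presentation timestamps and populate its own track.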

chrisn commented 4 years ago

The DataCue scope doesn't include in-band captions, so I expect the specifics of CEA 608 / 708 raised here remain open.