w3c / media-capabilities

Media Capabilities API
https://w3c.github.io/media-capabilities/

text tracks not supported #157

Open mikedo opened 3 years ago

mikedo commented 3 years ago

There is only video and audio, not text, e.g. W3C TTML (IMSC1), WebVTT, etc.

johnsim commented 3 years ago

CMAF does support both IMSC1 and WebVTT tracks - though the last time I checked no one was actually producing this content.

mikedo commented 3 years ago

TTML/IMSC1 content is being produced by professional encoders in the television industry. Commercial content has been on air in Korea for ~2 years, and there are now a few test markets in the US. Both use ATSC 3.0, which requires IMSC1. Since June you can buy TVs in the US that decode it in an HTML5 stack.

chcunningham commented 3 years ago

@mounirlamouri - pls cc Chrome tracks folks

Interesting issue. I'm a tracks noob, so some dumb questions:

johnsim commented 3 years ago

@mikedo - I was aware that TTML/IMSC1 content is being produced, but I didn't think anyone was using the CMAF bindings to produce IMSC1 tracks.

mikedo commented 3 years ago

@johnsim - I'm not sure what you mean by "bindings", but the CMAF Media Profile constraints are fairly simple for IMSC1. Offhand I am not aware of any substantive deviation. Most emissions are running at 1-3 second duration Segments time-aligned with video and audio. But compatible_brands is not being set if that's your point. Other than the brand declaration in the Segments, is there something specific that warrants dismissing text tracks generically (this issue was not about IMSC1 specifically)?

Maybe the real question is whether there are any actual capabilities (in this API context) needed for text tracks? But the purpose of my issue is to raise the awareness and not leave out text support accidentally because no one considered it.

johnsim commented 3 years ago

@mikedo I agree that text tracks should be supported and you are probably right that the real question is whether there are any actual capabilities (in this API context) needed for text track support.

On the IMSC1 CMAF "binding" issue, I meant delivering text tracks compliant with Annex L of the CMAF spec. Mostly this means being compliant with https://www.w3.org/TR/ttml-imsc1.1/ and setting the codecs parameter in the subtitle entry box correctly, etc., but it also means the text content is associated with a track compliant with CMAF clause 11 (subtitle tracks). I assume that as long as the text content is a legal CMAF text track, no text-specific API changes are needed, and all that is required is that the underlying media engine can handle CMAF text tracks aligned with CMAF audio and video.

So what I mean by this comment is that we need CMAF text tracks to be used to develop and verify support in the browser.

ZmGorynych commented 3 years ago

the real question is whether there are any actual capabilities (in this API context) needed for text tracks?

One thing I can think of is which types of subtitles are supported (EBU-TT, SMPTE-TT, IMSC1, ...; text profile? image profile?). Is it covered anywhere else?

CMAF also does not disallow CEA 708 captions, which are fairly common.

johnsim commented 3 years ago

@ZmGorynych I believe that if the underlying media engine supports document format X (EBU-TT-D, SMPTE-TT, etc.), then the goal should be that it is supported as long as the document is delivered in a compliant CMAF Text Track. Each would be a separate media profile.

On a side note, according to Annex L of the CMAF spec, a document that conforms to EBU-TT-D generally conforms to IMSC1.1, but I am not sure how to interpret the word "generally" in this sentence. :-)

johnsim commented 3 years ago

@zmgorynych To the comment on CEA 708, Annex A.4 of the CMAF spec provides subtitle media profiles and brands ('cwvt', 'im1t', 'im1i', 'im2t', 'im2i') and A.5 provides a supplemental brand 'ccea' to indicate that CTA-608 and CTA-708 captions are embedded in SEI messages in the video track.
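
As a rough illustration of how the brands listed above could be read from a track's compatible_brands, here is a TypeScript sketch. The human-readable labels are my reading of the brand names and should be checked against CMAF Annex A; `summarizeBrands` and `BrandSummary` are hypothetical names, not part of any spec or browser API.

```typescript
// Subtitle media profile brands named in the comment above (CMAF Annex A.4).
// ASSUMPTION: the labels are inferred from the brand names, not quoted from the spec.
const SUBTITLE_BRAND_LABELS = new Map<string, string>([
  ["cwvt", "WebVTT"],
  ["im1t", "IMSC1 text profile"],
  ["im1i", "IMSC1 image profile"],
  ["im2t", "IMSC1.1 text profile"],
  ["im2i", "IMSC1.1 image profile"],
]);

// Supplemental brand (CMAF Annex A.5): CTA-608/708 captions carried in SEI messages.
const CEA_SUPPLEMENTAL_BRAND = "ccea";

interface BrandSummary {
  subtitleProfiles: string[];
  embeddedCeaCaptions: boolean;
}

// Hypothetical helper: classify the brands declared by a CMAF track.
function summarizeBrands(compatibleBrands: readonly string[]): BrandSummary {
  const subtitleProfiles: string[] = [];
  for (const brand of compatibleBrands) {
    const label = SUBTITLE_BRAND_LABELS.get(brand);
    if (label !== undefined) {
      subtitleProfiles.push(label);
    }
  }
  return {
    subtitleProfiles,
    embeddedCeaCaptions: compatibleBrands.indexOf(CEA_SUPPLEMENTAL_BRAND) !== -1,
  };
}
```

Note that, per the earlier comment, current emissions reportedly do not set compatible_brands, so a brand-based check like this would not recognize them.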

johnsim commented 3 years ago

I believe the issue noted by @chcunningham here is that a track element (not to be confused with an ISO BMFF track) can be added as a child to an audio/video element to link a timed text file to the media content. Three issues - 1) I believe it is restricted to WebVTT format - is that still true? 2) independent of CMAF text tracks, 708 embedded captions, etc. how does media capabilities handle 'track' elements? and 3) how would CMAF text tracks be handled? Would this require the addition of a TextConfiguration alongside AudioConfiguration and VideoConfiguration?
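
For context on question 2, this is roughly what a Media Capabilities decoding query looks like today: it has `video` and `audio` members and nothing for text. The sketch uses local structural types (`DecodingQuerySketch`, `DecodingResultSketch`, `CapabilitiesLike`, all hypothetical names) so it can run outside a browser; in a page one would presumably pass `navigator.mediaCapabilities`. The codec strings and numbers are illustrative only.

```typescript
// Local stand-ins mirroring the video/audio members of a decoding query.
// Hypothetical names, used so the sketch does not depend on DOM typings.
interface DecodingQuerySketch {
  type: "file" | "media-source";
  video?: {
    contentType: string;
    width: number;
    height: number;
    bitrate: number;
    framerate: number;
  };
  audio?: { contentType: string };
}

interface DecodingResultSketch {
  supported: boolean;
  smooth: boolean;
  powerEfficient: boolean;
}

interface CapabilitiesLike {
  decodingInfo(query: DecodingQuerySketch): Promise<DecodingResultSketch>;
}

// Build a query for an illustrative H.264 + AAC stream. There is no slot
// in which a text track format could be described.
function buildDecodingQuery(): DecodingQuerySketch {
  return {
    type: "media-source",
    video: {
      contentType: 'video/mp4; codecs="avc1.640028"',
      width: 1920,
      height: 1080,
      bitrate: 5_000_000,
      framerate: 30,
    },
    audio: { contentType: 'audio/mp4; codecs="mp4a.40.2"' },
  };
}

// Ask the capabilities object whether the combination is supported.
async function isAvSupported(mc: CapabilitiesLike): Promise<boolean> {
  const info = await mc.decodingInfo(buildDecodingQuery());
  return info.supported;
}
```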

tidoust commented 3 years ago

1) I believe it is restricted to WebVTT format - is that still true?

If you're talking about the addTextTrack method, the spec does not restrict that to WebVTT cues, but rather to cues that derive from the TextTrackCue interface, such as VTTCue. In practice, the nuance is pretty thin because I believe VTTCue is the only such interface defined and supported across browsers, but for instance DataCue would fit in that category too.
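
A minimal sketch of the addTextTrack path described above, written against small structural types so it runs outside a browser. `attachCues`, `CueData`, `TrackLike`, and `MediaLike` are hypothetical names; the cue factory is injected because, as noted, any cue type deriving from TextTrackCue could in principle be used.

```typescript
// Plain data for one cue; a hypothetical shape for this sketch.
interface CueData {
  start: number;
  end: number;
  text: string;
}

// Structural stand-ins for the parts of TextTrack and HTMLMediaElement used here.
interface TrackLike<C> {
  mode: string;
  addCue(cue: C): void;
}

interface MediaLike<C> {
  addTextTrack(kind: string, label?: string, language?: string): TrackLike<C>;
}

// Create a subtitles track on the media element, make it visible,
// and add one cue per entry using the supplied cue factory.
function attachCues<C>(
  media: MediaLike<C>,
  makeCue: (data: CueData) => C,
  cues: readonly CueData[],
): TrackLike<C> {
  const track = media.addTextTrack("subtitles", "English", "en");
  track.mode = "showing";
  for (const data of cues) {
    track.addCue(makeCue(data));
  }
  return track;
}
```

In a browser, the factory would presumably be something like `(d) => new VTTCue(d.start, d.end, d.text)`, with a video element as `media`.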

mikedo commented 3 years ago

@johnsim We've wandered into WAVE perhaps, but regarding:

On the IMSC1 CMAF "binding" issue, I meant delivering text tracks compliant with Annex L of the CMAF spec. Mostly this means being compliant with https://www.w3.org/TR/ttml-imsc1.1/ and setting the codecs parameter in the subtitle entry box correctly, etc.

Why should it have to be constrained to 1.1? CMAF supports IMSC 1.0.1 as well, which is a subset of 1.1 and more widely deployed. If you are saying there are no full IMSC 1.1 encoders/decoders, you may be right. But see CMAF Table A.3 for the text-based media profiles supported in CMAF.

chcunningham commented 3 years ago

1) I believe it is restricted to WebVTT format - is that still true?

Yes, I agree w/ tidoust.

2) independent of CMAF text tracks, 708 embedded captions, etc. how does media capabilities handle 'track' elements?

MC doesn't currently have any handling for tracks.

3) how would CMAF text tracks be handled? Would this require the addition of a TextConfiguration alongside AudioConfiguration and VideoConfiguration?

Adding text capabilities to MC might look as you proposed. Reading the WAVE spec, I noticed at least some formats have a codecs= string (e.g. "stpp.ttml.im1t"). Is this true (or could it be made true) for all the formats under consideration?

I think Chromium historically has zero support for inband captions/subtitles. I'm not aware of any plans to add support. Do other UAs support?
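
To make the TextConfiguration idea concrete, here is a purely hypothetical TypeScript sketch. Nothing like `TextConfigurationSketch` or `canDecodeText` exists in the Media Capabilities spec; the sketch only shows how a codecs= string such as "stpp.ttml.im1t" could drive a lookup, assuming the format is described by a MIME type with a codecs parameter.

```typescript
// HYPOTHETICAL: a text counterpart to the audio/video configurations.
interface TextConfigurationSketch {
  // Assumed form, e.g. 'application/mp4; codecs="stpp.ttml.im1t"'
  contentType: string;
}

// Extract the comma-separated codecs parameter from a MIME type string.
// Returns an empty array when no codecs parameter is present.
function parseCodecs(contentType: string): string[] {
  const match = /;\s*codecs\s*=\s*(?:"([^"]*)"|([^;\s]+))/i.exec(contentType);
  if (!match) {
    return [];
  }
  const raw = match[1] !== undefined ? match[1] : (match[2] || "");
  return raw
    .split(",")
    .map((codec) => codec.trim())
    .filter((codec) => codec.length > 0);
}

// Hypothetical check: every declared codec must be in the set of codec
// strings the media engine claims to handle.
function canDecodeText(
  config: TextConfigurationSketch,
  supportedCodecs: ReadonlySet<string>,
): boolean {
  const codecs = parseCodecs(config.contentType);
  return codecs.length > 0 && codecs.every((codec) => supportedCodecs.has(codec));
}
```

A format with no codecs= string yields an empty list and is reported as unsupported here, which is why the question of whether every candidate format has (or could have) such a string matters for this design.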

chrisn commented 6 months ago

I think Chromium historically has zero support for inband captions/subtitles. I'm not aware of any plans to add support. Do other UAs support?

@jernoble, @ericcarlson What does Safari support, and do you have interest in capability detection?

jyavenard commented 6 months ago

WebKit supports in-band text tracks.

nigelmegitt commented 6 months ago

"in-band text track" is a troublesome term. I'm not sure what is intended by it here, but if possible I think it would be helpful to use terminology like "text tracks embedded in the encoded video stream", "text tracks referenced by the [DASH|HLS] manifest" etc to be clearer.