google / ExoPlayer

This project is deprecated and stale. The latest ExoPlayer code is available in https://github.com/androidx/media
https://developer.android.com/media/media3/exoplayer
Apache License 2.0

Add support for mp4 embedded WebVTT captions (ISO/IEC 14496-30:2014) #689

Closed sandersaares closed 8 years ago

sandersaares commented 9 years ago

As far as I can tell, ExoPlayer currently does not support WebVTT captions that are packaged according to ISO/IEC 14496-30:2014. This is a desirable feature, since plain-text files do not really work with e.g. live streams.

ojw28 commented 9 years ago

I assume that's WebVTT embedded inside of MP4?

We already support embedded TX3G and (I think) TTML. Given we already have a WebVTT parser, it should in theory be straightforward to add support for embedded WebVTT as well. Can you provide a link to some sample media?

Also, feel free to make a pull request to add support yourself, if you figure out how. AtomParsers.parseTrak and AtomParsers.parseStsd are probably where you need to look.

sandersaares commented 9 years ago

Yes, that is WebVTT embedded inside of the fragmented MP4 stream. Here is an example video of this sort, with three caption streams: http://media.axprod.net/ExoPlayer/Captions/Manifest.mpd

ojw28 commented 9 years ago

If you're doing DASH, why don't you list a separate AdaptationSet containing the captions (one representation for each language)? This is already supported.

We don't currently support multiple tracks within fragmented mp4 for DASH, because we expect that providers will provide separate adaptation sets each containing a single type of media (audio/video/text).

ojw28 commented 9 years ago

Oh right, sorry, perhaps you're doing something along these lines already. Do the fmp4 representations containing the subtitles only contain subtitles, and nothing else?

sandersaares commented 9 years ago

Right, only subtitles in those representations - nothing else.

ojw28 commented 9 years ago

Understood. Why are you packaging it inside fmp4, though, rather than just providing the VTT content directly? Normally we'd expect something like this.

sandersaares commented 9 years ago

The main motivation is live video - if the content is being created live (e.g. transcoded from a satellite feed), then there really is no way to provide the VTT file in full - it is being created piece by piece, in real time.

ojw28 commented 9 years ago

For live video you'd normally provide segments, each ~5 seconds in length (or similar), just like for audio and video in DASH. For text, each segment would be a complete VTT file covering that period of time.
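The segmenting approach described above could be sketched roughly as follows. This is only an illustration, not how any particular packager works: the `Cue` class and the `format_timestamp` and `build_segment` helpers are hypothetical names, the 5-second default mirrors the "~5 seconds" figure in the comment, and the sketch assumes cue times stay on the overall presentation timeline rather than being rebased per segment.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds on the presentation timeline (assumption)
    end: float    # seconds on the presentation timeline (assumption)
    text: str


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def build_segment(cues, segment_index, segment_duration=5.0):
    """Return a complete WebVTT document covering one segment's time window."""
    window_start = segment_index * segment_duration
    window_end = window_start + segment_duration
    lines = ["WEBVTT", ""]
    for cue in cues:
        # Include every cue that overlaps the window, so a cue that spans a
        # segment boundary appears in both neighbouring segments.
        if cue.start < window_end and cue.end > window_start:
            lines.append(f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}")
            lines.append(cue.text)
            lines.append("")
    return "\n".join(lines)
```

Each returned string starts with its own `WEBVTT` header, so every segment is a standalone file that a player can parse without having seen the earlier ones.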

sandersaares commented 9 years ago

That certainly sounds like a reasonable option, conceptually speaking, but we have so far not encountered systems that produce such data, at least in our current work.

The reason we have MP4 containers right now is simply because existing tools available to us do that, presumably since it is what the referenced standard specifies and following a standard seemed nice to their authors. Would be great if ExoPlayer supported this, especially since it does appear to be an official standard.

ojw28 commented 9 years ago

Sure, we'll take a look. Note that including VTT directly is also standards-compliant. I suspect it's identical except without the container, which probably doesn't add any value given it's just wrapping the VTT. If so, it's probably nicer to deliver without the container if possible.

ojw28 commented 9 years ago

I've been taking a look at this. Do you think it's correct that the mimeType in the manifest is video/mp4? I would have thought it should be text/mp4, which would be analogous to the audio representations having type audio/mp4.

sandersaares commented 9 years ago

I will walk through the relevant parts of the standards and related documents and outline the reasoning.

ISO/IEC 23009-1:2014 (DASH) 7.3.1

The @mimeType attribute of each Representation shall be provided according to RFC 4337. Additional parameters may be added according to RFC 6381.

The former defines video/mp4, audio/mp4 and application/mp4. The latter basically defines how the "codecs" parameter is to be used.

Since RFC 4337 does not define text/mp4, this is already a weighty blow against the idea. The point is reiterated in 8.3.2 of the DASH specification, where the three allowed MIME type values are explicitly outlined in an explanatory note.

DASH-IF IOP v3.0 table 17 lists application/mp4 as the correct MIME type for MP4 encapsulated SMPTE-TT (TTML). This is one of those allowed by RFC 4337, so I presume that is where it came from.

Based on this, I would conclude that the tools we use should actually generate application/mp4 instead of video/mp4. While I agree that text/mp4 would be more "obvious", it seems not to be one of the RFC-permitted values. I will modify our content production workflow to manually edit the MIME types after processing and have them changed to application/mp4, to make our data conformant.
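As a rough illustration of the check described above, the following sketch scans a manifest for `*/mp4` MIME types outside the three values attributed to RFC 4337 earlier in this comment. The manifest fragment, the representation ids, and the `find_unregistered_mp4_types` helper are all made up for the example; this is not the manifest linked in this thread, and it is not part of any real validation tool.

```python
import xml.etree.ElementTree as ET

# The three MP4 media types named in the discussion above as defined by RFC 4337.
RFC_4337_MP4_TYPES = {"video/mp4", "audio/mp4", "application/mp4"}

DASH_NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# Hypothetical manifest fragment, for illustration only.
SAMPLE_MPD = """
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <Representation id="video-1" bandwidth="2000000"/>
    </AdaptationSet>
    <AdaptationSet lang="en">
      <Representation id="text-en" mimeType="text/mp4" bandwidth="1000"/>
    </AdaptationSet>
    <AdaptationSet lang="de">
      <Representation id="text-de" mimeType="application/mp4" bandwidth="1000"/>
    </AdaptationSet>
  </Period>
</MPD>
""".strip()


def find_unregistered_mp4_types(mpd_xml):
    """Return (representation id, mimeType) pairs whose */mp4 type is not in the set."""
    root = ET.fromstring(mpd_xml)
    problems = []
    for adaptation_set in root.iterfind(".//mpd:AdaptationSet", DASH_NS):
        # mimeType may be set on the AdaptationSet and inherited by its children.
        inherited = adaptation_set.get("mimeType")
        for rep in adaptation_set.iterfind("mpd:Representation", DASH_NS):
            mime_type = rep.get("mimeType", inherited)
            if (
                mime_type
                and mime_type.endswith("/mp4")
                and mime_type not in RFC_4337_MP4_TYPES
            ):
                problems.append((rep.get("id"), mime_type))
    return problems
```

Run against the sample fragment, only the `text-en` representation is flagged; the `application/mp4` one passes, which matches the conclusion reached above.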

FYI, I randomly noticed this statement in the IOP document that might lend some additional weight and background to encapsulating subtitles in MP4:

(3) DVB DASH only supports ISO BMFF encapsulated TT, but not XML-based.

This is further expanded upon in detail by DVB BlueBook A168, chapter 7, which outlines why this is so and how DVB DASH distinguishes different subtitle streams. Maybe relevant if ExoPlayer is intended to support DVB DASH.

Here is our edited video with the text MIME type corrected to application/mp4: http://media.axprod.net/ExoPlayer/Captions2/Manifest.mpd

ojw28 commented 9 years ago
sandersaares commented 9 years ago

At the moment, the tooling we use is only able to work reasonably with WebVTT, though if I manage to find equivalent real world TTML test data somewhere, I will try to make it available to you.

ojw28 commented 9 years ago

The change above adds support for TTML captions embedded as per 14496-30. Embedded WebVTT per 14496-30 uses a different format than regular WebVTT, and so will require a new SubtitleParser.
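To illustrate why a separate parser is needed: as I understand 14496-30, each MP4 sample carries cues as nested boxes (`vttc` for a cue, with `sttg` for settings and `payl` for the cue text, and `vtte` for an interval with no cue) instead of a text file with timestamp lines, and the cue timing comes from the MP4 sample timing. The sketch below is a minimal reading of that layout under those assumptions. It is not ExoPlayer's implementation; `iter_boxes`, `parse_wvtt_sample`, and `make_box` are hypothetical helpers, and only 32-bit box sizes are handled.

```python
import struct


def iter_boxes(data):
    """Yield (type, payload) for each ISO BMFF box in data (32-bit sizes only)."""
    offset = 0
    while offset + 8 <= len(data):
        # Box header: 4-byte big-endian size (header included) + 4-byte type.
        size, box_type = struct.unpack_from(">I4s", data, offset)
        if size < 8 or offset + size > len(data):
            raise ValueError(f"unsupported or truncated box at offset {offset}")
        yield box_type.decode("ascii"), data[offset + 8 : offset + size]
        offset += size


def parse_wvtt_sample(sample):
    """Return the cues found in one sample as dicts with 'settings' and 'text'."""
    cues = []
    for box_type, payload in iter_boxes(sample):
        if box_type == "vttc":
            cue = {"settings": "", "text": ""}
            for child_type, child in iter_boxes(payload):
                if child_type == "sttg":
                    cue["settings"] = child.decode("utf-8")
                elif child_type == "payl":
                    cue["text"] = child.decode("utf-8")
            cues.append(cue)
        # A 'vtte' box marks an empty interval, so it contributes no cue.
        # Other box types are ignored in this sketch.
    return cues


def make_box(box_type, payload):
    """Build a box from a four-character type and a payload (test helper)."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload
```

Note that nothing in the sample states when the cue starts or ends. A player has to take the start time and duration from the container, which is the main difference from parsing a regular WebVTT file.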

ojw28 commented 9 years ago

Updated the title to reflect the fact that we support the TTML part of 14496-30.