Open mwatson2 opened 9 years ago
I have created a proposal for this issue: https://github.com/w3c/media-source/pull/8
Thanks for filing this, Mark. I am getting a response ready and should be able to share details (concerns and questions) by Sept. 18.
tl;dr: @mwatson2 and I discussed the origin of this feature request, and I don't think the current proposal is something we could use in MSE v.current, though we're working to find product-specific ways, perhaps outside of full MSE spec compliance, to move forward.
Regarding this issue (and the associated pull request): This appears to be more than just a registry edit (details below). In fact, this is a feature request that requires multiple significant spec changes (MSE ISO BMFF, MSE, and HTML5). These changes will introduce significant delay in getting to PR and risk moving MSE backwards in the W3C process.
Hence, I recommend that we track this as a new feature request as part of a later version of MSE (as briefly discussed at the April 2015 f2f) and/or explore it in an incubator (or, of course, find an alternative standardizable and practical solution).
If the proposed multi-track approach is the only mechanism for multi-layer, this bug impacts and depends on changes to more than the MSE ISO BMFF byte stream spec:
The current proposal would need changes to [1-4], too.
[1] http://w3c.github.io/media-source/#sourcebuffer-coded-frame-processing
[2] http://w3c.github.io/media-source/#sourcebuffer-coded-frame-eviction
[3] http://w3c.github.io/media-source/#sourcebuffer-init-segment-received
[4] http://www.w3.org/TR/html5/embedded-content-0.html#dom-videotrack-selected
ISO/IEC 14496-15 describes the carriage of layered (scalable) encodings in ISO Base Media File Format. Examples include SVC and MVC.
Such layered encodings can be encoded within a single track, or with multiple tracks, for example one for each layer. In the multi-layer case, when Movie Fragments are used, there are two ways the data can be organized into movie fragments:
(1) A single moof / mdat(s) pair can contain the data for the several tracks for each media segment.
(2) The several tracks can be split into several consecutive moof / mdat(s) pairs.
Option (1) is supported by our existing MSE byte stream format, but option (2) is not, because we require that each "media segment" consist of a single moof and mdat(s). Option (2) has an advantage because, typically, the sequence of moof / mdat(s) containing the "base layer" can be processed by a device that does not understand the scalable encoding.
So, I propose we modify our definition of Media Segment for the ISO BMFF byte stream format to consist of a sequence of one or more ( moof, mdat (, mdat)* ) structures where:
If this is agreeable, I'll prepare the PR.
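To make the difference between the existing and the proposed Media Segment shape concrete, here is a rough Python sketch. It is an illustration only, not part of the proposal: the function names (`box`, `top_level_box_types`, `matches_current`, `matches_proposed`) are hypothetical, it looks only at top-level moof and mdat boxes (any other box makes both checks fail), and it does not model the additional conditions that would follow the "where:" above.

```python
import re
import struct
from typing import List


def box(box_type: str, payload: bytes = b"") -> bytes:
    """Build a minimal ISO BMFF box: 32-bit size, 4-char type, payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload


def top_level_box_types(data: bytes) -> List[str]:
    """Return the four-character types of the top-level boxes in data."""
    types = []
    offset = 0
    while offset < len(data):
        if offset + 8 > len(data):
            raise ValueError("truncated box header")
        size, raw_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit largesize follows the type field.
            if offset + 16 > len(data):
                raise ValueError("truncated largesize")
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # A size of 0 means the box extends to the end of the data.
            size = len(data) - offset
        if size < header or offset + size > len(data):
            raise ValueError("invalid box size")
        types.append(raw_type.decode("latin-1"))
        offset += size
    return types


def matches_current(types: List[str]) -> bool:
    """Existing shape (simplified): one moof followed by one or more mdat."""
    return re.fullmatch(r"moof( mdat)+", " ".join(types)) is not None


def matches_proposed(types: List[str]) -> bool:
    """Proposed shape (simplified): one or more ( moof, mdat (, mdat)* )."""
    return re.fullmatch(r"(moof( mdat)+)( moof( mdat)+)*", " ".join(types)) is not None


# Option (1): a single moof / mdat pair for the whole segment.
single = box("moof") + box("mdat", b"base+enhancement")
# Option (2): consecutive moof / mdat(s) pairs, for example one per layer.
split = box("moof") + box("mdat", b"base") + box("moof") + box("mdat") + box("mdat")

print(matches_current(top_level_box_types(single)), matches_proposed(top_level_box_types(single)))
print(matches_current(top_level_box_types(split)), matches_proposed(top_level_box_types(split)))
```

Under these assumptions, the option (1) layout satisfies both checks, while the option (2) layout satisfies only the proposed one.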