w3c / media-source

Media Source Extensions
https://w3c.github.io/media-source/

Support "seamless" cross-codec (and possibly cross-container/track) switching #155

Closed wolenetz closed 3 years ago

wolenetz commented 8 years ago

Currently, the minimum interop requirement for A/V tracks is at most one of each. These tracks must conform to the mime-type used to construct their SourceBuffer(s). Consequently, web apps that would like to have "seamless" cross-codec (e.g., webm vp9->h264 avc->webm vp8) transitions within the same media timeline must polyfill and depend on event delivery/timers to approximate a best-effort transition across these using multiple MediaSource objects swapped in as media element src.

This issue is opened to understand: 1) more details of what web authors need around this use case, and 2) how the MSE spec might need to be adjusted to enable this use case.
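For illustration only, here is a minimal sketch of the polyfill described above: two MediaSource attachments swapped as the element's src at an app-chosen splice time. Everything concrete here (segment URLs, MIME types, the splice time) is a placeholder, not something from this issue:

```js
// Sketch of the status-quo polyfill: two MediaSource attachments, swapped as
// the element's src at an app-chosen splice time. Names/URLs are placeholders.
const video = document.querySelector('video');

function attach(mimeType, segmentUrls) {
  const mediaSource = new MediaSource();
  mediaSource.addEventListener('sourceopen', async () => {
    const sb = mediaSource.addSourceBuffer(mimeType);
    for (const url of segmentUrls) {
      const data = await (await fetch(url)).arrayBuffer();
      sb.appendBuffer(data);
      await new Promise(r => sb.addEventListener('updateend', r, { once: true }));
    }
    mediaSource.endOfStream();
  }, { once: true });
  video.src = URL.createObjectURL(mediaSource);
}

// Play VP9/WebM content, then swap in an H.264/MP4 MediaSource at the splice.
attach('video/webm; codecs="vp9"', ['main-init.webm', 'main-1.webm']);
const SPLICE_TIME = 10; // seconds, app-specific
video.addEventListener('timeupdate', function onSplice() {
  if (video.currentTime >= SPLICE_TIME) {
    video.removeEventListener('timeupdate', onSplice);
    attach('video/mp4; codecs="avc1.42E01E"', ['ad-init.mp4', 'ad-1.mp4']);
    video.play(); // re-attachment resets the element, so this is not seamless
  }
});
```

The swap timing depends on event delivery, which is why the result is only a best-effort approximation of a seamless transition.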

paulbrucecotton commented 8 years ago

Isn't this directly related to the requested Ad Insertion use case and the related Implementation Use Cases?

wolenetz commented 8 years ago

It is. Thank you for linking those here.

wolenetz commented 7 years ago

Codec switching is a commonly requested feature for MSE. Recent discussion (and potentially improved ability to eventually support this) has been occurring in Chromium (https://crbug.com/695595).

Since the current title of this spec issue implies a solution route (and scope that includes cross-container switching, not just cross-codec switching), I'm renaming it to allow discussion of what the desired scope is first.

Also some details copied from the Chromium discussion, bold mine:

From this comment:

Doing [cross codec switching, possibly with cross container switching] would require a spec change in addition to implementation change. In addition to codec switching, would container switching also need to be required? (e.g. splice mp4:h264+aac ad into the middle of a webm:vp9+opus stream?)

In the spec, the initialization segment received algorithm disallows codec changes. Such an initialization segment identifies the codec parameters for upcoming media segment appends to that SourceBuffer. The spec allows for multiple initialization segments to be appended over time to a SourceBuffer (e.g. to accomplish stream adaptation), but disallows the initialization segments from differing in their codecs. Pasting from the pertinent spec text:

Verify the following properties. If any of the checks fail then run the append error algorithm and abort these steps.

- The number of audio, video, and text tracks match what was in the first initialization segment.
- The codecs for each track, match what was specified in the first initialization segment.
- If more than one track for a single type are present (e.g., 2 audio tracks), then the Track IDs match the ones in the first initialization segment.

Rough idea of what might need to be done:

  1. Determine if container switching is also desired, and if so, how to signal such a switch normatively to the SourceBuffer. SourceBuffer.abort() might not be enough by itself, and "sniffing" which container bytestream type is appended following a parser reset might not be a complexity that user agents want to be required to handle. Alternatively, a mechanism - perhaps a parameter to the abort() call itself, or a distinct method - could be added and required of apps to directly signal which container bytestream will be appended next.

  2. Modify the spec (and incubate in WICG with interested browsers and other web authors) to:
     2.a. Include a mechanism by which the API can be queried proactively to determine support for codec (and maybe container) switching. Not all implementations must allow such switching, right? (A sketch of what can already be queried today follows this list.)
          2.a.1. Perhaps a new method on MediaSource, akin to isTypeSupported()? Or maybe modify isTypeSupported() (and possibly the non-MSE canPlayType()) enough to provide a definitive response to the app?
          2.a.2. Or perhaps something in the MediaCapabilities spec?
     2.b. Update the spec for MediaSource.isTypeSupported().
     2.c. Update the spec for the initialization segment received algorithm.

  3. Modify Chrome's implementation to experiment through incubation and eventually ship it on by default.
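For reference, and only as a sketch, this is roughly what an app can already query today with MediaSource.isTypeSupported() and Media Capabilities. Note that neither call says anything about whether switching between the two formats inside one SourceBuffer would work, which is exactly the gap item 2.a describes; the concrete types and resolution below are placeholders:

```js
// Existing query surfaces: per-type support only, nothing about switchability.
const webmType = 'video/webm; codecs="vp9"';
const mp4Type = 'video/mp4; codecs="avc1.42E01E"';

async function probeSupport() {
  const bothTypesSupported =
    MediaSource.isTypeSupported(webmType) && MediaSource.isTypeSupported(mp4Type);

  // Media Capabilities gives a richer per-configuration answer, but it still
  // says nothing about cross-codec/cross-container switching in a SourceBuffer.
  const info = await navigator.mediaCapabilities.decodingInfo({
    type: 'media-source',
    video: {
      contentType: webmType,
      width: 1280,
      height: 720,
      bitrate: 2000000,
      framerate: 30,
    },
  });
  return bothTypesSupported && info.supported;
}
```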

From this comment:

If "seamless" container switching is required, then either a mechanism for proactively requesting track switching is needed, or proactively requiring the application to identify which tracks across the different containers map to the same logical stream. The former is probably simpler to do with the way media track in HTMLMediaElement is currently described.

But if container switching isn't required, just codec switching within the same container track, then both spec and implementation change scope is reduced to accomplish just "seamless" codec switching.

Comments welcome, especially about whether or not cross-container switching is desired, or just same-container cross-codec switching within the same track.

paulbrucecotton commented 7 years ago

In addition to codec switching, would container switching also need to be required? (e.g. splice mp4:h264+aac ad into the middle of a webm:vp9+opus stream?)

Wouldn't container switching require changes in HTML5?

/paulc

wolenetz commented 7 years ago

In addition to codec switching, would container switching also need to be required? (e.g. splice mp4:h264+aac ad into the middle of a webm:vp9+opus stream?)

Wouldn't container switching require changes in HTML5?

@paulbrucecotton This isn't totally clear to me, though I suspect it would involve, at minimum, further extension of the HTML5 A/V/Text tracks that MSE already extends. What part of HTML5 were you specifically thinking might need changing to accommodate MSE SourceBuffer container switching (which is "MSE bytestream" switching)?

paulbrucecotton commented 7 years ago

What part of HTML5 were you specifically thinking might need changing to accommodate MSE SourceBuffer container switching (which is "MSE bytestream" switching)?

@wolenetz - I could be wrong, but I thought HTML5 only supported a single MIME type for content attached to a particular HTMLMediaElement. Every time someone asked for the ability to change MIME types (e.g. for ad insertion), someone would point to this restriction and the discussion would end.

/paulc

wolenetz commented 7 years ago

@paulbrucecotton I don't recall that particular restriction. Rather, MSE v1 requires implementations to support at least 1 audio and 1 video codec in a MediaSource (in 1 or 2 SourceBuffers), but leaves it as a quality of implementation issue for a user agent to possibly support more than this. Historically, many implementations have only done the minimum here, though this is slowly changing.

canPlayType() may need to be extended, though.

Edit: added the canPlayType() portion.

adrianba commented 7 years ago

This is an issue worth discussing, but there are several issues in this space that we explored in the past. For example, what happens when the tracks don't match in the inserted segment? See Bug 22137 - changes in number of audio tracks during advert insertion. canPlayType(), and anywhere else decisions are made using the codec parameters, would be impacted by allowing format switching.

One proposed approach to solving this was to provide a way to schedule audio/video track selection changes in HTMLMediaElement. Since this was a generic capability for media elements, independent of MSE, it was filed on the HTML5 spec, but no concrete proposal was made.

wolenetz commented 7 years ago

@adrianba - excellent recall! Using timed text track metadata cues to tell the UA to perform the transitions makes sense to me on the surface. @silviapfeiffer, are there standardized metadata cue possibilities already for this, or would this require new kinds of cues?

If folks agree that metadata cues afford the best route forward, we should follow https://www.w3.org/Bugs/Public/show_bug.cgi?id=22785#c19 and begin working on incubating a proposal (for HTML5 media in general, not just MSE).
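To make the cue idea concrete, here is a hedged sketch of what cue-driven scheduling could look like with today's metadata text tracks. There is no standardized cue payload for this, so the JSON payload and the performSwitch() helper below are purely hypothetical stand-ins for whatever would actually be specified:

```js
// Hypothetical: use a hidden metadata track to schedule a transition.
// performSwitch() stands in for whatever normative mechanism would be specced.
const video = document.querySelector('video');
const track = video.addTextTrack('metadata'); // addTextTrack() tracks start hidden

const cue = new VTTCue(29.8, 30.0, JSON.stringify({
  action: 'switch',
  type: 'video/mp4; codecs="avc1.42E01E"',
}));
cue.onenter = () => {
  const request = JSON.parse(cue.text);
  performSwitch(request.type); // app-defined today; nothing standardized yet
};
track.addCue(cue);
```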

wolenetz commented 6 years ago

@jdsmith3000 @mwatson2 @mounirlamouri @jyavenard, I have a proposal for potentially incubating a solution for this issue, hopefully for WICG incubation towards inclusion in MSE vNext. Please take a look:

https://github.com/wolenetz/media-source/blob/codec-switching/codec-switching-explainer.md

jyavenard commented 6 years ago

At a quick glance, I don't understand the need for the changes as stated. A change of spec isn't compulsory.

Currently, the only thing forbidden by the spec is a change of codec type within the same source buffer. If that limitation still needs to apply (which would be disappointing), the existing spec already allows adding multiple source buffers of different types. The user agent would "just" (I say just because that's damn hard to implement) have to select which source buffer contains the data we need at the present time. If the data is only found in one source buffer, that source buffer is selected; otherwise any such buffer would do (maybe use the one with the best quality).

The inability to change codec in a source buffer is an artificial limitation, one that would be easy to remove. If we kept the scope to only allow codec changes (not container type changes), one could simply append a new init segment of the new data type and continue as before. The new data type would coexist with the existing data.

Such modifications would be trivial (at least for Gecko), would allow seamless codec changes, and would integrate well with many content providers (they typically don't want to change container).
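A sketch of what that app code might look like, assuming (as argued above) that the codec-match check were removed; the init/media segment ArrayBuffers are placeholders fetched elsewhere, mediaSource is an already-open MediaSource, and the awaits are expected to run inside an async context:

```js
// Same container (WebM), different codec, one SourceBuffer: just append a new
// init segment and keep going. Only works if the codec-match check is dropped.
const sb = mediaSource.addSourceBuffer('video/webm; codecs="vp9"');

const appendAndWait = (buf, data) => new Promise(resolve => {
  buf.addEventListener('updateend', resolve, { once: true });
  buf.appendBuffer(data);
});

await appendAndWait(sb, vp9InitSegment);   // first init segment: VP9
await appendAndWait(sb, vp9MediaSegment);
await appendAndWait(sb, vp8InitSegment);   // new init segment: VP8
await appendAndWait(sb, vp8MediaSegment);  // continues the same timeline
```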

Why the need for "changeType" when the information is already contained in the init segment to be appended?

wolenetz commented 6 years ago

The two main parts that the proposed changeType() method affords are cross-codec and cross-bytestream "seamless" adaptation.

While some content providers may not want to change container (bytestream), there is a long-documented use case for this. Further, this allows adaptation cross-bytestream even within "primary content", not just the ad-insertion case.

While I'm increasingly in favor of dropping Chrome's strictness of codec parameters in both addSourceBuffer() and the proposed changeType(), simply dropping that strictness is insufficient at least for affording the seamless cross-bytestream use case.

Also, the proposed changeType(), even for cross-bytestream-switching, requires minimal additional spec'ing; definitely less than might be required for cross-SourceBuffer switching (the latter is certainly not the route I'm proposing).

wolenetz commented 6 years ago

@jyavenard https://github.com/w3c/media-source/issues/155#issuecomment-373724036, also:

Why the need for "changeType" when the information is already contained in the init segment to be appended?

In the cross-bytestream case, changeType() provides a clear point where the application can indicate the bytestream format is changing. This allows the implementation to perform any necessary parser reset and reconfiguration to extract data out of the (potentially new) bytestream format.
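In code, the cross-bytestream flow being proposed would look roughly like this sketch (the segment ArrayBuffers are placeholders, the awaits assume an async context, and appendAndWait() is the same trivial helper as in the earlier sketch):

```js
// mp4/avc1 first, then an explicit changeType() so the parser can reset and
// reconfigure before WebM/vp9 appends. The next append must be an init segment.
const sb = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.42E01E"');
const appendAndWait = (buf, data) => new Promise(resolve => {
  buf.addEventListener('updateend', resolve, { once: true });
  buf.appendBuffer(data);
});

await appendAndWait(sb, mp4InitSegment);
await appendAndWait(sb, mp4MediaSegment);

sb.changeType('video/webm; codecs="vp9"'); // signal: new bytestream/codec follows
await appendAndWait(sb, webmInitSegment);
await appendAndWait(sb, webmMediaSegment);
```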

joeyparrish commented 6 years ago

I generally like the proposal in your explainer, but I have a couple of questions.

  1. Would an application be able to detect that changeType() is present in the implementation?

Because of the way Shaka Player is currently architected, this is something we would require to adopt changeType() quickly.

For example, if we could see it before setting up MediaSource by checking SourceBuffer.prototype.changeType, then we could know early and modify the way we filter out streams before playback begins. This would be much preferable to finding out at MediaSource setup time or at switching time that changeType() isn't implemented.

I recall some previous versions of Safari didn't expose SourceBuffer on window at all, though I don't think this is still true, and I don't know if that is explicitly spec'd or not.
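For what it's worth, one way to write that detection defensively (a sketch, not a spec-mandated pattern) that also tolerates SourceBuffer not being exposed on window:

```js
// Detect changeType() before building the MediaSource pipeline.
// Guard the prototype lookup in case SourceBuffer itself isn't exposed.
const supportsChangeType =
  typeof window.SourceBuffer !== 'undefined' &&
  typeof window.SourceBuffer.prototype.changeType === 'function';

if (supportsChangeType) {
  // Keep all codec variants; switch with changeType() at period boundaries.
} else {
  // Fall back: filter manifest variants down to the first-selected codec.
}
```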

  2. If MediaSource.isTypeSupported() returns true for both types a and b, and changeType() is present, is it then implied that we can change from a to b and vice versa? Or can an implementation support two types independently without permitting switching between them?

If switching between two supported types is not guaranteed to be supported by a given implementation of changeType(), this will complicate things for us. We currently have a one-way pipeline leading into MediaSource. We could react to the rejection of a certain type at switch time, go back up the pipeline, drop that type, and switch to another type and try to recover, but we have nothing resembling this today. changeType() would be more convenient to adopt if we didn't have to grapple with that.

jyavenard commented 6 years ago

@wolenetz My suggestion of simply relying on the init segment was purely related to supporting codec changes within the same container. All that's required to support this, spec-wise, is the removal in 3.5.7.5 (https://w3c.github.io/media-source/index.html#sourcebuffer-init-segment-received) of the line "The codecs for each track, match what was specified in the first initialization segment."

For cross-codec, cross-container: I see no advantage in having changeType() when all of this can be achieved with multiple source buffers, with no change to the current spec whatsoever, though a better description of the expected behaviour when switching source buffers would help (e.g., do we prefer the most recent data if a buffered range is available in both?). It would keep the spirit of the original spec, as I believe it was intended.

A source buffer contains one type of binary blob (mp4, webm, mpeg), so it doesn't introduce the difficulty that @joeyparrish mentioned above (what would isTypeSupported return, and the like) by piggybacking something on top of an existing, well-defined container.

Adding a new method just to get around some user agents' implementations isn't, I believe, enough to justify those changes.

jyavenard commented 6 years ago

Following private discussion with @wolenetz, we agree that adding a SourceBuffer::changeType(DOMString type) method would be the easiest way forward. changeType() would take the same argument format as MediaSource::addSourceBuffer().

In addition, in 3.5.7 Initialization Segment Received (https://w3c.github.io/media-source/index.html#sourcebuffer-init-segment-received), step 3 ("If the first initialization segment received flag is true, then run the following steps:"), the line "The codecs for each track, match what was specified in the first initialization segment." is to be removed, allowing seamless codec changes within a container type.

jyavenard commented 6 years ago

A test page has been set up here: https://jyavenard.github.io/htmltests/tests/mse_mp4/changeType/changeType.html Using Firefox Nightly, set the media.mediasource.experimental.enabled and media.mediasource.webm.enabled preferences to true.

This plays an h264/aac 400x300 clip, followed by an h264/aac 640x480 clip, followed by a vp9/vorbis 400x300 clip.

wolenetz commented 6 years ago

I've updated my media-source fork to now be off wicg (which is off w3c), instead of directly off w3c, since GitHub doesn't allow one account to have multiple forks sharing an ancestor repository.

For feature work on this issue, let's continue using just this w3c github issue for now. If that becomes too complex, we can add a tag "codec-switching" to track multiple issues related to this feature later.

Feature work will be at github.com/wicg/media-source repository's "codec-switching" branch. This allows multiple other features to also be incubated.

I've populated that branch just now with my codec-switching-explainer.md.

The updated URL for the codec-switching explainer is now: https://github.com/wicg/media-source/blob/codec-switching/codec-switching-explainer.md

wolenetz commented 6 years ago

@jyavenard - Please review the pull request containing the new codec-switching logic: https://github.com/WICG/media-source/pull/2#issuecomment-392979899

If you need wicg repo permissions to be able to comment on that PR, please let me know.

Thanks!

wolenetz commented 6 years ago

As of the next dev/canary build following the changes that landed today, Chrome M69 has SourceBuffer.changeType() available when chrome://flags/#enable-experimental-web-platform-features is enabled.

I have also landed changeType() web-platform-tests: https://github.com/web-platform-tests/wpt/pull/11618. These check the basic interface and implementation steps, and (if enough of the test media types are detected as supported) append various test media, including changeType() and overlap-appends, and then play the buffered media from start to finish.

Note that some implementations may, for some test media, introduce buffered range gaps across appended media, just like existing same-codec, same-bytestream media before this feature (typically, when the unbuffered gap exceeds implementation-specific tolerances). Such gaps continue to motivate solving https://github.com/w3c/media-source/issues/160 for MSE vNext, too.
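For apps that do hit such gaps today, one common app-side mitigation, sketched here with an app-chosen tolerance and a placeholder media element, is to nudge currentTime across a small unbuffered gap when playback stalls; this is orthogonal to changeType itself:

```js
// Gap "jumping": if playback stalls just short of the next buffered range,
// nudge currentTime across small implementation-introduced gaps.
const GAP_TOLERANCE = 0.1; // seconds; app-chosen threshold

video.addEventListener('waiting', () => {
  const buffered = video.buffered;
  for (let i = 0; i < buffered.length; i++) {
    const start = buffered.start(i);
    if (start > video.currentTime && start - video.currentTime < GAP_TOLERANCE) {
      video.currentTime = start; // skip over the tiny gap
      break;
    }
  }
});
```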

wolenetz commented 6 years ago

I've updated the codec switching explainer today to describe the API in the incubation specification that evolved from the original proposal. The explainer now also includes links to implementation status and experimental web platform test results, describes the routes chosen in the incubation spec to resolve the original proposal's open questions, and it also mentions the "implicit codec switching" scenario and the related new non-normative guidance also included in the incubation spec.

jpiesing commented 6 years ago

I've updated the codec switching explainer today to describe the API in the incubation specification that evolved from the original proposal. The explainer now also includes links to implementation status and experimental web platform test results, describes the routes chosen in the incubation spec to resolve the original proposal's open questions, and it also mentions the "implicit codec switching" scenario and the related new non-normative guidance also included in the incubation spec.

I'd like to comment on the last paragraph in the explainer:

To what level should we specify "seamless" playback across bytestream, codec (and perhaps encryption) changes? This is likely a quality-of-implementation output, rather than a specified input. Decoder reconfiguration, for instance, may not be sufficient in all implementation instances, to support precision across a transition. This is analogous to the same treatment of playback quality across adaptations allowed by REC MSE today: implicitly quality-of-implementation.

I can understand why this approach has been taken, but it makes it (very) hard to decide when changeType should throw an exception from the test: "If type contains a MIME type that is not supported or contains a MIME type that is not supported with the types specified (currently or previously) of SourceBuffer objects in the sourceBuffers attribute of the parent media source, then throw a NotSupportedError exception and abort these steps."

In cases where the browser is delegating media rendering to the OS+hardware, if the definitions of seamless and supported are a quality of implementation issue (without even any examples) then how can someone porting a browser to a particular OS+hardware know when to throw the NotSupportedError and when not?

For example, if the best a particular browser+OS+hardware platform could do for a transition between two particular codecs would involve pausing the media timeline for perhaps 0.5s (e.g. while a hardware decoder is re-initialised), with black video and audio silence, should this count as supported?

If changeType doesn't throw an exception even for things which (most) developers would consider unreasonably bad, where does this leave developers?

If this is left purely to implementers then the likely result will be sufficiently varied as to undermine the usefulness of the feature.

guest271314 commented 5 years ago

Is changeType() intended to support change from "video/webm;codecs=vp8" to "video/webm;codecs=vp8,opus"?

wolenetz commented 5 years ago

changeType is intended to support changing the codec or bytestream for an existing track, with some flexibility for the track ID varying if there is only one track of that media type (e.g. audio or video). However, if previously only one track type (e.g. audio-only or video-only) was buffered into a SourceBuffer, using changeType to switch that track type (audio->video or vice versa) or to add another track of any type is not intended to be supported.
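Sketching that distinction (the WebM types are examples only, mediaSource is an already-open MediaSource, and how the unsupported case fails is implementation-dependent, not guaranteed at the changeType() call itself):

```js
// Intended use: the existing (single) video track changes codec/bytestream.
const sb = mediaSource.addSourceBuffer('video/webm; codecs="vp8"');
// ...append vp8 init + media segments...
sb.changeType('video/webm; codecs="vp9"'); // still exactly one video track: OK

// Not intended: this SourceBuffer has only ever buffered video, so trying to
// turn it into (or grow it by) an audio track is out of scope for changeType.
sb.changeType('audio/webm; codecs="opus"');
// Even if the call itself doesn't throw, the subsequent init segment append
// is expected to fail the "initialization segment received" checks.
```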

gregwhitworth commented 4 years ago

I've noticed that this has begun landing in more and more browsers, but I don't see it in the Editor's Draft. What is the plan for getting this into the official media-source spec?

wolenetz commented 4 years ago

@gregwhitworth (re https://github.com/w3c/media-source/issues/155#issuecomment-551941848): that is a very timely question. I'm working on upstreaming the various incubations, including this one, from WICG to the main w3c repo. Expect progress on this very soon.

JohnRiv commented 4 years ago

We discussed this within CTA WAVE and are pleased to see it is planned for v2.

If it is possible to address @jpiesing's most recent comment, that would be beneficial.

guest271314 commented 4 years ago

In cases where the browser is delegating media rendering to the OS+hardware, if the definitions of seamless and supported are a quality of implementation issue (without even any examples) then how can someone porting a browser to a particular OS+hardware know when to throw the NotSupportedError and when not?

For example, if the best a particular browser+OS+hardware platform could do for a transition between two particular codecs would involve pausing the media timeline for perhaps 0.5s (e.g. while a hardware decoder is re-initialised), with black video and audio silence, should this count as supported?

Yes, that should count as supported. Then changes can be made to improve support. The Chromium MediaRecorder implementation from https://download-chromium.appspot.com/ does not support H264/AVC1 and does not play MP4 at HTMLMediaElement. Different browser source-code builds support different codecs.

"The implementation MUST disclose which codecs it intends to support" is clear language for developers. isTypeSupported() provides a means to determine which codecs are supported before proceeding with any media playback code.

Regarding delays between frames when switching media, the ideal is "seamless". Where evidence, for example in a bug report against an implementation, describes a delay between frames, that is in fact a bug. RTCRtpSender.replaceTrack() describes a similarly seamless replacement of tracks. One important aspect of switching video tracks is handling tracks with variable input width and height in sequence. Using a MediaStream as source, the HTMLMediaElement could resize several times before reaching the encoded pixel dimensions. MediaSource in Firefox was capable of handling variable-frame video input before changeType() was supported. Chromium had several bugs that crashed the browser when tracks with variable pixel dimensions were rendered at HTMLMediaElement, due to the particular encoder used.

Implementations MUST disclose the codecs used, and the specific implementation thereof, to the user, in code if possible, so that the user knows which encoder or decoder the implementation is using. That could save dozens of bugs filed just to get to the source of a specific codec limitation.

patrickkunka commented 3 years ago

Would somebody be willing to assist me regarding the recommended approach for polyfilling this on legacy platforms that do not support changeType? Specifically with regard to a simple codec change (not container), for example - when transitioning to an ad period in a multi-period DASH stream.

I've seen reference to "resetting" the source buffer, or simply filtering out representations with a codec not matching the first played one, but I'm unsure how either of these would work in a practical sense.

Assuming we can have only one audio or video source buffer attached at a time, and the player will be appending data into the source buffer several seconds ahead of the play head, how could the source buffer be replaced or reset at the point of "new codec append" without interrupting playback?

Any advice much appreciated.

wolenetz commented 3 years ago

In reply to https://github.com/w3c/media-source/issues/155#issuecomment-743458350: The practical problem you'll probably hit, due to an implementation's lack of changeType support, is approaching "seamless" UX across the transition. Some approaches might be:

  1. Ensure all your content uses the same bytestream format and codec. Transmuxing in JS or on the server side might be necessary to keep the content in the same format. However, if your content has distinct codecs such that no single codec is common to all the content needed in your presentation, then your other options might be:

  2. Consider preparing, in advance of the transition point, another media element with an attached MediaSource and begin buffering into it; at the transition point, swap media elements. Note that removing/adding a media element from a document may cause it to invoke the load algorithm (and thereby lose any prebuffered media in a previously attached MediaSource). I'm uncertain of the mechanisms for showing/hiding an element in a document, but I suspect you'll need to hide it from visibility while prebuffering into it before the transition/swap point. Also note that accomplishing the swap will likely not be seamless: the precise timing of the swap is lower-bounded by the scheduling/dispatch precision of the JS runtime in which your app is running, and the actual swap may have implementation constraints (in the worst case, an implementation might only support one media element at a time, making prebuffering infeasible).

  3. Alternatively, attach a new MediaSource to the media element and begin buffering into it at the transition time (a rough sketch follows below). This will introduce some UX-visible latency at the transition point, because buffering hasn't begun until that time. If I understand correctly, this might be what "resetting" the SourceBuffer refers to. Instead of creating a new MediaSource, you could try to removeSourceBuffer() each of the old SourceBuffers at the transition time, then add new ones and buffer into them; however, support for adding new SourceBuffers after having reached 'loadedmetadata' is not required of implementations, so you might have to start from scratch with a new MediaSource attachment anyway. Attaching a new MediaSource is likely to be the lowest common denominator for supporting a sequence of presentations whose codecs differ, when changeType is not available.

  4. Probably unavailable on your implementation since it lacks MSE changeType() support, but check whether WebCodecs decoding and rendering support are available; if so, you might consider implementing your own player based on WebCodecs.
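As a rough illustration of option 3 above (the function names and the segment-loading helper are hypothetical), re-attaching a fresh MediaSource at the boundary looks something like this; expect user-visible rebuffering at the transition:

```js
// Option 3 sketch: drop the old attachment and start over with a new
// MediaSource at the codec-change boundary. Buffering restarts from scratch.
function attachNewPeriod(video, mimeType, resumeTime) {
  const mediaSource = new MediaSource();
  mediaSource.addEventListener('sourceopen', () => {
    const sb = mediaSource.addSourceBuffer(mimeType);
    startLoadingSegmentsInto(sb);     // app-specific fetching/appending
    video.currentTime = resumeTime;   // depends on how the new segments are stamped
    video.play().catch(() => {});     // autoplay policy may require a user gesture
  }, { once: true });
  video.src = URL.createObjectURL(mediaSource); // detaches the previous MediaSource
}
```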

edited to reference the comment to which this is a response.

patrickkunka commented 3 years ago

In reply to https://github.com/w3c/media-source/issues/155#issuecomment-743466313: thank you, that's very helpful.

wolenetz commented 3 years ago

#274 merged the WICG-incubated specification for this feature into the MSE v2 main spec (Editor's Draft).