w3c / media-source

Media Source Extensions
https://w3c.github.io/media-source/

Enable buffering of WebCodecs Encoded Chunks for playback with MSE - aka "MSE for WebCodecs" or "MSE4WC" #184

Open andrewmd5 opened 7 years ago

andrewmd5 commented 7 years ago

It seems rather counterintuitive to force boxing of video frames for the API. When attempting real-time interactive applications like web-based remote desktop, low latency is key, and MSE forces a lot of overhead.

In an ideal situation, raw H.264-encoded frames could be passed to the hardware-accelerated decoder and pushed into a video object, which would solve these issues.
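The flow being asked for here (feed raw encoded frames straight to a decoder, no container) is roughly the shape that the later WebCodecs VideoDecoder API provides. A minimal sketch, with the decoder constructor passed in as a parameter so the logic is visible outside a browser (in a page you would pass the global VideoDecoder); the codec string is an illustrative assumption:

```javascript
// Sketch: containerless decode of raw encoded frames, in the shape of the
// WebCodecs VideoDecoder API. The constructor is injected; the codec string
// is an example only.
function decodeRawFrames(VideoDecoderCtor, encodedChunks, onFrame) {
  const decoder = new VideoDecoderCtor({
    output: onFrame,                 // receives each decoded frame, no demuxing
    error: (e) => console.error(e),
  });
  decoder.configure({ codec: "avc1.42E01E" }); // H.264 Baseline, example string
  for (const chunk of encodedChunks) {
    decoder.decode(chunk);           // one encoded frame in, no container boxing
  }
  return decoder;
}
```

In a real page the chunks would be EncodedVideoChunk objects built from the incoming network stream.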

dwsinger commented 7 years ago

Hm. I think MSE was designed to support use-cases like DASH and HLS. If you are doing real-time, I would have thought that the WebRTC infrastructure may be more appropriate?

andrewmd5 commented 7 years ago

WebRTC has its own overhead: you'll need to go through the process of setting up a STUN/TURN framework, and then the hacky solution of making it think a media source (webcam) is your stream.

When it comes to real-time video, other platforms let you access the decoders at the lowest level. You shouldn't have to overcomplicate the solution to a "simple" problem.

jyavenard commented 7 years ago

Mozilla had opened a similar bug to investigate this problem (https://bugzilla.mozilla.org/show_bug.cgi?id=1325491)

You would still need to wrap the data in a container of some kind... Because the plain raw data doesn't provide sufficient information to properly display those frames.

I do believe that we can improve MSE to be more real time friendly. However, I'm not convinced using raw data will help much here. The overhead required in wrapping the content in an mp4 or a webm is rather low.

andrewmd5 commented 7 years ago

In solutions I've created outside the web I've only used raw data to achieve 60FPS real-time video, so I can't speak much to container format solutions.

The benefit of MSE is the hardware acceleration. However, in my efforts to get real-time streaming working via MSE, delays often show up due to the I-frame delay present when sending fragmented MP4s. A workaround is sending frames individually as soon as they are captured, which is less than ideal since each one has to be boxed, and every millisecond counts.

If you have any suggestions for an approach with the standard we currently have, I'd appreciate fresh eyes.

jyavenard commented 7 years ago

I think you're making too many assumptions as to how MSE implementations work internally.

Sending raw frames versus having them muxed in an MP4 container would make zero difference in the speed of decoding, or in the ability to use hardware rather than software decoding. Both would be identical. The same goes for WebRTC versus MSE: using MSE doesn't suddenly open the world of hardware acceleration.

The only thing you would save with raw frames is the time it takes to demux an MP4, which is barely relevant compared to the processing required to decode a frame.

Using an individual frame in a fragmented MP4 versus multiple frames in an MP4 would also make no difference in practice: the H.264 hardware decoder available on Windows has a latency of over 30 frames. You need to input over 30 frames before the first one comes out. This is what causes latency, not how many frames you add at a time, or whether they are muxed in an MP4 or not.

If you were to package 30 frames in a single MP4 fragment, or use 30 fragments of one frame each, the latency would still be the same (as far as the first decoded sample is concerned). In fact, I can assure you that, at least with Firefox, a single fragment with a single frame adds a lot of processing time, and packaging, say, 10 frames per fragment gives much better results.
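The pipeline-depth argument above can be illustrated with a toy model (not real WMF behavior, just the queueing arithmetic): a decoder that holds `depth` frames internally emits its first frame only after `depth + 1` inputs, no matter how those inputs are batched into fragments.

```javascript
// Toy model of decoder pipeline latency: the decoder buffers up to `depth`
// frames internally, so the first output appears only once depth + 1 frames
// have been fed in, regardless of fragment packaging.
function simulateDecode(frameCount, depth) {
  const pipeline = [];
  const outputs = [];
  for (let i = 0; i < frameCount; i++) {
    pipeline.push(i);                // ProcessInput: feed one encoded frame
    if (pipeline.length > depth) {   // ProcessOutput succeeds only when full
      outputs.push(pipeline.shift());
    }
  }
  return outputs;
}
```

With depth 30, feeding 31 frames yields exactly one output (frame 0); with depth 0 (the "one in, one out" low latency mode discussed later in this thread) every input produces an output immediately.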

roman380 commented 7 years ago

BTW, the hardware decoder on Windows can be instructed to enable low-delay mode (CODECAPI_AVLowLatencyMode). I would expect this to reduce decoding latency. Generally speaking, though, it is unlikely that even standard mode has so much processing latency that it basically disqualifies the decoder from real-time video scenarios. Encoders have such latency for their own reasons, but decoders do not.

Also, recalling my experience with the DXVA H.264 decoder, it did produce output with a reasonably small delay in terms of additional data on its input. It does require some processing time because, for example, it is multithreaded internally and certain synchronization is involved, but the delay is not as long as many additional input frames of payload data.

jyavenard commented 7 years ago

CODECAPI_AVLowLatencyMode is only available on Windows 8 and later (and you need a service pack). We also had to disable it because it easily caused crashes (see https://bugzilla.mozilla.org/show_bug.cgi?id=1205083). It is also incompatible with content that has B-frames.

FWIW, even with CODECAPI_AVLowLatencyMode and H.264, the latency is around 10 frames (that is, MF_E_TRANSFORM_NEED_MORE_INPUT is returned until then).

As for disputing that the latency is that high without it, it may be worth trying it yourself first.

roman380 commented 6 years ago

it may be worth trying it yourself first

I finally had a chance to check decoder output and whether low latency has effect, in Windows 10.

As I assumed, the decoder MFT does not need 10+ frames on the input before output is produced. Indeed, in default mode there is some latency and you keep feeding input before output becomes available.

In low latency mode it's "one in - one out" and it works great.

Let me make it absolutely clear. In low latency mode, one calls IMFTransform::ProcessInput, and the following ProcessOutput call delivers a decoded frame instead of returning MF_E_TRANSFORM_NEED_MORE_INPUT.

It may well be that it had issues in the past, quite possibly. But it works now, and low latency mode has great value for near-real-time video apps.

Andrey-M-C commented 6 years ago

@roman380 Did you try the low latency attribute on the HEVC/H.265 decoder? From my experience, I don't see this attribute set by default. And even if I set it, the decoder output is 3 frames behind.

roman380 commented 6 years ago

@Andrey-M-C I tried a random HEVC encoded file (presumably there might be factors affecting the behavior including hardware, OS and the footage) and here is what I got:

[Screenshot: measured decode latency, HEVC]

Three frames behind on DXVA2-enabled decoding.

Andrey-M-C commented 6 years ago

@roman380 Thanks for the response! I see the same pattern. If you set CODECAPI_AVDecNumWorkerThreads to 1 for the software decoder, then you'll be 4 frames behind, since only one decoder thread will be spawned instead of the default four. Is there any way to get clarification from Microsoft about the absence of the low latency mode in the HEVC MFT?

roman380 commented 6 years ago

@Andrey-M-C I agree that the decoder lacks flexibility, and low delay mode does not even appear to be available. In particular, a sequence of just key frames still results in 9 frames of latency with the software decoder, which suggests the latency is somehow there by design (?).

The best place to ask for comment from MS (apart from opening an issue with support directly) that I am aware of is the MSDN Forums; however, responses there tend to be late and infrequent.

wolenetz commented 5 years ago

I think this issue merits a slight re-framing (pun intended):

1) A low latency model/API for letting the app explicitly and normatively modify how the MSE implementation treats decoder output: queue and try to smooth rates, versus "show ASAP, unless the PTS interval was missed (drop in that case)" in the video context, plus "let the app normatively describe tolerance and desired behavior w.r.t. buffered range gaps" for audio and video. These are being discussed in #21 and #160, independent of:

2) Finding an alternative to re-muxing into a supported bytestream (e.g. MP4, WebM, etc.) to let apps buffer media in MSE more rapidly and ergonomically.

I propose this issue be refocused to target the latter.

andrewmd5 commented 5 years ago

We've actually managed to "trick" Chrome and Firefox into decoding in ultra-low latency mode. Of course, gaps in data are still a potential issue, but in the linked example it's only 7 ms of delay between the host and client. So at least we know it's possible.

wizziwig commented 5 years ago

We've actually managed to "trick" Chrome and Firefox into decoding in ultra-low latency mode. Of course, gaps in data are still a potential issue, but in the linked example it's only 7 ms of delay between the host and client. So at least we know it's possible.

Can you provide any details on how you tricked Chrome and Firefox into hardware-decoding H.264 fast enough to allow less than 7 ms of total presentation latency? I would like to try reproducing your results. Was that just for decoding, or total end-to-end, including encoding, network transport, decoding, and Windows desktop rendering/composition? Thanks.

jyavenard commented 5 years ago

With the right content, the Windows WMF h264 decoder may have no latency. In Firefox you need to set the preference media.wmf.low-latency.enabled to true.

That mode is enabled by default in Chrome, though the Microsoft documentation does state that it's not supposed to work with content having B-frames.

roman380 commented 5 years ago

.. Microsoft documentation does state that it's not supposed to work with content having B-frames.

Documentation quote: "B slices/frames can be present as long as they do not introduce any frame re-ordering in the encoder."

jyavenard commented 5 years ago

Almost all YT content has B-frames requiring re-ordering, as most B-frames do. And yet Chrome always enables the low latency mode, and it obviously works.

Edit: oh, I just noticed that the comment about B-frames relates to the encoder only.

We disabled it in Firefox because it caused crashes with some versions of Windows 8.

roman380 commented 5 years ago

This bug, "Enable low-latency decoding on Windows 10 and later", suggests that we might finally have CODECAPI_AVLowLatencyMode back with default settings, doesn't it? I think it's been working well in Chrome for quite some time.

wolenetz commented 3 years ago

With the advent of the WebCodecs API, there are now possibilities I'm looking into around potentially supporting a "WebCodecs" bytestream format for use in MSE, where encoded chunks and configurations (if not also decoded chunks) might be bufferable via new bytestream/MSE feature support. That work seems most applicable to be tracked by this issue.

wolenetz commented 3 years ago

I'm picking up API shape exploration for buffering WebCodecs encoded chunks as at least a partial solution for this spec issue. Prototype experimental implementation in Chromium will similarly be tracked by https://crbug.com/1144908. Plan is to have an explainer out soon, once I get a bit further into exploring implementability of this in Chromium.

wolenetz commented 3 years ago

I have created an explainer for supporting buffering containerless WebCodecs encoded media chunks with MSE for low-latency buffering and seekable playback.

Please take a look: https://github.com/wolenetz/mse-for-webcodecs/blob/main/explainer.md Please post any feedback here on this issue as early as you can, as I intend to prototype this in Chromium (https://www.chromestatus.com/features/5649291471224832).
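For readers skimming the explainer, the proposed buffering flow looks roughly like the sketch below. Both the addSourceBuffer(config) overload and appendEncodedChunks() are speculative prototype names taken from the proposal, not shipped standards, and mediaSource is assumed to be already attached to a media element:

```javascript
// Hedged sketch of the buffering flow proposed in the explainer. The
// addSourceBuffer(config) overload and appendEncodedChunks() are speculative
// prototype APIs, not shipped standards.
async function bufferEncodedChunks(mediaSource, videoDecoderConfig, chunks) {
  // Proposed: pass a WebCodecs decoder config instead of a container MIME type
  const sourceBuffer = mediaSource.addSourceBuffer({
    videoConfig: videoDecoderConfig,
  });
  // Proposed: promise-based append of containerless encoded chunks
  await sourceBuffer.appendEncodedChunks(chunks);
  return sourceBuffer;
}
```

The notable difference from today's MSE is that no fMP4/WebM muxing step sits between the encoder (or network) and the SourceBuffer.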

wolenetz commented 2 years ago

I intend to transition the Chromium experimental implementation into origin trials to obtain further feedback on the ergonomics and usability of this feature. Some example use cases include simplifying and improving performance of transmuxing HLS-TS into fMP4 for buffering with MSE, and low-latency streaming with a seekable buffer.

Please reach out to me (wolenetz@google.com) or post here if you might be considering using this feature, and if you might want to participate in the origin trial.

wolenetz commented 2 years ago

The Chromium experimental implementation is currently in origin trials (as of M95). A draft specification of this feature in MSE spec is now in review (https://github.com/w3c/media-source/pull/302).

Note that there are some short-term bugs I'm working to fix in the Chromium prototype, hopefully to get fixed in time to be in the M96 milestone:

wolenetz commented 1 year ago

As mentioned in mozilla/standards-positions#582, "Work on this in Chrome and in spec is currently stalled. We're looking for potential users of this API. If you are aware of users or use cases that could benefit from this work, please share if you can. Otherwise, this spec feature may not progress beyond the current preliminary experimental implementation in Chrome and unmerged spec PR."

dalecurtis commented 8 months ago

As of Chrome 120.0.6074.0+ the prototype API now supports EME. The speculative IDL can be seen at https://github.com/w3c/webcodecs/issues/41#issuecomment-1762368081

Does anyone have opinions on appendEncodedChunks using promises instead of the updateend event mechanism? I'm worried it doesn't mix well with existing players, and since remove() is still event-based, both mechanisms are needed. As such, I'm inclined to remove promise support from appendEncodedChunks and defer such support to #100.
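The ergonomic concern can be seen by putting the two completion styles side by side. A sketch using stand-in objects rather than the real SourceBuffer, with appendEncodedChunks as the speculative prototype API: a player that appends via promise but removes via event has to adapt one style to the other by hand.

```javascript
// Sketch of mixing the two completion models under discussion: promise-based
// appendEncodedChunks (speculative prototype) next to the shipped event-based
// remove(), whose completion is signaled by 'updateend'.
async function appendThenRemove(sourceBuffer, chunks, start, end) {
  await sourceBuffer.appendEncodedChunks(chunks); // promise style
  await new Promise((resolve) => {                // event style, hand-adapted
    sourceBuffer.addEventListener("updateend", resolve, { once: true });
    sourceBuffer.remove(start, end);
  });
}
```

Having to wrap every event-based operation in a Promise like this is the kind of mismatch with existing player code that motivates deferring promise support.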

hlevring commented 7 months ago

This seems really interesting for A/V-synchronized playback without having to containerize WebCodecs encoded chunks.

Are there any MSE player samples that use WebCodecs encoded A/V chunks?