w3c / media-source

Media Source Extensions
https://w3c.github.io/media-source/

Expose an explicit set/get low-latency versus "smoothing" MSE API rather than relying on implementation-specific, implicit bytestream hints that the stream might be "live" #21

Open wolenetz opened 8 years ago

wolenetz commented 8 years ago

Migrated from w3c bugzilla tracker. For history prior to migration, please see: https://www.w3.org/Bugs/Public/show_bug.cgi?id=28379

It was previously assigned to Adrian Bateman. Editors will sync soon to determine who will take this bug.

wolenetz commented 8 years ago

It sounds like a set/get low latency API might solve this.

jdsmith3000 commented 8 years ago

This issue requests app control over the latency model, and that's clearly a new feature request. It might be possible to detect a live stream and set lower latency buffering, but it's not clear that would be the best thing to do on all live streams. An API that lets the app communicate intent is likely needed to resolve this adequately.

On V.Next already.

paulbrucecotton commented 8 years ago

The Media Task Force has agreed to designate this issue as V.Next: https://lists.w3.org/Archives/Public/public-html-media/2015Nov/0027.html

greentorus commented 7 years ago

Feature proposals/"requests":

a) The low latency mode should also support video streams (e.g. H.264) with a single initial keyframe followed by P-frames only. Periodic keyframes mean significantly larger packets from time to time, which take longer to transmit and therefore arrive later at the client. That is no issue for buffered VOD playback, but it causes stuttering in low-latency situations with close to zero buffering. Using only P-frames loses seeking, but low-latency use cases like video chat or cloud gaming do not need it anyway.

b) The low-latency mode should work well with appending each new video frame individually to the source buffer, because appending multiple frames at once would introduce unnecessary buffering and therefore increase the delay.
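A minimal sketch of the per-frame append discipline (b) asks for, assuming each frame arrives already packaged as its own fMP4 fragment; `createAppendQueue` is an illustrative helper, not an MSE API. `SourceBuffer.appendBuffer()` is asynchronous and throws while `updating` is true, so per-frame appends must be serialized:

```javascript
// Serialize per-frame appends: appendBuffer() throws if called while a
// previous append is still in flight (sourceBuffer.updating === true).
function createAppendQueue(sourceBuffer) {
  const queue = [];
  const pump = () => {
    if (!sourceBuffer.updating && queue.length > 0) {
      sourceBuffer.appendBuffer(queue.shift());
    }
  };
  sourceBuffer.addEventListener('updateend', pump);
  return (frame) => { queue.push(frame); pump(); };
}

// Browser wiring (illustrative names):
//   const sb = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.64001f"');
//   const push = createAppendQueue(sb);
//   socket.onmessage = (e) => push(new Uint8Array(e.data)); // one fragment per frame
```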

jyavenard commented 7 years ago

What you want to do, and the type of video data you would use (a single starting keyframe followed only by P-frames), is currently fundamentally incompatible with the SourceBuffer architecture and its spirit.

MSE requires regularly spaced keyframes to work, in particular in order to be able to evict data from the SourceBuffer. The concept of dealing with individual frames would have to be removed, and evicting data by a byte offset only would have to be allowed.

An alternative would be for SourceBuffer::remove to take either a percentage or a byte offset. Seeking would have to be disallowed, and the liveSeekableRange attribute would always return an empty range.
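For contrast, percentage-based eviction can already be approximated on top of today's time-based `SourceBuffer.remove(start, end)`; a sketch (the helper name is an invention). It works precisely because implementations keep removal keyframe-aligned, which is the property that breaks down in the single-keyframe case:

```javascript
// Approximate "drop the oldest N%" using the existing time-based
// SourceBuffer.remove(start, end). sb.buffered is a TimeRanges object.
function removeOldestPercent(sb, percent) {
  const ranges = sb.buffered;
  if (ranges.length === 0) return;
  const start = ranges.start(0);
  const end = ranges.end(ranges.length - 1);
  const cut = start + (end - start) * (percent / 100);
  sb.remove(start, cut); // implementation keeps removal keyframe-aligned
}
```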

greentorus commented 7 years ago

Yes, I see the point that this is, for now, fundamentally incompatible with the current MSE philosophy. But what I have in mind is: low-latency MSE is a very interesting feature for many applications, and as this issue shows we are not the first ones interested in it ;-) Single-keyframe video streams are, I think, one important aspect of good low latency, so extending the MSE architecture to make that possible would be useful and worthwhile. Maybe there are very simple approaches, simpler than percentages or byte offsets. For example (as mentioned on the Mozilla board), low-latency use cases are usually personalized and interactive, so they do not need seeking anyway. One simple solution could be that seeking and SourceBuffer::remove are officially simply not possible (returning an error) if the video has only one keyframe (so far).

andrewmd5 commented 5 years ago

Have there been any updates on this or a real live low latency mode for MSE vNext?

wolenetz commented 5 years ago

Not tangible, though I have discussed some approaches face-to-face with @jyavenard earlier this year.

wolenetz commented 5 years ago

@greentorus / https://github.com/w3c/media-source/issues/21#issuecomment-236976892: It sounds like you're requesting a different feature (though for live low latency as goal): seeking and sourcebuffer::remove (and background video suspension, and video track de/re-selection) would need to be constrained to not involve reconfiguring the decoder, because the implementation would be unable to pre-roll from an ancient (and likely no longer buffered) keyframe to satisfy those scenarios. Have you considered using the MediaStream API to satisfy those constraints without involving major change to MSE buffering/GC (nor HTMLMediaElement extension) behavior?

I propose we keep this issue (renamed and refocused) to be more like what #133 wants: an explicit MSE API to set/get the implementation's low vs "smoothing" latency model. Please file a separate issue if the "single keyframe plus lots of P-frames" scenario is not a better fit for the MediaStream API than a vNext MSE API.
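A purely hypothetical shape such an API could take; the `latencyMode` attribute is an invented name for illustration and appears in no spec, so the sketch feature-detects and falls back to today's behavior:

```javascript
// Hypothetical: 'latencyMode' is an illustrative attribute name, not a
// spec'd MSE API. Feature-detect so the code degrades gracefully.
function configureLatency(mediaSource, wantLowLatency) {
  if ('latencyMode' in mediaSource) {
    mediaSource.latencyMode = wantLowLatency ? 'low' : 'smoothing';
    return mediaSource.latencyMode;
  }
  return 'smoothing'; // implementations default to smoothing today
}
```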

mmmmichael commented 5 years ago

> @greentorus / #21 (comment): It sounds like you're requesting a different feature (though for live low latency as goal): seeking and sourcebuffer::remove (and background video suspension, and video track de/re-selection) would need to be constrained to not involve reconfiguring the decoder, because the implementation would be unable to pre-roll from an ancient (and likely no longer buffered) keyframe to satisfy those scenarios. Have you considered using the MediaStream API to satisfy those constraints without involving major change to MSE buffering/GC (nor HTMLMediaElement extension) behavior?

Yes, we are also considering the MediaStream/WebRTC API.

However, compared to MSE, MediaStream/WebRTC involves a lot of unnecessary high-level complexity and protocol restrictions just to display a live video stream.

Also, as a minor secondary reason, the MSE video pipeline seems better optimized for higher resolutions in many browser implementations. For example, the MSE implementation in Firefox on Windows seems to use hardware decoding based on Windows Media Foundation, while its MediaStream implementation seems to use software-only decoding.

> I propose we keep this issue (renamed and refocused) to be more like what #133 wants (an explicit MSE API to set/get the implementation's low vs "smoothing" latency model). Please file a separate issue if the "single keyframe plus lots of P frames" scenario is not a better fit for the MediaStream API than a vNext MSE API.

We don't care what the solution is, as long as it provides low latency. So an explicit latency model sounds good.

However, real low latency seems impossible without "single keyframe plus lots of P-frames".

For example, suppose the user has a 20 Mbps network connection. This supports a 2160p 60 fps video stream, typically with one keyframe per second. Depending on the scenario, a keyframe often consumes up to half of that total bandwidth or even more (in this case around 10 Mb), while the P-frames are very small (around 150 kb). This is no problem with high-latency buffering, but it means that transferring a keyframe takes 1/2 second, so the minimum possible latency is also 1/2 second. Using only P-frames, the minimum possible latency is about 1/120 second (half of a 1/60 second frame interval). Note that decreasing the number of keyframes per second decreases bandwidth but does not decrease latency, which stays at 1/2 second. The only exception seems to be not sending any keyframes at all after a single initial one; then, after some initialization hiccup, the minimum latency is 1/120 second.
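The arithmetic above can be checked directly, using the figures from the comment (20 Mbps link, ~10 Mb keyframes, ~150 kb P-frames at 60 fps):

```javascript
// Transfer time of one frame over the link, in seconds.
const linkMbps = 20;
const transferSeconds = (frameMegabits) => frameMegabits / linkMbps;

const keyframeDelay = transferSeconds(10);   // 10 Mb  / 20 Mbps = 0.5 s
const pFrameDelay   = transferSeconds(0.15); // 0.15 Mb / 20 Mbps = 0.0075 s

// The latency floor is set by the largest frame that must arrive before
// display: regular keyframes pin it near 0.5 s no matter how rarely they
// are sent, while a P-frame-only stream fits inside one 1/60 s frame
// interval.
```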

This problem is the reason we started experimenting with "single keyframe plus lots of P-frames".

How could this problem be avoided in the "low vs 'smoothing' latency model" proposal while still having regular keyframes?

fernando-80 commented 5 years ago

> How could this problem be avoided in the "low vs 'smoothing' latency model" proposal when still having regular keyframes?

That's a good point! This sounds to me like a transport-layer issue rather than something the proposed "low latency model" has to solve. From my understanding of the feature need and the discussion, the low latency model would conceptually disable the MSE receiver's jitter buffer, so that frames are rendered as they arrive. The handling of any artifacts, and of data loss whose consequence is loss of playback "smoothness", is by definition pushed outside MSE, perhaps to the application/system layer.

Getting back to the comparison between WebRTC and MSE with a "low latency model" as I see it: WebRTC is a complete (but not flexible) solution that transports video to one of the peers, which then just needs a simple HTML5 video element to render it. MSE with a "low latency model" should be suitable for applications needing more refined control over transport, smoothness, video formats, enhancements, etc.