Closed chcunningham closed 4 months ago
This proposal makes sense for realtime streams where the only goal is minimizing rendering latency. That's the case for many desktop-streaming scenarios, but in higher-latency cases (including regular video conferencing) a jitter buffer may be preferable. Once you get to regular media playback, it does not make sense to work this way.
This is most similar to the `bitmaprenderer` canvas context, although that has an extra async `createImageBitmap()` step and therefore could in theory render out-of-order in extreme cases:
```js
let ctx = canvas.getContext("bitmaprenderer");
function output(frame) {
  createImageBitmap(frame).then((bitmap) => {
    ctx.transferFromImageBitmap(bitmap);
    frame.close(); // close only after the bitmap has been created
  });
}
```
A `2d` context can also draw a `VideoFrame` without much code:
```js
let ctx = canvas.getContext("2d");
function output(frame) {
  ctx.drawImage(frame, 0, 0);
  frame.close();
}
```
A canvas may not be the ideal output path for minimizing latency; it might make sense to use a `video` element instead. That quickly starts to look like the `MediaStreamTrackGenerator` API, which also supports audio. The direct approach proposed originally does have the advantage of knowing for sure that the frame won't be exposed to JS, though, which could allow for extra optimizations.
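A sketch of that `video`-element alternative, assuming the Chromium-only `MediaStreamTrackGenerator` API and an existing `video` element (names here are illustrative, not from the proposal):

```js
// Sketch: pipe decoder output into a <video> element via a
// MediaStreamTrackGenerator (Chromium-only at the time of writing).
const generator = new MediaStreamTrackGenerator({ kind: "video" });
const writer = generator.writable.getWriter();
document.querySelector("video").srcObject = new MediaStream([generator]);

function output(frame) {
  // The generator consumes (and closes) the frame; no explicit close here.
  writer.write(frame);
}
```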
@youennf mentioned using media stream tracks as well.
https://github.com/w3c/webcodecs/issues/211#issuecomment-854511645
This makes me a bit wary because media streams and video don’t have low latency guarantees, and may buffer the data, etc.
> media streams and video don’t have low latency guarantees, and may buffer the data
This is accurate: the buffering properties of a MediaStreamTrack are not clearly specified. It was designed to support WebRTC, so it favors realtime use, though it does include the ability to jitter buffer. The Chrome implementation is realtime, except when used with MediaRecorder.
There is difficulty inherent in integrating MediaStreamTrack with WebCodecs, since WebCodecs supports both realtime and non-realtime uses. MediaStreamTrackGenerator adds Streams to the mix, which is an extra layer where frames can be buffered.
All that said, I would typically expect the buffers to be empty and the latency to be low. The goal is, after all, for MediaStreamTrackGenerator to be widely used by WebRTC apps.
> This makes me a bit wary because media streams and video don’t have low latency guarantees, and may buffer the data
I do not think MediaStreamTracks do any buffering in general (it might be worth checking what browsers do by creating a canvas capture track, generating a single frame on that track, and then assigning the track to a video element without generating new frames). A MediaStreamTrack basically allows JS to set up a pipe between a source and a sink. If buffering happens, it will be at the source (say, a WebCodecs decoder) or at the sink (say, an HTMLVideoElement). Given the proposals to expose frame access at the MediaStreamTrack level, the buffering and low-latency behaviours will become much clearer.
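The experiment suggested above could look something like this (a sketch; `video` is assumed to be an existing element on the page):

```js
// Sketch of the suggested check: push exactly one frame through a canvas
// capture track, then attach the track to a video element.
const canvas = document.createElement("canvas");
const ctx = canvas.getContext("2d");
ctx.fillStyle = "green";
ctx.fillRect(0, 0, canvas.width, canvas.height);

const stream = canvas.captureStream(0);    // 0 fps: frames only on request
stream.getVideoTracks()[0].requestFrame(); // emit a single frame
video.srcObject = stream;
// If the green frame renders without further requestFrame() calls, the
// single frame reached the sink, i.e. no buffering was needed in between.
```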
As for the jitter buffer, it might be applied to raw video frames, which could be memory intensive. It can also be applied to compressed data, in which case a MediaStreamTrack is just fine. No need for video frame access except for more advanced cases like tight synchronization with other data sources.
> There is difficulty inherent in integrating MediaStreamTrack with WebCodecs, since WebCodecs supports both realtime and non-realtime uses
I am not really sure of that. Can you describe why that would be difficult? In any case, it could be an output option as proposed in https://github.com/w3c/webcodecs/issues/199.
> I do not think MediaStreamTracks do any buffering in general (it might be worth checking what browsers do by creating a canvas capture track, generating a frame on this track and then assigning the track to a video element without generating new frames).
Is this just a convenient byproduct of the implementation or a guarantee by the spec?
> Is this just a convenient byproduct of the implementation or a guarantee by the spec?
I think this is the spirit of the spec, especially given its first use is exposing realtime data such as camera and microphone capture. And I would expect consistency across the various implementations (@guidou, @jan-ivar, any thoughts?)
There is not a lot of normative wording given this is not super observable by the web application. One example:
> There is difficulty inherent in integrating MediaStreamTrack with WebCodecs, since WebCodecs supports both realtime and non-realtime uses

> I am not really sure of that. Can you describe why that would be difficult?
MediaStreamTrack is designed for realtime use: it will drop stale frames when new frames arrive. Apps are never guaranteed to receive all frames from a source, which is unsuitable for non-realtime uses.
Adding Streams, as MediaStreamTrackProcessor does, adds a buffer that operates after stale frame dropping occurs. The result can be that a stale frame is kept alive inside the Stream for a long time while newer frames are dropped. In this case the drop behavior isn't even keeping the "next frame" fresh.
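The hazard described above can be sketched with Node's built-in WHATWG streams, using strings as stand-in frames and a hypothetical drop-on-backpressure source:

```javascript
// Sketch of the stale-frame hazard: a realtime-style source that drops
// new frames under backpressure. Strings stand in for VideoFrames.
let controller;
const track = new ReadableStream(
  { start(c) { controller = c; } },
  new CountQueuingStrategy({ highWaterMark: 1 })
);

const dropped = [];
function produceFrame(frame) {
  if (controller.desiredSize > 0) {
    controller.enqueue(frame); // queued for a (possibly slow) consumer
  } else {
    dropped.push(frame);       // realtime policy: drop the newest frame
  }
}

produceFrame("frame-1"); // queued
produceFrame("frame-2"); // dropped: frame-1 still occupies the queue
produceFrame("frame-3"); // dropped

console.log(dropped); // [ 'frame-2', 'frame-3' ]
// A reader that finally catches up receives the stale "frame-1".
```

A slow reader thus observes the oldest queued frame while fresher ones were discarded at the source, which is exactly the inversion of the intended drop policy.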
Yeah MediaStreamTrack is always realtime AFAIK. Maybe you're confusing it with MSE?
> Adding Streams, ... adds a buffer
Can you elaborate? My understanding of WHATWG streams is that they impose no buffering on their own. That is: streams can operate by passing (platform) objects around, and the default high-water mark is 1, which translates to no buffering.
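The default-high-water-mark behavior is easy to observe with Node's built-in WHATWG streams (a sketch; strings stand in for frames):

```javascript
// Sketch: with a high-water mark of 1 (the default CountQueuingStrategy),
// the queue signals backpressure (desiredSize reaches 0) after a single
// chunk; nothing beyond that one in-flight chunk is buffered willingly.
let ctrl;
const stream = new ReadableStream(
  { start(c) { ctrl = c; } },
  new CountQueuingStrategy({ highWaterMark: 1 })
);

console.log(ctrl.desiredSize); // 1: room for exactly one chunk
ctrl.enqueue("frame");
console.log(ctrl.desiredSize); // 0: backpressure is now signalled
```

Note that backpressure is advisory: a source can keep enqueuing past the high-water mark, so "no buffering" holds only if the source respects the signal.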
> it will drop stale frames when new frames arrive
That is currently not observable to the web so it really depends on the source and sink internal behaviors.
On another WebCodecs thread, we are discussing the possibility of adding a 'realtime' codec setting. If we have such a setting, and we have a WebCodecs video decoder as a MediaStreamTrack source, we could define the source behaviour as follows:
I don't think we could ever replace all the WebGL code in the first post. At best we could replace a canvas `drawImage()` call, but the WebGL stuff has meaning. For these straight-to-canvas use cases, it seems like we instead want to use one of the proposed MediaStream creation mechanisms and pipe them into a `video` element:
We could accept either of those in the `VideoDecoderInit` in place of the output callback as an extension.
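A sketch of what such an extension might look like (hypothetical: `VideoDecoderInit` today only accepts an output callback, and `MediaStreamTrackGenerator` is a Chromium-only proposal):

```js
// Hypothetical: VideoDecoderInit accepting a track generator in place of
// the output callback, so decoded frames flow straight to a sink without
// touching a JS-visible thread.
const generator = new MediaStreamTrackGenerator({ kind: "video" });
const decoder = new VideoDecoder({
  output: generator, // hypothetical extension point, not in the spec
  error: (e) => console.error(e),
});
document.querySelector("video").srcObject = new MediaStream([generator]);
```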
I do not see how MediaStreamTrackGenerator or MediaStreamTrackVideoController give benefits here over a MediaStreamTrack, can you clarify this? The intent in general is that WebCodec decoder output would flow directly from WebCodec decoding thread to the rendering thread without ever hitting WebCodec controlling thread.
As a side bonus, if we design native MediaStreamTrack transforms, we could have WebCodecs -> MediaStreamTrack -> transform -> rendering without any JS interruption.
MediaStreamTrack could also work; I just suggested the JS versions offhand. E.g., one possibility for optional interfaces like this:
```
MediaStreamTrack (Audio|Video)Decoder.getMediaStreamTrack()
void (Audio|Video)Encoder.configure(MediaStreamTrack)
```
Keep in mind MediaStreams run afoul of the same issues that streams do though, so we'll want to be careful here: https://docs.google.com/document/d/10S-p3Ob5snRMjBqpBf5oWn6eYij1vos7cujHoOCCCAw/edit
In general though, we're supportive of optional MediaStreamTrack integration points. I agree they could be very useful for simplifying developers lives and improving performance by decoupling entirely from JS visible threads.
For `MediaStreamTrack` integration points to be usable in workers, it will also be necessary to support transfer of `MediaStreamTrack`s. Did you mean `getMediaStream()` or `getMediaStreamTrack()` in the interface above?
Thanks, I did mean `MediaStreamTrack`, since you can always construct a `MediaStream` from an array of tracks. I've updated the method signature in my comment to match the return value. Ultimately I defer to the `MediaStream` experts here, though.
@Djuffin @padenot Can we close this?
In issue #211, koush@ raised the following idea for an option to send outputs directly to canvas (no callback). Splitting that off into its own issue for further consideration.
koush@ wrote: