Closed chcunningham closed 4 months ago
This proposal makes sense for realtime streams where the only goal is minimizing rendering latency. That's the case for many desktop-streaming scenarios, but in higher-latency cases (including regular video conferencing) a jitter buffer may be preferable. Once you get to regular media playback, it does not make sense to work this way.
This is most similar to the `bitmaprenderer` canvas context, although that has an extra async `createImageBitmap()` step and therefore could in theory render out-of-order in extreme cases:
```js
let ctx = canvas.getContext("bitmaprenderer");
function output(frame) {
  createImageBitmap(frame).then((bitmap) => {
    ctx.transferFromImageBitmap(bitmap);
    frame.close(); // close only after the bitmap has been created
  });
}
```
A `2d` context can also draw a `VideoFrame` without much code:
```js
let ctx = canvas.getContext("2d");
function output(frame) {
  ctx.drawImage(frame, 0, 0);
  frame.close();
}
```
A canvas may not be the ideal output path for minimizing latency; it might make sense to use a `video` element instead. That quickly starts to look like the `MediaStreamTrackGenerator` API, which also supports audio. The direct approach proposed originally does have the advantage of knowing for sure that the frame won't be exposed to JS, though, which could allow for extra optimizations.
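A sketch of that `video`-element alternative, assuming the Chromium-only `MediaStreamTrackGenerator` API and an existing `video` element (names here are illustrative, not from the proposal):

```js
// Sketch: pipe decoder output into a <video> element via a
// MediaStreamTrackGenerator (Chromium-only at the time of writing).
const generator = new MediaStreamTrackGenerator({ kind: "video" });
const writer = generator.writable.getWriter();
document.querySelector("video").srcObject = new MediaStream([generator]);

function output(frame) {
  // The generator consumes (and closes) the frame; no explicit close here.
  writer.write(frame);
}
```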
@youennf mentioned using media stream tracks as well.
https://github.com/w3c/webcodecs/issues/211#issuecomment-854511645
This makes me a bit wary because media streams and video don’t have low latency guarantees, and may buffer the data, etc.
> media streams and video don’t have low latency guarantees, and may buffer the data
This is accurate: the buffering properties of a MediaStreamTrack are not clearly specified. It was designed to support WebRTC, so it favors realtime use, though it does include the ability to jitter buffer. The Chrome implementation is realtime, except when used with MediaRecorder.
There is difficulty inherent in integrating MediaStreamTrack with WebCodecs, since WebCodecs supports both realtime and non-realtime uses. MediaStreamTrackGenerator adds Streams to the mix, which is an extra layer where frames can be buffered.
All that said, I would typically expect the buffers to be empty and the latency to be low. The goal is, after all, for MediaStreamTrackGenerator to be widely used by WebRTC apps.
> This makes me a bit wary because media streams and video don’t have low latency guarantees, and may buffer the data
I do not think MediaStreamTracks do any buffering in general (it might be worth checking what browsers do by creating a canvas capture track, generating a single frame on that track, and then assigning the track to a video element without generating new frames). A MediaStreamTrack basically allows JS to set up a pipe between a source and a sink. If buffering happens, it will be at the source (say, a WebCodecs decoder) or at the sink (say, an HTMLVideoElement). Given the proposals to expose frame access at the MediaStreamTrack level, the buffering and low-latency behaviours will become much clearer.
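The experiment suggested above could look something like this (a sketch; `video` is assumed to be an existing element on the page):

```js
// Sketch of the suggested check: push exactly one frame through a canvas
// capture track, then attach the track to a video element.
const canvas = document.createElement("canvas");
const ctx = canvas.getContext("2d");
ctx.fillStyle = "green";
ctx.fillRect(0, 0, canvas.width, canvas.height);

const stream = canvas.captureStream(0);    // 0 fps: frames only on request
stream.getVideoTracks()[0].requestFrame(); // emit a single frame
video.srcObject = stream;
// If the green frame renders without further requestFrame() calls, the
// single frame reached the sink, i.e. no buffering was needed in between.
```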
As for the jitter buffer, it might be applied to raw video frames, which could be memory intensive. It can also be applied to compressed data, in which case a MediaStreamTrack is just fine. No need for video frame access except for more advanced cases like tight synchronization with other data sources.
> There is difficulty inherent in integrating MediaStreamTrack with WebCodecs, since WebCodecs supports both realtime and non-realtime uses
I am not really sure of that. Can you describe why that would be difficult? In any case, it could be an output option as proposed in https://github.com/w3c/webcodecs/issues/199.
> I do not think MediaStreamTracks do any buffering in general (it might be worth checking what browsers do by creating a canvas capture track, generating a frame on this track and then assigning the track to a video element without generating new frames).
Is this just a convenient byproduct of the implementation or a guarantee by the spec?
> Is this just a convenient byproduct of the implementation or a guarantee by the spec?
I think this is the spirit of the spec, especially given its first use is exposing realtime data such as camera and microphone capture. And I would expect consistency across the various implementations (@guidou, @jan-ivar, any thoughts?)
There is not a lot of normative wording given this is not super observable by the web application. One example:
> There is difficulty inherent in integrating MediaStreamTrack with WebCodecs, since WebCodecs supports both realtime and non-realtime uses

> I am not really sure of that. Can you describe why that would be difficult?
MediaStreamTrack is designed for realtime use: it will drop stale frames when new frames arrive. Apps are never guaranteed to receive all frames from a source, which is unsuitable for non-realtime uses.
Adding Streams, as MediaStreamTrackProcessor does, adds a buffer that operates after stale frame dropping occurs. The result can be that a stale frame is kept alive inside the Stream for a long time while newer frames are dropped. In this case the drop behavior isn't even keeping the "next frame" fresh.
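The hazard described above can be sketched with Node's built-in WHATWG streams, using strings as stand-in frames and a hypothetical drop-on-backpressure source:

```javascript
// Sketch of the stale-frame hazard: a realtime-style source that drops
// new frames under backpressure. Strings stand in for VideoFrames.
let controller;
const track = new ReadableStream(
  { start(c) { controller = c; } },
  new CountQueuingStrategy({ highWaterMark: 1 })
);

const dropped = [];
function produceFrame(frame) {
  if (controller.desiredSize > 0) {
    controller.enqueue(frame); // queued for a (possibly slow) consumer
  } else {
    dropped.push(frame);       // realtime policy: drop the newest frame
  }
}

produceFrame("frame-1"); // queued
produceFrame("frame-2"); // dropped: frame-1 still occupies the queue
produceFrame("frame-3"); // dropped

console.log(dropped); // [ 'frame-2', 'frame-3' ]
// A reader that finally catches up receives the stale "frame-1".
```

A slow reader thus observes the oldest queued frame while fresher ones were discarded at the source, which is exactly the inversion of the intended drop policy.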
Yeah MediaStreamTrack is always realtime AFAIK. Maybe you're confusing it with MSE?
> Adding Streams, ... adds a buffer
Can you elaborate? My understanding of WHATWG streams is that they impose no buffering on their own. That is: streams can operate by passing (platform) objects around, and the default high-water mark is 1, which translates to no buffering.
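The default-high-water-mark behavior is easy to observe with Node's built-in WHATWG streams (a sketch; strings stand in for frames):

```javascript
// Sketch: with a high-water mark of 1 (the default CountQueuingStrategy),
// the queue signals backpressure (desiredSize reaches 0) after a single
// chunk; nothing beyond that one in-flight chunk is buffered willingly.
let ctrl;
const stream = new ReadableStream(
  { start(c) { ctrl = c; } },
  new CountQueuingStrategy({ highWaterMark: 1 })
);

console.log(ctrl.desiredSize); // 1: room for exactly one chunk
ctrl.enqueue("frame");
console.log(ctrl.desiredSize); // 0: backpressure is now signalled
```

Note that backpressure is advisory: a source can keep enqueuing past the high-water mark, so "no buffering" holds only if the source respects the signal.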
> it will drop stale frames when new frames arrive
That is currently not observable to the web so it really depends on the source and sink internal behaviors.
On another WebCodecs thread, we are discussing the possibility of adding a 'realtime' codec setting. If we have such a setting, and we have a WebCodecs video decoder as a MediaStreamTrack source, we could define the source behaviour as follows:
I don't think we could ever replace all the WebGL code in the first post. At best we could replace a canvas `drawImage()` call, but the WebGL stuff has meaning. For these straight-to-canvas use cases, it seems like we instead want to use one of the proposed MediaStream creation mechanisms and pipe them into a `video` element:
We could accept either of those in the `VideoDecoderInit` in place of the output callback as an extension.
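A sketch of what such an extension might look like (hypothetical: `VideoDecoderInit` today only accepts an output callback, and `MediaStreamTrackGenerator` is a Chromium-only proposal):

```js
// Hypothetical: VideoDecoderInit accepting a track generator in place of
// the output callback, so decoded frames flow straight to a sink without
// touching a JS-visible thread.
const generator = new MediaStreamTrackGenerator({ kind: "video" });
const decoder = new VideoDecoder({
  output: generator, // hypothetical extension point, not in the spec
  error: (e) => console.error(e),
});
document.querySelector("video").srcObject = new MediaStream([generator]);
```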
I do not see how MediaStreamTrackGenerator or MediaStreamTrackVideoController give benefits here over a MediaStreamTrack, can you clarify this? The intent in general is that WebCodec decoder output would flow directly from WebCodec decoding thread to the rendering thread without ever hitting WebCodec controlling thread.
As a side bonus, if we design native MediaStreamTrack transforms, we could have WebCodecs -> MediaStreamTrack -> transform -> rendering without any JS interruption.
MediaStreamTrack could also work; I just suggested the JS versions offhand. E.g., one possibility for optional interfaces like this:
```
MediaStreamTrack (Audio|Video)Decoder.getMediaStreamTrack()
void (Audio|Video)Encoder.configure(MediaStreamTrack)
```
Keep in mind MediaStreams run afoul of the same issues that streams do though, so we'll want to be careful here: https://docs.google.com/document/d/10S-p3Ob5snRMjBqpBf5oWn6eYij1vos7cujHoOCCCAw/edit
In general though, we're supportive of optional MediaStreamTrack integration points. I agree they could be very useful for simplifying developers lives and improving performance by decoupling entirely from JS visible threads.
For `MediaStreamTrack` integration points to be usable in workers, it will also be necessary to support transfer of `MediaStreamTrack`s. Did you mean `getMediaStream()` or `getMediaStreamTrack()` in the interface above?
Thanks, I did mean `MediaStreamTrack`, since you can always construct a `MediaStream` from an array of tracks. I've updated the method signature in my comment to match the return value. Ultimately I defer to the `MediaStream` experts here, though.
@Djuffin @padenot Can we close this?
In issue #211, koush@ raised the following idea for an option to send outputs directly to canvas (no callback). Splitting that off into its own issue for further consideration.
koush@ wrote: