w3c / webcodecs

WebCodecs is a flexible web API for encoding and decoding audio and video.
https://w3c.github.io/webcodecs/

Add WebCodecs in Worker sample #583

Closed: aboba closed this 1 year ago

aboba commented 1 year ago

This is a sample demonstrating WebCodecs Encode and Decode in a worker. A live demo is here: https://webrtc.internaut.com/wc/wcWorker2/

Partial fix for https://github.com/w3c/webcodecs/issues/78

aboba commented 1 year ago

Submitted PR https://github.com/w3c/webcodecs/pull/586 to add a link on the samples page.

tidoust commented 1 year ago

The encoding/decoding time measurements (enc_update, dec_update) may confuse people looking at the sample code (at least they confused me initially ;)). They don't actually measure encoding/decoding times, but rather the time taken by the calls to the encode and decode functions, which should be negligible most of the time since those functions basically just enqueue a control message to encode/decode the frame and return. I wonder whether they could be dropped. Or are these measurements meant to highlight something?
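For illustration, a minimal sketch (not the sample's actual code) of why timing the call itself is misleading:

```js
// encode() merely enqueues a control message and returns, so timing the call
// measures enqueue time, not encoding time.
const encoder = new VideoEncoder({
  output: (chunk) => { /* the actual encode completes here, later */ },
  error: (e) => console.error(e),
});
encoder.configure({ codec: 'vp8', width: 1280, height: 720 });

function timeEncodeCall(frame) {
  const start = performance.now();
  encoder.encode(frame);                      // returns almost immediately
  const callTime = performance.now() - start; // typically well under 1 ms
  frame.close();
  return callTime;
}
```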

aboba commented 1 year ago

@tidoust The sample is collecting statistics on call times to try to reproduce blocking behavior that had been reported. However, even with full-HD resolution and AV1 encoding, the call times have so far remained low on all the (desktop, notebook) devices I have tested. If that remains the case, the call-time statistics can be removed. I'd still like to keep the encode and decode queue metrics.

In this sample, glass-glass latency remains low even at the highest resolutions and with the most complex encoders. However, measuring encoding/decoding times might still be useful, even if only to provide a baseline for comparison when other pipeline stages are added. For example, in a WebTransport API sample, we have added sending and receiving to the pipeline. When this is done, the glass-glass latency increases considerably, and it would be helpful to be able to break down the contributors to it.

One way to measure the encode and decode times might be to correlate VideoFrames and encodedChunks by timestamp. If you have any other suggestions, let me know.
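A minimal sketch of that idea (the map and function names are illustrative, not from the sample):

```js
// Record submission time per frame timestamp, then match it against the
// chunk the output callback delivers with the same timestamp.
const submitTimes = new Map(); // VideoFrame.timestamp -> performance.now()

const encoder = new VideoEncoder({
  output: (chunk) => {
    const start = submitTimes.get(chunk.timestamp);
    if (start !== undefined) {
      submitTimes.delete(chunk.timestamp);
      // Note: this includes time the frame spent waiting in the encoder's
      // internal queue, not just the encode itself.
      console.log(`encode took ${performance.now() - start} ms`);
    }
  },
  error: (e) => console.error(e),
});
encoder.configure({ codec: 'vp8', width: 1280, height: 720 });

function submit(frame) {
  submitTimes.set(frame.timestamp, performance.now());
  encoder.encode(frame);
  frame.close();
}
```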

tidoust commented 1 year ago

I don't disagree that it is useful to measure transformation steps. I wonder whether it is a good idea to add a sample targeted at developers that includes some measurements, but not really the measurements one might expect at first sight. I'm just slightly worried that readers may think the encoding/decoding actually takes place synchronously when the code calls encode and decode, since that is what gets measured.

A sample without measurements would still be very useful for developers looking into combining WebCodecs with MediaStreamTrackProcessor and MediaStreamTrackGenerator to create a processing/transport pipeline in a worker, along the lines of the sketch below.
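For context, a sketch of that pipeline shape (the worker file name and the transform-stream construction inside it are assumptions, not the sample's actual code):

```js
// Main thread: capture a camera track and hand both stream ends to a worker.
async function startPipeline() {
  const media = await navigator.mediaDevices.getUserMedia({ video: true });
  const [track] = media.getVideoTracks();

  const processor = new MediaStreamTrackProcessor({ track });
  const generator = new MediaStreamTrackGenerator({ kind: 'video' });

  // ReadableStream/WritableStream are transferable, so the whole
  // encode/decode pipeline can run off the main thread.
  const worker = new Worker('pipeline-worker.js'); // hypothetical file
  worker.postMessage(
    { readable: processor.readable, writable: generator.writable },
    [processor.readable, generator.writable]
  );

  // In pipeline-worker.js (sketch):
  //   self.onmessage = ({ data: { readable, writable } }) =>
  //     readable
  //       .pipeThrough(encodeVideoStream)  // VideoFrame -> EncodedVideoChunk
  //       .pipeThrough(decodeVideoStream)  // EncodedVideoChunk -> VideoFrame
  //       .pipeTo(writable);

  // MediaStreamTrackGenerator is itself a MediaStreamTrack.
  document.querySelector('video').srcObject = new MediaStream([generator]);
}
```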

I'm happy to look into adding measurements of the actual encoding/decoding steps based on the VideoFrame/chunk timestamp, although I may need a bit of time...

In any case, I'm fine with merging this sample as is. It can be improved over time (it will need to be updated to use VideoTrackGenerator instead of MediaStreamTrackGenerator at some point anyway ;))

aboba commented 1 year ago

@tidoust Yes, I could see where the time metrics might be misinterpreted. Given that the time metrics haven't proven particularly instructive, I have removed them. This should make the sample easier to understand.

I am working on another sample for the WebTransport WG which adds network transport to the pipeline. In that sample, glass-glass latency is considerably higher. So I'd like to continue the discussion on metrics in that (forthcoming) PR.

tidoust commented 1 year ago

Thanks @aboba!

Digging deeper into the code (I'm sorry that my feedback comes in a piecemeal fashion; it takes time to absorb what happens under the hood), I'm wondering about backpressure. The DecodeVideoStream and EncodeVideoStream transform streams don't communicate backpressure signals at all in this sample. On the encoding side, the code handles backpressure itself by maintaining its own pending queue (pending_outputs) and by dropping frames when the queue is full.

Given that the sample creates a stream pipeline with transformation steps, I'm wondering whether the internal queues and backpressure signals that come with streams should be used instead.

Typically, the transform function of EncodeVideoStream could return a Promise that resolves when the output function of the VideoEncoder is called with the encoded frame (and similarly for the decoder), as a way to propagate backpressure to the MediaStreamTrackProcessor and the getUserMedia source. Or is that a bad idea? In other words, do we usually want to drop frames that cannot be processed in time, or to ask the source not to generate them?
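A sketch of that approach (names are illustrative; it also assumes the encoder emits exactly one chunk per input frame, which may not hold for every configuration):

```js
function makeBackpressureEncodeStream(config) {
  let resolveChunk = null;
  const encoder = new VideoEncoder({
    output: (chunk) => resolveChunk(chunk),
    error: (e) => console.error(e),
  });
  encoder.configure(config);

  return new TransformStream({
    async transform(frame, controller) {
      const produced = new Promise((resolve) => { resolveChunk = resolve; });
      encoder.encode(frame);
      frame.close();
      // Holding transform() open until the chunk arrives keeps at most one
      // frame in flight and lets the stream machinery propagate backpressure
      // upstream, toward MediaStreamTrackProcessor and the source.
      controller.enqueue(await produced);
    },
  });
}
```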

I realize that you probably wrote the example to monitor the queue lengths of the VideoEncoder and VideoDecoder, which this approach makes impossible (there would be at most one frame in the encoder queue at any time). Again, I'm mostly trying to look at it through the eyes of a developer wanting to combine WebCodecs and streams and wondering how best to do that.

aboba commented 1 year ago

In the sample, the maximum resolution provided to the encoder is set in the UI, the framerate is determined by the track settings, and neither is subsequently adjusted. Similarly, the target average bitrate is provided to the encoder and is also not adjusted.

Should the encoder become overloaded, the mechanism for "letting off steam" is to stop submitting additional work to it. The sample stops submitting frames to the encoder once pending_outputs exceeds 30. I suspect that encodeQueueSize could have been used in a similar way, as in the sketch below.
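A sketch of that throttling pattern using encodeQueueSize (the threshold of 30 is the sample's; the function name is illustrative):

```js
const MAX_QUEUE = 30; // same threshold the sample uses for pending_outputs

function maybeEncode(encoder, frame) {
  if (encoder.encodeQueueSize > MAX_QUEUE) {
    frame.close(); // drop rather than let the queue grow without bound
    return false;
  }
  encoder.encode(frame);
  frame.close();
  return true;
}
```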

Based on the queue metrics, though, the decoder queue seems more likely than the encoder queue to grow beyond 0. Even when encoding full-HD with AV1 at 30 fps and setting the average target encoder bitrate to 2 Mbps, neither encodeQueueSize nor decodeQueueSize seems to go above 3 or 4, even on mediocre hardware. Given this, it seems unlikely that pending_outputs ever gets close to 30.

It is common for the decoder queue to go above 1, but even when it does, the glass-glass latency remains very low. Since neither queue seems to build regardless of resolution and target bitrate, I didn't see value in implementing WHATWG Streams backpressure, which in any case would not have automatically propagated back through the MediaStreamTrackProcessor to the track source.

I think this may be an architectural issue worth discussing, relating to the use of WHATWG Streams in media processing.

dalecurtis commented 1 year ago

Don't forget about the dequeue events if you want to monitor queue size.
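For example (a minimal sketch; VideoDecoder fires the same event):

```js
// "dequeue" fires whenever encodeQueueSize decreases, so the queue can be
// observed without polling.
encoder.addEventListener('dequeue', () => {
  console.log(`encodeQueueSize: ${encoder.encodeQueueSize}`);
});
```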

dalecurtis commented 1 year ago

Do any reviewers have further comments here? We'll plan to merge tomorrow otherwise.