w3c / webcodecs

WebCodecs is a flexible web API for encoding and decoding audio and video.
https://w3c.github.io/webcodecs/

Should WebCodecs use MediaStreamTrack? #199

Closed youennf closed 3 years ago

youennf commented 3 years ago

WebCodecs currently supports only callbacks, either to generate frames for decoders or to receive frames for encoders.

In a typical MSE-like scenario, it might be convenient to directly pipe decoder output to a media element. A MediaStreamTrack would be more handy than using callbacks:

Additionally, it seems that the current algorithm for outputting frames from the decoder is as follows:

Based on this understanding, if the controlling thread is spinning, the frames might stay blocked in the controlling thread task queue, which is a potential memory problem.

It seems that, on the main thread, which may block more often than background threads, exposing a MediaStreamTrack instead of callbacks would remove that potential issue.

I am wondering whether the following would be a good pattern to build upon:

padenot commented 3 years ago

See https://github.com/w3c/webcodecs/issues/131 for a relevant discussion.

chcunningham commented 3 years ago

In a typical MSE-like scenario, it might be convenient to directly pipe decoder output to a media element.

Early on we expected this type of convenience would be a much requested feature. But, so far, the majority of folks seem to prefer to render their own frames via WebGL or Canvas. I expect this to be common for MSE-like use cases. If you're already using MSE, but considering WebCodecs, you're likely in the camp of sub-second low-latency streaming, for which having total control of rendering behavior is important.

Having said all that, a similar convenience can be had by using MediaStreamTrackGenerator, whose track can then serve as the src of a media element.

JS-wise, this is a very small amount of code. Something like:

let generator = new MediaStreamTrackGenerator({kind:'video'});
videoElement.srcObject = new MediaStream([generator]);

let frameWriter = generator.writable.getWriter();
let decoder = new VideoDecoder({
    output: (frame) => { frameWriter.write(frame); },
    error: (error) => { /* ... */ }
});

Based on this understanding, if the controlling thread is spinning, the frames might stay blocked in the controlling thread task queue, which is a potential memory problem.

This is true of the above example, but we have not found this to be an issue in practice. The best practice is to use WebCodecs from a DedicatedWorker whose main thread is not doing much else outside of codec I/O.
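
One way to bound the memory held by queued frames is to cap the number of in-flight writes and close any frame beyond that cap. The following is a minimal sketch, not part of WebCodecs: `makeBoundedOutput` and `maxPending` are hypothetical names, `writer` is assumed to be a `WritableStreamDefaultWriter` such as `frameWriter` above, and dropping decoded frames is assumed to be acceptable for the application (it is not in every use case).

```javascript
// Hypothetical helper: builds a VideoDecoder `output` callback that forwards
// frames to `writer`, but closes (drops) a frame when `maxPending` writes are
// still in flight, so queued frames cannot pile up without bound.
function makeBoundedOutput(writer, maxPending = 3) {
  let pending = 0;
  return (frame) => {
    if (pending >= maxPending) {
      frame.close(); // release the frame's resources right away
      return false;  // return value is only for observability; VideoDecoder ignores it
    }
    pending++;
    writer.write(frame)
      .catch(() => frame.close()) // on a failed write, make sure the frame is released
      .finally(() => { pending--; });
    return true;
  };
}
```

It would be used as `new VideoDecoder({ output: makeBoundedOutput(frameWriter), error: ... })`.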

youennf commented 3 years ago

It seems we both agree that web pages should use WebCodecs from a DedicatedWorker and that the above snippet has a potential memory issue.

My question is then: is there a use case for WebCodecs from Window that cannot be achieved by WebCodecs in workers?

If there is no such use case, I am wondering whether it is actually sound to expose WebCodecs in Window environments, at least in its current form. Should I file a separate issue to discuss restricting WebCodecs to workers?

chcunningham commented 3 years ago

My question is then: is there a use case for WebCodecs from Window that cannot be achieved by WebCodecs in workers?

No. But I can imagine boutique use cases where using a worker is just extra hoops to jump through (e.g. pages that don't have much of a UI or pages that only care to encode/decode a handful of frames).
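
For reference, the extra hoops amount to roughly the following. This is a minimal sketch under stated assumptions: the page transfers `generator.writable` to the worker, and `startDecoder`, the `{ writable, config }` message shape, and the `worker.js` file name are all hypothetical.

```javascript
// worker.js (sketch). The page side is assumed to do something like:
//   const worker = new Worker('worker.js');
//   worker.postMessage({ writable: generator.writable, config },
//                      [generator.writable]); // transfer the stream
function startDecoder({ writable, config }, DecoderCtor = globalThis.VideoDecoder) {
  const writer = writable.getWriter();
  const decoder = new DecoderCtor({
    // Decoded frames go straight to the track generator's stream,
    // without touching the page's main thread.
    output: (frame) => { writer.write(frame); },
    error: (e) => { console.error(e); },
  });
  decoder.configure(config);
  return decoder;
}

// In a real worker this would be wired up as:
//   self.onmessage = (event) => { startDecoder(event.data); };
```

The `DecoderCtor` parameter exists only so the function can be exercised outside a browser; in a worker the default `VideoDecoder` is used.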

I don't think this problem is unique to the web. With most (all?) other media libraries, it's true that you are allowed to do codec work on your UI thread, even if that is usually not a good idea. My instinct is to give web authors the same powers that are afforded in native libraries and leave it to them to make the call for their use case.

youennf commented 3 years ago

My question is then: is there a use case for WebCodecs from Window that cannot be achieved by WebCodecs in workers?

No.

I agree.

My instinct is to give web authors the same powers that are afforded in native libraries

A native application has a tight control on what gets executed in its process and its main thread. A web application has no such control:

On the contrary, a worklet or a worker is running code that the web application fully controls. The expectations can be made higher in those environments, and easier to implement as well.

It makes sense to me to be more conservative since we agree there is no loss of functionality.

padenot commented 3 years ago

On the contrary, a worklet or a worker is running code that the web application fully controls. The expectations can be made higher in those environments, and easier to implement as well.

To add to this (I agree with everything said in the comment otherwise), implementers can pull quite a few tricks to optimize things in a way that really matters, for example playing with thread priorities (here for video, but this is already the case for audio). It's not infrequent to have the compositor thread in a different scheduling class than regular threads (using an abstract term because there are lots of ways this is achieved in practice on different systems and versions of those systems).

This is really necessary for audio (it can't work otherwise), but it's really nice for graphics as well on machines on the older/cheaper side, or simply when the system load is high, to ensure a high perceptual quality.

chcunningham commented 3 years ago

Keep in mind that WebCodecs explicitly does not do actual decoding/encoding work on the main (control) thread (be that window or worker). The work is merely queued there and is later performed on a "codec thread" (which is likely many threads in practice).
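
Because `decode()` only queues work, the control thread can watch `decodeQueueSize` to avoid queuing far ahead of the codec. A minimal sketch, assuming a configured `VideoDecoder` and an iterable of `EncodedVideoChunk`s; `feedWithBackpressure`, `maxQueue`, and the 5 ms polling interval are hypothetical choices.

```javascript
// Sketch: submit chunks without letting the decoder's request queue grow unbounded.
async function feedWithBackpressure(decoder, chunks, maxQueue = 8) {
  for (const chunk of chunks) {
    // decodeQueueSize is the number of decode requests still waiting in the queue.
    while (decoder.decodeQueueSize > maxQueue) {
      // Yield to the event loop so queued work and output callbacks can proceed.
      await new Promise((resolve) => setTimeout(resolve, 5));
    }
    decoder.decode(chunk); // returns immediately; decoding happens off the control thread
  }
  await decoder.flush(); // resolves once all queued chunks have been processed
}
```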

A native application has a tight control on what gets executed in its process and its main thread.

I don't think this is critical to the use cases I highlighted. My argument is: for folks who do not need lots of codec io, or for which the main thread is doing little else besides codec io, the main thread is adequate. Requiring authors to use workers in this case just adds obstacles.

I agree that some apps don't know what's running on their page. Those apps don't fit the use case I gave.

I agree that different browsers may implement main thread scheduling in different ways, but it's hard for me to imagine an implementation of main thread scheduling that works on today's web while also being insufficient for the use cases I gave.

I agree apps don't know what extensions are installed. But, similar to my point above, if an extension breaks main thread scheduling for the use cases I cited, it would also break scheduling for the web more generally.

@padenot: I read your comment as just adding background info on threading. I agree that higher thread priorities are critical for compositing and audio rendering. My understanding is a choice to expose WebCodecs on window doesn't limit such designs.

@aboba @padenot: opinions on the core issue? Is it reasonable to expose on Window for use cases where Window's main thread ability is sufficient?

youennf commented 3 years ago

The core issue is about the best way to represent WebCodecs input/output in Window environments. Let's discuss whether it is sound to expose WebCodecs in Window in a separate issue.

youennf commented 3 years ago

Filed https://github.com/w3c/webcodecs/issues/211 specifically for this.

chcunningham commented 3 years ago

Filed #211 specifically for this.

Ack. I've transitioned to that one for discussion of window exposed interfaces.

The core issue is about the best way to represent WebCodecs input/output in Window environments

Has this issue been sufficiently addressed? Do we agree that, as currently defined: (1) the required JS is pretty small and (2) use in workers resolves main thread woes (where applicable)?

youennf commented 3 years ago

I think we should first finish the discussion of whether there is a need for WebCodecs in Window environments in the other issue. Once the need is identified, we can discuss the exact shape of the API.

For instance, you mentioned low-latency MSE-like use cases that might want per-frame access. MediaStreamTrack might not be fine-grained enough for those, which is probably fine. But this use case most probably belongs in workers, not in Window environments.

chcunningham commented 3 years ago

Triage note: marking 'breaking' as the proposal would remove support for chunk/frame based codec APIs in window scope. This is purely a triage. My stance on implementing (opposed) is described in comments and linked issues.

chcunningham commented 3 years ago

My recent comment in #211 proposes to leave WC window-exposure as-is. That issue was filed as a preliminary for this one, so I want to return here to consider the implications of that resolution (if it is accepted).

My summary of this discussion:

Given the new insights in #211, I would amend my position as: Even for window-usage, the main thread is not inherently bad in all use cases, as demonstrated by feedback from developers (representing their end users).

Here, @youennf's opening comment proposes that we "Restrict per-frame processing to worker/worklet environments." The specific argument to "restrict" is again to avoid main-thread perf issues. If we close #211 as proposed, acknowledging that the main thread perf issues are not material to all use cases, it follows that we should also not restrict per-frame processing.

dalecurtis commented 3 years ago

Given the resolution of https://github.com/w3c/webcodecs/issues/240#issuecomment-859242056 and https://github.com/w3c/webcodecs/issues/266#issuecomment-855965875 I think this issue should be closed regardless of the window/worker resolution. Since frame dropping isn't allowed in all modes, we can't use an MST, which allows dropping.

I think #266 covers a value-add case where we might optionally expose a way to get a MediaStreamTrack of VideoFrames for use with WebRTC or <video>, but it doesn't seem reasonable to require folks to go through a MediaStream interface. Especially since that may only exist for video decoding.

dalecurtis commented 3 years ago

Editors call: closing in favor of continued value-add discussion in #266 for generating a MediaStreamTrack from a decoder.

youennf commented 3 years ago

Similarly to a video decoder generating a MediaStreamTrack, this issue is also tracking the possibility for a video encoder to consume a MediaStreamTrack, so it differs from #266. In both cases, the rationale is the same:

  1. This is a pattern that will happen reasonably often
  2. This is a pattern that makes browser optimisations very easy: no main-thread jank, UA-based frame dropping in case of CPU overuse, and so on.
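
With the current callback-based API, the encoder-side pattern can be approximated in script. A minimal sketch, assuming `reader` comes from `new MediaStreamTrackProcessor({ track }).readable.getReader()` and `encoder` is a configured `VideoEncoder`; `encodeFromReader` is a hypothetical name, and unlike a UA-level integration it does no frame dropping.

```javascript
// Sketch: pull VideoFrames from a stream reader and hand each one to an encoder.
async function encodeFromReader(reader, encoder) {
  let count = 0;
  for (;;) {
    const { done, value: frame } = await reader.read();
    if (done) break;
    encoder.encode(frame);
    frame.close(); // the encoder keeps what it needs; release our reference
    count++;
  }
  await encoder.flush(); // wait for all pending encodes to produce output
  return count;          // number of frames submitted
}
```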

dalecurtis commented 3 years ago

I think this issue flies a bit too closely to more controversial discussions, so let's use #266 for discussion of optionally extending the interfaces with MediaStreams.