Should WebCodecs be exposed in Window environments?

youennf commented 3 years ago

As a preliminary to https://github.com/w3c/webcodecs/issues/199, let's discuss whether it is useful to expose WebCodecs in Window environment. See the good discussions in https://github.com/w3c/webcodecs/issues/199 for context.

jan-ivar commented 3 years ago

Thanks for the minutes. I'd like to weigh in that I'm concerned this is insufficient to deter usage from the default (main) thread, by the same arguments made that workers are arduous (today).

I think the working group is thinking too narrowly, focusing on the coding/decoding, and not seeing its part in the larger media pipeline it creates, with the potential to guide on what thread this data pipeline ends up being laid.

The larger picture, the significant use case I think should focus our attention, is realtime media streaming. The Holy Grail is to unify media streaming and realtime communication, as the "next version" of WebRTC if you will. We don't want this pipeline on main thread.

Compared to that exciting and ambitious long-term goal, the arguments for default and direct main-thread access now, seem shortsighted and underwhelming to me.

Maybe I'm wrong, but I also sense a rush to form consensus on this important decision (both in general and at the end of the minutes for lack of time). From the overall conversation it seems clear that:

There is consensus and no opposition to exposing WebCodecs in workers
There isn't consensus (yet) and there is opposition to exposing WebCodecs on window

In spite of anecdotal assurances, I think there's a risk here of making an API mistake we won't appreciate until the full picture is ready. It should be uncontroversial that the conservative approach here would be to proceed with 1 — because it has consensus and no concerns over exposure — without blocking on 2. Because we can always do 2 later, but putting 2 back in the bottle may prove hard. People spoiling to use this are early adopters who know how to create workers (sorry).

There is no web compat issue here, so arguments that we must have both seem antithetical to caution and iterative progress, and I'd be a bit disappointed if we didn't first try the limited approach.

chrisn commented 3 years ago

Thanks @jan-ivar. I certainly don’t want to resolve to expose in Window simply because Chrome already has an implementation – it needs to be because that’s an acceptable end state for all.

From the overall conversation it seems clear that:

There is consensus and no opposition to exposing WebCodecs in workers

There isn't consensus (yet) and there is opposition to exposing WebCodecs on window

That's a fair assessment of where we are.

In spite of anecdotal assurances, I think there's a risk here of making an API mistake we won't appreciate until the full picture is ready. It should be uncontroversial that the conservative approach here would be to proceed with 1 — because it has consensus and no concerns over exposure — without blocking on 2

If we think that purely providing developer guidance isn't sufficient, I suggest developing the "holy grail" into a complete proposal so that we have a clearer understanding of what access from Window may look like.

youennf commented 3 years ago

I suggest developing the "holy grail" into a complete proposal so that we have a clearer understanding of what access from Window may look like.

Access from Window should not be an end of itself, but a means towards this 'holy grail'. There seems to be consensus that exposing WebCodecs in workers is a step towards this 'holy grail'.

There are a few things we could try to look at:

Develop a JS shim to expose functionality in Window from API exposed in dedicated workers. This might help the WG measure the level of complexity/convenience we are talking about. This might also help web developers (for prototyping).
Dig in further in MediaStreamTrack integration pros and cons, in terms of authoring and efficiency for instance.
Dig in further in worker support, what seems to be missing (Stadia experiment) or hard to do in early adopter apps (Vyzor).

surma commented 3 years ago

After seeing the TAG review and the continued discussion here, I’d like to re-emphasize a point made by @AshleyScirra:

it's already a real development headache how many APIs are supported only in window and not on worker. It makes it really difficult to write context-agnostic code that can work in either. Please don't compound the problem by starting to add APIs that are supported only in workers and not on window.

Not every app can accommodate workers in their architecture very easily. For some use-cases, they can be a net-negative (in performance or otherwise). Allowing people to try both variants and measure is, in my opinion, crucial, but requires APIs to be available in both contexts.

AshleyScirra commented 3 years ago

If this API is made worker-only, I'd expect the following to play out:

Developer A publishes a small library that implements WebCodecs on the main thread, and just posts calls to a worker, with the aim of improving convenience for non-real-time cases like transcoding.
Developer B comes along, picks up the library, and uses it for something real-time critical on the main thread.

Thus, the API design has not prevented a bad decision. In fact now it's even worse: it's on the main thread and also has added postMessage latency to everything. The only way to mitigate that from API design is by allowing WebCodecs on the main thread, so at least there is no postMessage latency if a developer makes a bad decision.

I'd compare this to the subtle property of the Web Crypto API. Despite having been coding JS for years, I always just assumed it was some mysterious quirk of the API. Only recently, I found out it's meant to be trying to tell developers to be careful because "many of these algorithms have subtle usage requirements". I still think it's a bizarre choice, as it completely failed to convey that to me. Similarly to this, I expect any API design choices meant to guide developers to good decisions will just be ignored or regarded as unexplained weirdness, and then developers will carry on and do what they were going to anyway.

jan-ivar commented 3 years ago

Thus, the API design has not prevented a bad decision

@AshleyScirra I feel I already addressed that that's not the bar. Defaults matter.

Developer B comes along, picks up the library, and uses it for something real-time critical on the main thread. ... Thus, the API design has not prevented a bad decision.

That's fine, because we retain our option to pursue 2 at that point = lib gone. It's the other outcome that cannot be undone.

jan-ivar commented 3 years ago

For some use-cases, they can be a net-negative (in performance or otherwise).

@surma Inherently, or are these short-term hurdles? We have momentum exposing lots of "holy grail" (that's gonna stick isn't it?) sources & sinks to workers: MSE, MST, OffscreenCanvas, RTCDataChannel, RTCRtpScriptTransform, and WebTransport. Unless we've missed something, there shouldn't be a use case that requires main thread.

Allowing people to try both variants and measure is, in my opinion, crucial, but requires APIs to be available in both contexts.

I like measurements, but ahead of decisions, not after. I'm hopeful browsers can add prefs and other things to allow us to continue measuring in experiments. If down the road we learn we were wrong, we expose it on window; If we expose it now and discover we were right, we can't remove it. Sometimes features help drive adoption of things that are harder but better (https).

dalecurtis commented 3 years ago

Many of the replies above continue to overlook a few points that I think bear highlighting:

Codec APIs are not new, they've existed for 20+ years on native platforms. We are simply adapting native capabilities to the web. There's an extensive body of both precedent and usage to learn from there.
- In no case are such APIs limited to certain threads (even on iOS), despite the same UI and GC concerns expressed here.
- There are extensive examples of use-cases which are not real-time or near real-time.
- We shouldn’t succumb to speculation which leads us down a path of not-invented-here syndrome.
Actual codec operations are already specified to happen on a different thread.
- Please remember that what we're actually debating is whether or not we should allow the main thread to function as a control channel. I.e., there is no heavyweight work on either the worker or the main thread.
- There are many main-thread only APIs that developers would want to use with WebCodecs. VideoFrames can be created from any CanvasImageSource.
- No UA other than Chromium has even shipped the APIs (e.g., OffscreenCanvas, transferControlToOffscreen, even ImageBitmap in Safari’s case) that would be necessary to allow WebCodecs to be used in just a worker without still passing messages to window.
Workers are frequently referred to in the above replies as a holy grail with binary benefits. Which simply isn't true. They are a tool with a spectrum of benefits and situational utility.
- On low end devices or those with high contention the distinction between the main thread and worker threads is meaningless.
- Latency in single frame (e.g., images) or low frame rate use cases will be actively hurt by a worker requirement.
When speculating on potential issues, please remember Chrome has concrete feedback from developers (including multiple XXXm MAU clients) using the main thread without issue.
- Large web applications often have extensive telemetry for discovering the type of issues speculated on above.

youennf commented 3 years ago

In no case are such APIs limited to certain threads (even on iOS), despite the same UI and GC concerns expressed here.

VTB iOS decoders are usually outputting frames in specific background threads. For camera output, which is not too far from a decoder, I think the queue is settable: if the queue is busy, frames will be dropped.

About UI and GC concerns, there are differences between web pages and native (or electron) apps: a web page runs in a browser which executes code from several unrelated web pages. Some of these pages may do things like sync XHR.

There are many main-thread only APIs that developers would want to use with WebCodecs. VideoFrames can be created from any CanvasImageSource.

Based on the extensive work you did in implementing the WebCodecs editor's draft and reaching out to developers, I understand you identified gaps in worker API support. This is something we should put energy on as spec authors.

Back to your specific point, IIUC, it is referring to video encoders and CanvasImageSource producing a stream of frames. Typically video elements or canvas, both of which can generate a MediaStreamTrack. One possibility is to transfer the track in a worker and do the work there (some specs are available for that). Another would be to allow piping directly a MediaStreamTrack as input to a video encoder (no spec there). A third possibility would be to use a JS shim to postMessage each frame to a worker while we finalise the shape of the API in Window (spec is ready).

On low end devices or those with high contention the distinction between the main thread and worker threads is meaningless.

I am not exactly sure why this distinction is meaningless on those devices. Can you explain it in more details? For instance, service workers and audio worklets are widely deployed technologies running in background threads. I am hoping these two technologies are running fine on low end devices.

Even if this is meaningless, is it actually harmful on these low end devices to use a worker? It seems you agree that on high end devices, the distinction is meaningful. I would hope they could use the same code (and achieve optimal perfs) for both types of device.

Latency in single frame (e.g., images) or low frame rate use cases will be actively hurt by a worker requirement.

Single frame equals ImageDecoder I believe. I agree we should have a separate discussion for this one.

About the low frame rate case, let's say data is received from a RTCDataChannel. Let's transfer the RTCDataChannel to a worker. The RTCDataChannel data is now feeding a video decoder directly in the worker. Video decoder frames are processed directly in the worker, let's say rendered directly in a canvas that was transferred to the worker using transferControlToOffscreen. How is it actually hurting to use a worker?

Even at low frame rate, users do like regular frame rates (WebRTC stats has totalinterframedelay for instance to measure this). Using a worker provides additional guarantees to maintain regular frame rate.

chrisn commented 3 years ago

@youennf said:

Single frame equals ImageDecoder I believe. I agree we should have a separate discussion for this one.

It had occurred to me that ImageDecoder has different considerations so should be discussed separately. Are there concerns specifically about exposing ImageDecoder on Window?

chcunningham commented 3 years ago

Based on the extensive work you did in implementing the WebCodecs editor's draft and reaching out to developers, I understand you identified gaps in worker API support. This is something we should put energy on as spec authors.

We support this work, but it is not a short term hurdle (as @jan-ivar suggested above). For ex, OffscreenCanvas goes back at least to 2015. It took Chrome 3 more years to actually ship it and today support is still missing in most UAs. It follows that enabling worker support for other APIs (where the work is just now starting) forces developers to carry technical debt they described here for many years to come.

Back to your specific point, IIUC, it is referring to video encoders and CanvasImageSource producing a stream of frames.

This is not the intended reference. A longer list of main-thread only APIs is described in the comment above. It includes both encoding and decoding uses. I would prefer that we design solutions for those gaps in separate issues.

A third possibility would be to use a JS shim to postMessage each frame to a worker while we finalise the shape of the API in Window (spec is ready).

I believe this implies exposing IO interfaces (eg. VideoFrame, AudioData, and Encoded*Chunk) to Window. Just to check: are there no objections to that?

On low end devices or those with high contention the distinction between the main thread and worker threads is meaningless.

I am not exactly sure why this distinction is meaningless on those devices. Can you explain it in more details?

Low end devices will have fewer cores, so there is less (and often zero) opportunity to actually parallelize worker and main thread tasks.

Even if this is meaningless, is it actually harmful on these low end devices to use a worker?

We don't think they are harmful to this case. We wanted to highlight that they are not inherently helpful, weighing in on the discussion between @jernoble and @koush above.

youennf commented 3 years ago

We support this work, but it is not a short term hurdle (as @jan-ivar suggested above). For ex, OffscreenCanvas goes back at least to 2015.

Let's clearly separate implementations and specs. With regards to Jan-Ivar's list, I believe specs are in good shape. With regards to implementations, Chrome seems close to supporting most of these APIs in workers so is in a good position to expose WebCodecs in workers with most bells and whistles already available.

A longer list of main-thread only APIs is described in [the comment above]

Looking at the list, RTCDataChannel is now exposed to workers. MediaStreamTrack as well. There is a path forward for WebAudio (there is a PR to expose MediaStream in workers). Input events are something that probably need to be tackled somewhere, is there a GitHub issue tracking this? About Canvas, what is the missing support? Are we tracking this somewhere?

I believe this implies exposing IO interfaces (eg. VideoFrame, AudioData, and Encoded*Chunk) to Window. Just to check: are there no objections to that?

Right, I think it is fine to expose VideoFrame in Window. Image decoder needs it and at first sight, it sounds ok to expose it in window as well. It would be good to have VideoFrame be transferable.

youennf commented 3 years ago

It would be good to have VideoFrame be transferable.

I see this is tracked in https://github.com/w3c/webcodecs/issues/210

jan-ivar commented 3 years ago

there is no heavyweight work on either the worker or the main thread.

@dalecurtis If we look at encode/decode in isolation maybe. But even then, MSE in workers has some relevant comments. There's also @surma's excellent talk on how the "main-thread is completely unreliable".

But in practice, as @youennf pointed out, if we zoom out (no pun intended) it's incorrect to consider this a "control thread" only, when it's where the API surfaces data. Use cases like live face-tracking and background replacement are spoiling to run on main thread if we allow it.

OffscreenCanvas ... took Chrome 3 more years ... support is still missing in most UAs.

@chcunningham OffscreenCanvas is behind a pref gfx.offscreencanvas.enabled in Firefox, and tracked in bug 1390089 if it helps.

dalecurtis commented 3 years ago

@youennf wrote:

About UI and GC concerns, there are differences between web pages and native (or electron) apps: a web page runs in a browser which executes code from several unrelated web pages. Some of these pages may do things like sync XHR.

While there may be some differences of severity, a native application operating in a near real-time mode will have many of the same concerns as a web page.

On low end devices or those with high contention the distinction between the main thread and worker threads is meaningless.

I am not exactly sure why this distinction is meaningless on those devices. Can you explain it in more details? For instance, service workers and audio worklets are widely deployed technologies running in background threads.

As @chcunningham mentioned above the distinction between threads of the same priority blurs as contention increases. Without switching to a worklet model the amount of priority boosting we can do is equivalent between window and worker.

Even if this is meaningless, is it actually harmful on these low end devices to use a worker? It seems you agree that on high end devices, the distinction is meaningful. I would hope they could use the same code (and achieve optimal perfs) for both types of device.

Latency in single frame (e.g., images) or low frame rate use cases will be actively hurt by a worker requirement.

Sorry, I meant to write 'high frame rate' in echo of my previous comment. I do agree there are meaningful use cases.

Any required hops between threads will add latency; in single frame and high frame rate cases this time will dominate. All single-frame use cases (not just ImageDecoder, VideoDecoder may be used here too: https://github.com/w3c/webcodecs/issues/205#issuecomment-843412730 -- consider also poster frames, keyframe previews, frame-step, etc) will suffer and there will be even more harm for UAs which don't support OffscreenCanvas and transferControlToOffscreen.

@jan-ivar wrote:

@dalecurtis If we look at encode/decode in isolation maybe. But even then, MSE in workers has some relevant comments.

We don't disagree that real-time or near real-time use cases can benefit from a worker thread, where we're disagreeing is whether it's appropriate to restrict all use cases just because developers might not use a worker for near real-time ones.

There's also @surma's excellent talk on how the "main-thread is completely unreliable".

It's a great talk. Given that @surma is participating in this thread, we should consider his comments appropriately.

But in practice, as @youennf pointed out, if we zoom out (no pun intended) it's incorrect to consider this a "control thread" only, when it's where the API surfaces data. Use cases like live face-tracking and background replacement are spoiling to run on main thread if we allow it.

I don't think what I wrote disagrees with you, "there is no heavyweight work on either the worker or the main thread" being the most salient point here. For these specific use cases we are actively pursuing the MediaStreamTrackProcessor interface, none of the OT participants for these APIs are using it on the main thread.

dalecurtis commented 3 years ago

I'd like to reiterate why @chcunningham wants to issue a call for consensus on this issue. After extensive good faith discussions, the arguments for window exposure are effectively "there are non real-time use cases'' and the arguments against are "developers may use the API wrong for near real time use cases, so let's wait and see."

The argument against window exposure doesn't afford a future where we would re-enable window exposure. Any instance of a developer doing the wrong thing could be used against window exposure. So the argument against just becomes an indeterminate function of time. The use cases won't emerge if the APIs aren't there, aren't performant, or are too annoying to use. More developers will simply choose native applications where these capabilities are already present.

The argument for window exposure has extensive precedent in the native application space and concrete OT data indicating developers prefer the ability to use the APIs on both window and worker. TAG has recommended deferring to developer feedback.

As such, barring any new arguments or consensus at the upcoming meeting we feel we should move the matter to vote.

jan-ivar commented 3 years ago

Any required hops between threads will add latency ... We don't disagree that real-time or near real-time use cases can benefit from a worker thread, where we're disagreeing is whether it's appropriate to restrict all use cases just because developers might not use a worker for near real-time ones.

@dalecurtis Which non-realtime, non-near-realtime use cases require low latency?

For these specific use cases we are actively pursuing the MediaStreamTrackProcessor interface, none of the OT participants for these APIs are using it on the main thread.

That's great to hear, since we're also trying to limit that API to workers in https://github.com/w3c/mediacapture-transform/issues/23.

The argument against window exposure doesn't afford a future where we would re-enable window exposure.

The argument for window exposure doesn't afford a future where we could disable window exposure, seems more correct...

The use cases won't emerge if the APIs aren't there, aren't performant, or are too annoying to use.

I'll note this is the opposite extreme of an earlier claim that the API design won't have "prevented a bad decision" because: "

Developer A publishes a small library that implements WebCodecs on the main thread, and just posts calls to a worker, with the aim of improving convenience for non-real-time cases like transcoding.
Developer B comes along, picks up the library, and uses it for something real-time critical on the main thread."

I was hoping reactions would land closer to the latter, which we could measure, and re-enable window exposure over.

dalecurtis commented 3 years ago

@dalecurtis Which non-realtime, non-near-realtime use cases require low latency?

All of those which have user interactivity or user visibility. Specifically I was referring to the single frame use cases I mentioned in the preceding paragraph. Even for non-realtime cases we still want to minimize user visible latency.

That's great to hear, since we're also trying to limit that API to workers in w3c/mediacapture-transform#23.

To be clear, current participants are transferring the MSTP Stream from window to worker since neither MST nor MS is worker exposed. Exposing MST/MS in a worker would resolve their needs.

The argument for window exposure doesn't afford a future where we could disable window exposure, seems more correct...

My point is that our argument is between the concrete and the speculative. The arguments against aren't specific enough to ever allow window exposure. I.e., what new information do you think we'll have in 1-2 years that we don't have now?

I was hoping reactions would land closer to the latter, which we could measure, and re-enable window exposure over.

Thanks, this is constructive. So you hope we'd measure how much developer pain has occurred over some time frame as a justification for window exposure. I think this will be hard to measure. I.e., it seems likely to suffer from selection bias. Can you provide some details on how you'd expect this to work?

youennf commented 3 years ago

Without switching to a worklet model the amount of priority boosting we can do is equivalent between window and worker.

Main thread JS does plenty of things: handle events, navigations, analytics... JS executed in a worker usually does only one thing. This fine-grained granularity helps for scheduling.

Any required hops between threads will add latency; in single frame and high frame rate cases this time will dominate.

This seems something worth measuring.

More developers will simply choose native applications where these capabilities are already present.

Understood. Note though that OS provided codecs, like VideoToolbox, often require native applications to deal with threads/background queues.

TAG has recommended deferring to developer feedback.

It is not clear to me the TAG landed on a precise recommendation yet.

Can you provide some details on how you'd expect this to work?

If we do not expose to Window at launch time, we will probably provide enough API to build a JS shim. Based on that, we could:

Measure the efficiency impact of using that JS shim vs. built-in support, learn from that and optimise support where needed. The idea is to avoid premature optimisation pitfalls.
Evaluate the JS shim in terms of code complexity metrics. Is such a shim straightforward to write or not?
Look at the applications that will actually use the JS shim and learn from them in terms of which additional APIs we might want to add to the platform.

surma commented 3 years ago

Any required hops between threads will add latency; in single frame and high frame rate cases this time will dominate. This seems something worth measuring.

Not sure if this is 100% applicable to the conversation at hand, but I did do some research in this area:

1.) I analyzed the overhead of postMessage, where the outcome can be summarized with: structured serialize/structured deserialize can cause long frames even on a modern MacBook Pro if the payload is over 100KB (details in the blog post).

macbook-chrome 39a91bb5

2.) As a very latency-sensitive use-case, I ported a WebXR app to use workers for the physics calculations. I needed to cut corners (i.e. sending ArrayBuffers instead of regular JS objects at a significant cost of ergonomics) to keep the app running smoothly-ish on an Oculus Quest at 72fps.

1.) shows that depending on the data that needs to be sent back and forth, using workers can add significant overhead. 2.) shows that while there are workarounds (ArrayBuffers and transferables), they come at an ergonomics cost and are less easy to reason about.

I am not saying that moving to a worker wasn’t worth it in the WebXR case, but I do think this indicates that it is hard to say up-front for any given use-case whether workerization is going to be a net-positive or net-negative. This leads me to believe that it would be better to allow developers to experiment with their app off-main-thread and on-main-thread and make the decision whether to use workers or not on a per use-case basis.

youennf commented 3 years ago

Interesting @surma. It leads me to think that using serializable/transferable for the JS shim would be greatly useful to get good performance.

In the cases where latency is important, my hope is that the bulk of the data, in particular video frames, could stay in the worker, allowing web developers to limit postMessage use to small-sized control messages.

Your WebXR example points out that integration of WebXR with say MediaStreamTrack or WebCodecs is something worth investigating as well.

dalecurtis commented 3 years ago

In the interest of bearing out statements above and helping folks put their response together on the CFC, I put together a demo using WebCodecs from Window to Worker: https://github.com/dalecurtis/webcodecs-shim https://storage.googleapis.com/dalecurtis/wcshim/worker-test.html

WebCodecs VideoDecoder test page. Requires running Chrome with --enable-blink-features=WebCodecs or running with "Experimental Web Platform Features" enabled in chrome://flags. Decodes 5000 frames in a window or worker.

Set 'worker=true' to use a worker Set 'window=true' to use a window Set 'codec=h264|av1|vp9' to select the codec Set 'hw=deny|require' to force hardware or software codec, default is allow Set 'busy=true' to add a sine-shaped busy wait of up to 50ms on the main thread (make sure to keep the window in the foreground when running tests).```

E.g.,

I'll post the actual results data later, but for now the demo bears out many of the statements above:

Workers are more performant under main thread contention.
Window based decoding is more than performant enough (>> 60fps) for low latency scenarios under contention.
First frame latency generally increases by 4-5x with a worker. It may increase by orders of magnitude in the uncached case.
Workers require a message passing system to be designed which can be annoying and ends up being ~140 lines for an minimum-viable decoder. I didn't try to add AudioDecoder, AudioEncoder, or VideoEncoder, but they should be similar.
Accessing static accessors like decodeQueueSize, state, and encodeQueueSize must be independently tracked or some lag in the values managed by the page.

Surprising results:

Workers have lower per-frame latency even when uncontended (perhaps due to DOM updates test makes though).

dalecurtis commented 3 years ago

Here's the raw data for a windows desktop, AMD A4 Chromebook, Android Pixel 3, and a macbook pro M1. Notes:

In the uncontended case, workers are slower than window by a ~2% margin.
- On mobile / low core count devices, window usage may be up to ~23% faster than workers.
In the contended case (modeled 50ms main thread blockage), workers are faster than window by a ~9% margin.
Workers result in ~20% slower time to first frame. With an average of 71ms for workers and 59ms for window. On cold load the worker may take hundreds of milliseconds (not shown in the data, ensure you refresh when testing workers to avoid this).
Workers result in ~5% lower average frame latency.
No case fell below 30fps in latency and none below 120fps in throughput. All cases could maintain 60fps with a 2-3 frame buffer.
M1 is exceptionally fast at decoding, kudos to our Apple friends. Even in the contended case it's easily delivering 60fps in latency and throughput.

Please look through the data and let me know if you see anything else interesting.

dalecurtis commented 3 years ago

I've added peak memory usage for a few devices to the raw data sheet:

Workers result in ~2.67x the peak memory usage of a window only solution. With an average of ~51mb for window usage and ~135mb for worker usage.

The result is a combination of three factors:

Baseline memory usage is simply higher when a worker is used.
The queue size can't be accurately controlled from the window.
Transfer time ensures frames stay alive a longer.

kdashg commented 3 years ago

there is no heavyweight work on either the worker or the main thread.

@dalecurtis If we look at encode/decode in isolation maybe. But even then, MSE in workers has some relevant comments. There's also @surma's excellent talk on how the "main-thread is completely unreliable".

But in practice, as @youennf pointed out, if we zoom out (no pun intended) it's incorrect to consider this a "control thread" only, when it's where the API surfaces data. Use cases like live face-tracking and background replacement are spoiling to run on main thread if we allow it.

OffscreenCanvas ... took Chrome 3 more years ... support is still missing in most UAs.

@chcunningham OffscreenCanvas is behind a pref gfx.offscreencanvas.enabled in Firefox, and tracked in bug 1390089 if it helps.

Unfortunately you should consider OffscreenCanvas as unimplemented in Firefox for the time being.

dalecurtis commented 3 years ago

Since this came up on the CFC: A pure worker (i.e., everything from the worker) performs the same as the 'uncontended main thread case' even under contention modulo the same startup latency issues as the 'worker shim' case and using slightly more memory to spawn the worker. Apologies for not calling that out more explicitly above.

aboba commented 3 years ago

@dalecurtis @chcunningham @jan-ivar

While it is great to have data, I don't think it's likely to settle the argument, because the issue isn't really about WebCodecs (or even mediacapture-transform). It's really about processing of VideoFrames such as might occur in various machine learning algorithms.

I see why that could be a problem if the processing were carried out on main-thread. But APIs for ML are being developed elsewhere, not in MEDIA WG (or even in WEBRTC WG).

For a games developer who is using MSE on main thread today (since workers aren't widely supported yet), what is the argument against use of WebCodecs decoder on main thread? Is the reason why WebCodecs decode is considered more dangerous purely because it outputs a VideoFrame, even if it only done in order to render (via Canvas, MediaStreamTrackGenerator, WebGPU, WebGL, etc.)?

Applications won't adopt WebCodecs decode unless there is a verifiable performance improvement compared to the alternatives (which run on main thread). Same with WebCodecs encode - to be used, it has to perform better than alternatives such as WASM. Are we envisaging a nightmare scenario where WebCodecs is simultaneously widely adopted while also performing worse than existing alternatives? Every year, there are a few major league hitters who both get a lot of at bats and compile dismal batting average/on base percentage/slugging statistics. It's pretty rare though and when it does happen, often the player has signed an overpriced contract with the Seattle Mariners.

brandonocasey commented 3 years ago

I am going to chime in and say that this API should be exposed on window because:

Web Worker's add overhead and latency so low latency situations were the main thread isn't doing other things will be slower using a web worker.
It will be much easier to learn and test with the API without using a web worker, even if the ultimate plan is to use a web worker for your setup. Setting up an example with a web worker and running though the code in a debugger is somewhat painful.
Debugging outside of a web worker is much easier. We mock our web worker on the main thread so that we can more closely follow calls. If the API isn't exposed on window, debugging will have to be done in a web worker.
Most other technical or media oriented API's of this type are exposed on window, so it would only be consistent to do so here. EX: Web Assembly, Web Audio API, MediaSource, MediaStream.

dalecurtis commented 3 years ago

@brandonocasey There's a call for consensus outstanding at https://lists.w3.org/Archives/Public/public-media-wg/2021Jun/0004.html - if you're speaking on behalf of Brightcove / VideoJS, I recommend posting your reply there via one of your registered media wg members @gkatsev or @gesinger to ensure your voice is heard. The CFC runs until tomorrow 7/2.

gkatsev commented 3 years ago

@dalecurtis thanks, I'll post there some time today.

jan-ivar commented 3 years ago

More data is available in the Mozilla position statement from tests @padenot undertook.

jan-ivar commented 3 years ago

@jernoble To address something raised in the meeting today, that I wish I had commented on: yes there's a control queue, so JS may implement jitter buffers. But a jitter buffer is a poor strategy against main-thread jank, because the user can out-scroll it (the right strategy is a worker, to never drop framerate). It's not what jitter buffers are for.

dalecurtis commented 3 years ago

That's just playing with semantics. Jitter in this case is late encode/decode calls introduced by main thread contention and compares nicely with jitter in packet delivery due to network conditions.

What exactly are you worried about in that example? Even with transferControlToOffscreen() the offscreen canvas will be "out-scrolled" when rendering since scrolling will take priority over drawing/layout updates.

jan-ivar commented 3 years ago

@dalecurtis I wasn't speaking to a specific example, but I don't think limitations of transferControlToOffscreen() generalize to other sources. I merely wanted to clarify that the presence of jitter buffers (that JS may or may not implement well) was taken into account in our assessment above. Our main area of concern is realtime media, where jitter buffers will be shallow.

Without dismissing short-term concerns, we're looking for "an acceptable end state for all", which means looking for use cases that aren't improved by moving to workers, and also aren't temporary, and whose performance using a polyfill isn't satisfactory.

I don't feel we've seen that specific subset of use cases yet, but appreciate that they may surface over time once APIs have shipped, which is why we're open to reconsidering Window exposure in the future. I disagree that we have to decide now.

chrisn commented 3 years ago

With chair hat off:

which means looking for use cases that aren't improved by moving to workers, and also aren't temporary, and whose performance using a polyfill isn't satisfactory

While not a use case, I want to point out this reply to the CfC, which mentions CPU and memory resource constrained device environments: https://lists.w3.org/Archives/Public/public-media-wg/2021Jul/0002.html. A use case my organization is interested in with such devices is decode, composition, and rendering.

jan-ivar commented 3 years ago

@chrisn Thanks for reminding me of that one. We'd be interested in hearing more about this environment, which appears to stand at the far opposite end of the spectrum of mainstream user agents that spend one process per cross-origin iframe.

This 2015 article "How fast are web workers?" focuses on latency & creation time more than cpu & memory overhead. It suggests "to create as few web workers as possible and reuse them".

Its benchmarks may sound concerning (80 kB/ms), but it's worth noting they were done on a $170 (in 2014) Firefox OS Flame (I have one at the bottom of my tech junk drawer somewhere, where it belongs — it was pretty slow and unusable even by 2014 standards). It's specifications were:

Dual-core 1.2GHz CPU
854x480 resolution
adjustable RAM (256MB-1GB)
8 GB storage space

This was also early days for web workers, so hopefully things have improved since then, but it was the first article I found.

dalecurtis commented 3 years ago

Since folks against window exposure feel there is no performance cost to a shim, are there any objections to just letting the spec state that window exposure is optional?

surma commented 3 years ago

let the spec state that window exposure is optional

The conversation has gone on for quite a while and I won’t pretend that I am still on top of every argument that has been made. But making it optional is not really a win from the developer’s perspective, because the progressive enhancement story gets (even) more complicated. It will be hard to tell if a browser just doesn’t have this API on the main thread or no support at all.

jan-ivar commented 3 years ago

That would be a failure to standardize.

Of course there's a performance cost to a shim. That's part of the incentive to move to workers. The cost is where the jank is.

As I mentioned earlier, anything less would be insufficient to deter usage from the default (main) thread, and an incentive clearly seems needed. Chrome's position is "that main-thread jank is not material to all use cases", so the shim is for those.

We see this as critical to begin exposing realtime media to JS responsibly, and also as the best API surface. An API that helps inform what to do and how to do it right, is worth a thousand footgun warnings in documentation.

surma commented 3 years ago

Just to be clear: I am in favor of exposing this API on the main thread. I don’t think the platform should be a helicopter parent for developers to the extent of locking away APIs. It implies that we, the standards authors, can claim to have thought through every possible current (and future) use case for this API and that none of them have a valid reason for this API to be used on the main thread.

Again, I find @AshleyScirra’s argument extremely compelling:

it's already a real development headache how many APIs are supported only in window and not on worker. It makes it really difficult to write context-agnostic code

dalecurtis commented 3 years ago

Thanks, I agree :) I asked in the spirit of compromise, since indefinite obstruction isn't productive.

Which leads to my next line of reasoning. Who are these standards for? Are they just for name-brand browsers? Or are they for the entire ecosystem of devices which access "the web?" I.e., from set-top-boxes all the way up to desktop-class UAs running on the most powerful of processors. Our WG has heard from many folks (both inside and outside the group) beyond the name brand browsers. The answer here has implications for what type of restrictions are reasonable for our WG to consider.

aboba commented 3 years ago

@dalecurtis I would hope that the standards are usable across the entire ecosystem. More and more we are seeing inexpensive devices growing in prominence, whether it is Chromebooks for K-12 or endpoints used for "cloud gaming". In such situations, less can be more - complexity leads to interoperability issues which in turn increases the amount of code (and corner cases). We already expect that WebCodecs will not be universally available on all browsers and devices so that applications will need to potentialliy fallback to WASM encode/decode. Balkanizing WebCodecs implementations (by making window support optional) would just add to the complexity.

Personally, I prefer carrots over sticks. If the goal is to enable media applications to use workers, then committing to worker support across the board (for media APIs such as MSE as well as related APIs such as RTCDataChannel) would make sense. Media application developers tend to do the right thing when given appropriate guidance, documentation and code samples.

jan-ivar commented 3 years ago

... locking away APIs. It implies that we, the standards authors, can claim to have thought through every possible current (and future) use case for this API and that none of them have a valid reason for this API to be used on the main thread.

Wanting to defer a decision doesn't imply that. We've stated we're open to revisit this down the line. Why rush this?

Balkanizing WebCodecs implementations (by making window support optional) would just add to the complexity.

Agreed.

Personally, I prefer carrots over sticks. If the goal is to enable media applications to use workers, then committing to worker support across the board (for media APIs such as MSE as well as related APIs such as RTCDataChannel) would make sense.

Isn't exposing on Window the real threat to a commitment to workers? WebCodecs is the carrot. It's a powerful media API.

Media application developers tend to do the right thing when given appropriate guidance, documentation and code samples.

https://webrtc.org/getting-started/unified-plan-transition-guide

AshleyScirra commented 3 years ago

I don't want to sound like too much of a broken record, but I do think developer experience has been an underrated consideration in this discussion. To put it plainly, in terms of developer experience, making WebCodecs worker-only would be the web platform shooting itself in the foot. If you have a library that for whatever reason uses both some DOM-only API and WebCodecs, it becomes, per the spec, impossible to write a single piece of code that can use both. What about if a compatible implementation is developed for node.js? Should that be limited to workers only as well? Perhaps it will be decided that's unnecessary in that context. So then you have another node vs. browser environment compatibility difference. Why not just spin up a worker in the library to use WebCodecs? Well if you're already in a worker, you don't have to (and doing so anyway makes things strictly worse as it adds unnecessary postMessage latency & memory overhead). But now the library has to deal with two different modes. But what if the worker is already contended with lots of work? It should spin up another worker anyway - but the library doesn't know that on its own, the developer still has to make a decision about it. And on and on. At each step developers will be thinking "sheesh this is a pain, why are there all these hurdles here?"

So in my view the problem with the "let's just wait and see" argument is it's clearly such a bad thing for developer experience that it shouldn't be being considered this seriously. The chorus of dismayed developers should lend some gravity to that. The fact there are non-realtime use cases (e.g. transcoding) ought to be grounds enough to expose WebCodecs on window, for the same reason there are both realtime and non-realtime use cases for WebSockets, and those are exposed on window too, and that is entirely appropriate.

jan-ivar commented 3 years ago

Media application developers tend to do the right thing when given appropriate guidance, documentation and code samples.

"sheesh this is a pain, why are there all these hurdles here?"

The above two descriptions of Media application developers appear in opposition. I think the latter is correct.

Yes, media libraries are going to have to think about threading. Yes, the developer experience is going to be harder with workers than with exposure on main thread. The problem is none of those arguments seem contained to non-realtime use cases. Instead, they highlight the path of least resistance.

This makes me more concerned, not less, that if we expose to both main thread and worker, then some realtime applications may never be written correctly (on a worker). And a chorus of end users will blame individual browsers for the sub-par experience.

Even well-read and well-guided media application developers have bosses. Designing an app's media threading model to use workers, may not be something a few such devs will succeed at pushing for on their own (because of short-term costs). We have an opportunity to help them push for doing it right, and help end-users have better experiences, by making the right option the default option. This is what we're here to do.

chrisn commented 3 years ago

One of the main arguments I've heard for deferring decision is a lack of use cases that require Window exposure. As somebody in the discussion said, it's likely any use case could be made to work in Worker context - even if that means transferring data between Window and Worker. This would mean that a decision to defer at this time leaves us without good criteria to later decide to expose in Window. This makes me concerned that a decision to defer becomes a decision we cannot re-evaluate later. What new information would you be looking for?

koush commented 3 years ago

One of the main arguments I've heard for deferring decision is a lack of use cases that require Window exposure.

As a consumer of WebCodecs, if it becomes worker only, I'll need to write a shim to expose it to window (or pass the data to worker), because WebUSB (the data source) is only available in window.

dalecurtis commented 3 years ago

+1 to @chrisn. I was just typing out the same point.

@jan-ivar it's hard to reconcile:

Wanting to defer a decision doesn't imply that. We've stated we're open to revisit this down the line. Why rush this?

Given the copious amount of feedback you've received from both inside and outside the work group indicating a worker restriction is problematic for performance and developer experience reasons. Combined with statements like:

We see this as critical to begin exposing realtime media to JS responsibly, and also as the best API surface. An API that helps inform what to do and how to do it right, is worth a thousand footgun warnings in documentation.

We are unsure what criteria would ever satisfy you. I.e., you seem hyper-focused on real-time use cases to a point that precludes discussion of other use cases and the performance costs of the shim (~2.68x memory usage for a toy example; 51mb -> 135mb!). Can you provide any criteria which would ever change your mind?

Even well-read and well-guided media application developers have bosses. Designing an app's media threading model to use workers, may not be something a few such devs will succeed at pushing for on their own (because of short-term costs). We have an opportunity to help them push for doing it right, and help end-users have better experiences, by making the right option the default option. This is what we're here to do.

Why is this outcome any more likely than those same developers just using a shim and resulting in an even worse experience?

jan-ivar commented 3 years ago

This makes me concerned that a decision to defer becomes a decision we cannot re-evaluate later. What new information would you be looking for?

I believe the decision to reevaluate rests with the chairs. A deferral (unlike a no decision) might not technically even require new information to reopen (but check that). Would people feel better if we scheduled to revisit it, say a year from now?

A year from now, I'd expect there would be production sites to look at and measure, and even more widespread support for media sources and sinks in workers across browsers. If we find key use cases that are hurting, we can weigh the pros and cons of exposure then. We'll be in a better position to decide at that time than now.

In contrast, if we expose to main thread now, and a year from now we find this was a mistake, we won't be able to change it.

jan-ivar commented 3 years ago

As a consumer of WebCodecs, if it becomes worker only, I'll need to write a shim to expose it to window (or pass the data to worker), because WebUSB (the data source) is only available in window.

Mozilla considers WebUSB harmful, so this use case is not compelling to us.

w3c / webcodecs

Should WebCodecs be exposed in Window environments? #211