w3c / mediacapture-transform

MediaStreamTrack Insertable Media Processing using Streams
https://w3c.github.io/mediacapture-transform/

Out-of-main-thread processing by default #23

Open youennf opened 3 years ago

youennf commented 3 years ago

The current API allows access to raw media on the main thread, which has known issues in terms of robustness and implementation. We should envision an API that does processing by default where it is safe to do so. A few options come to mind:

guidou commented 3 years ago

The option of using w3c/mediacapture-extensions#16 also would allow getting access on the main thread. Also, I don't think allowing access on the main thread is such a dangerous thing that should be forbidden at all costs, especially if the result is a much more complex API. After all, today what is being used is canvas with media access on the main thread, and it works in production for many applications. About the AudioWorklet as the only solution, I need to study the pros and cons, but some feedback I have seen from developers is that timestamping from the AudioWorklet makes AV sync more difficult, for example (i.e., the timestamps are not from the microphone, but from the worklet). I need to confirm if this is an implementation bug or WAI, but it's something some developers are reporting. And, of course, WebCodecs integration is more difficult with AudioWorklet.

jan-ivar commented 3 years ago

The option of using w3c/mediacapture-extensions#16 also would allow getting access on the main thread.

It'd let us limit new MST access methods to workers, so users have to transfer them to a worker to do such processing.
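A minimal sketch of what that hand-off could look like from the page's side, assuming tracks are transferable as proposed in w3c/mediacapture-extensions#16. `PostMessageTarget` and `handOffTrack` are hypothetical names used only for illustration; the worker-side lines are shown as comments because they depend on browser-only interfaces.

```typescript
// Structural type covering the one Worker method used here, so the sketch
// does not depend on browser typings. A real Worker fits this shape.
interface PostMessageTarget {
  postMessage(message: unknown, transfer: unknown[]): void;
}

// Hypothetical helper: moves a track to a worker. Listing the track in the
// transfer list transfers it instead of attempting to clone it.
function handOffTrack(worker: PostMessageTarget, track: unknown): void {
  worker.postMessage({ track }, [track]);
}

// Worker side (assumption, not executed here):
//   self.onmessage = ({ data }) => {
//     const processor = new MediaStreamTrackProcessor({ track: data.track });
//     // read frames from processor.readable ...
//   };
```

With this shape, the page never touches raw frames; it only decides which worker receives the track.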

I don't think allowing access on the main thread is such a dangerous thing that should be forbidden at all costs,

I think it's generally understood that the main-thread is overworked and underpaid. It's a particularly poor environment for realtime (low-latency) processing due to the open-ended number of tasks that may be queued on it at certain times, which can lead to ms delays even to get on it. This is why "The main-thread is completely unpredictable". Newer devices have increasingly tighter FPS deadlines to meet, so we shouldn't design something that cannot meet it reliably.

especially if the result is a much more complex API

I don't see how it would be. Worst-case, streams can be transferred from workers to main-thread, no worse than what was proposed the other direction.

guidou commented 3 years ago

The option of using w3c/mediacapture-extensions#16 also would allow getting access on the main thread.

It'd let us limit new MST access methods to workers, so users have to transfer them to a worker to do such processing.

True. But is there any precedent for such an API design? In general, I think the approach for APIs is to expose on both the main thread and the worker and let application developers decide based on the requirements of their applications.

I don't think allowing access on the main thread is such a dangerous thing that should be forbidden at all costs,

I think it's generally understood that the main-thread is overworked and underpaid. It's a particularly poor environment for realtime (low-latency) processing due to the open-ended number of tasks that may be queued on it at certain times, which can lead to ms delays even to get on it. This is why "The main-thread is completely unpredictable". Newer devices have increasingly tighter FPS deadlines to meet, so we shouldn't design something that cannot meet it reliably.

Yes. And developers know that, so they are in a good position to weigh the tradeoffs. I don't think we should make the decision for them. Moreover, the reality is that many different types of applications, from simple examples to industrial-strength VC systems, are using the main thread today to do this type of processing and it works.

especially if the result is a much more complex API

I don't see how it would be. Worst-case, streams can be transferred from workers to main-thread, no worse than what was proposed the other direction.

I agree that transferring tracks would not result in a particularly complex API. My comment was before we had decided to move in that direction.

youennf commented 3 years ago

But is there any precedent for such an API design?

Audio worklet is one example. In general, I think that worklets should be the default for those potentially perf sensitive operations. Worker is really a tradeoff here. It is also worth keeping in mind that nothing prevents extending support to window environments in the future; supporting window environments on day 1 does not seem mandatory.

guidou commented 3 years ago

But is there any precedent for such an API design?

Audio worklet is one example.

I was referring specifically to APIs exposed on DedicatedWorker, but not on Window.

In general, I think that worklets should be the default for those potentially perf sensitive operations. Worker is really a tradeoff here.

How would a worklet support the use cases that require generating a new track in JS? In particular, the use case where you want to generate a new track based on the contents of two (or more) existing tracks (i.e., the weather report use case).

It is also worth keeping in mind that nothing prevents extending support to window environments in the future; supporting window environments on day 1 does not seem mandatory.

Nothing is mandatory, but going against how all Web APIs are exposed requires in my opinion a stronger argument than the ones I've seen so far. The existence of examples ranging from simple demos to critical enterprise products using the main thread shows that it's technically viable to do it. It's true that a lot of those cases would benefit from moving the processing off the main thread, but I think that's a tradeoff developers should be able to decide.

jan-ivar commented 3 years ago

Audio worklet is one example.

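I think this is a perfect example, and a strong corollary to video:

  • ScriptProcessorNode exposed audio data on the main-thread
  • Everyone agreed this was a terrible mistake, and deprecated it, slating it for removal.
  • Interesting trivia: Its immediate replacement was initially going to be "Audio Workers"
  • But the Audio WG found this improvement insufficient, further isolating exposure to a highly controlled AudioWorklet environment.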

There's clear and relevant precedent here for controlling exposure of realtime data away from main thread, possibly even away from workers.

I was referring specifically to APIs exposed on DedicatedWorker, but not on Window.

@guidou Is the argument we shouldn't expose methods to workers without also exposing them to main-thread, because that would be unexpected somehow? What's the worry there exactly? If symmetry is of concern, we can do VideoWorklet. 😉

How would a worklet support the use cases that require generating a new track in JS? In particular, the use case where you want to generate a new track based on the contents of two (or more) existing tracks (i.e., the weather report use case).

It can be done with audio today, so I don't think whether it is possible or not is the discussion.

Instead, worker vs. worklet I think comes down to what benefits there may be from controlling the environment of exposure.

I don't know what those are atm, but if we plan to expose VideoFrames from a GPU buffer pool https://github.com/w3c/webcodecs/issues/83 we might wish we had some control, so JS failing to close them quickly doesn't stall camera capture or WebGL in the browser.
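To illustrate the concern about frames not being closed promptly, here is a small sketch of a wrapper that always releases a frame after synchronous processing, even when that processing throws. `ClosableFrame`, `withFrame`, and `FakeFrame` are hypothetical names introduced for this example; the only assumption about real frames is that they expose a `close()` method, as WebCodecs `VideoFrame` does.

```typescript
// Anything with a close() method; a real VideoFrame fits this shape.
interface ClosableFrame {
  close(): void;
}

// Runs synchronous work on a frame and always closes the frame afterwards,
// so the underlying buffer can go back to the pool even if the work throws.
function withFrame<F extends ClosableFrame, R>(frame: F, work: (frame: F) => R): R {
  try {
    return work(frame);
  } finally {
    frame.close();
  }
}

// Hypothetical stand-in for a VideoFrame, used only for this demonstration.
class FakeFrame implements ClosableFrame {
  closed = false;
  width: number;
  constructor(width: number) {
    this.width = width;
  }
  close(): void {
    this.closed = true;
  }
}

// Normal path: the result is returned and the frame is closed.
const okFrame = new FakeFrame(640);
const doubledWidth = withFrame(okFrame, (f) => f.width * 2);

// Failure path: the error propagates, but the frame is still closed.
const badFrame = new FakeFrame(640);
let workFailed = false;
try {
  withFrame(badFrame, () => {
    throw new Error("processing failed");
  });
} catch {
  workFailed = true;
}
```

This only covers synchronous work; if `work` returned a promise, the frame would be closed before the promise settled, so asynchronous processing would need its own variant.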

guidou commented 3 years ago

Audio worklet is one example.

I think this is a perfect example, and a strong corollary to video:

  • ScriptProcessorNode exposed audio data on the main-thread
  • Everyone agreed this was a terrible mistake, and deprecated it, slating it for removal.
  • Interesting trivia: Its immediate replacement was initially going to be "Audio Workers"
  • But the Audio WG found this improvement insufficient, further isolating exposure to a highly controlled AudioWorklet environment.

There's clear and relevant precedent here for controlling exposure of realtime data away from main thread, possibly even away from workers.

I was referring specifically to APIs exposed on DedicatedWorker, but not on Window.

@guidou Is the argument we shouldn't expose methods to workers without also exposing them to main-thread, because that would be unexpected somehow? What's the worry there exactly? If symmetry is of concern, we can do VideoWorklet. 😉

I'm saying that going against the established pattern of exposing in both places requires a stronger argument than "the user might do something wrong in certain circumstances" which is the only argument given so far. This could be said basically about any API, including the audio worklet. The comparison with ScriptProcessorNode is meaningless since it could run only on the main thread and was intended for applications that might require a real-time thread. Neither of those two things is true for the streams-based proposal. If you think the way to go is a VideoWorklet, then I'm very interested in seeing more details about that proposal. My impression so far is that introducing a VideoWorklet would be more complex than streams, since streams are already specified, implemented, and proven in production. The benefits of a video worklet are unclear to me, but a more concrete proposal might make them clearer. An advantage worklets can provide is the ability to run on any thread, including real-time threads, but that is not needed for our intended use cases.

How would a worklet support the use cases that require generating a new track in JS? In particular, the use case where you want to generate a new track based on the contents of two (or more) existing tracks (i.e., the weather report use case).

It can be done with audio today, so I don't think whether it is possible or not is the discussion.

I was asking just in case you had something concrete in mind.

Instead, worker vs. worklet I think comes down to what benefits there may be from controlling the environment of exposure.

I'm not very familiar with the history of the audio worklet, but I think a stronger argument than controlled exposure is the ability to run on a real-time thread, which you can't do with workers. A real-time thread is ideal for very low-latency, relatively low-CPU applications, which is very common for audio applications that need to render audio locally. It's not needed for the use cases we intend to support (it would be a negative in some cases).

I don't know what those are atm, but if we plan to expose VideoFrames from a GPU buffer pool w3c/webcodecs#83 we might wish we had some control, so JS failing to close them quickly doesn't stall camera capture or WebGL in the browser.

We have control. The application can run processing on a worker if the main-thread is a concern. What other controls do you envision?

jan-ivar commented 3 years ago

I'm saying that going against the established pattern of exposing in both places requires a stronger argument than "the user might do something wrong in certain circumstances" which is the only argument given so far.

I'm hearing no reason other than pattern, and I suspect those patterns are largely historical. I.e. workers are still a relatively young concept compared to the size of the platform, so that might explain why many APIs were exposed to main thread first. WebIDL lets us specify exposure discretely, and I see no established rationale to back up why exposure must follow any pattern. If you have one, please point me to it, otherwise I see no reason to follow a pattern not backed by rationale.

Also, AudioWorklet is a clear example that breaks that pattern: we had main thread exposure, and removed it on purpose for good reasons, which I gave. That this ended up being a worklet rather than a worker, I don't think had anything to do with any rule that APIs cannot be exposed in workers only. I've also seen no proposals to transfer ScriptProcessorNode.

I also disagree that being conservative requires arguments. I think being cautious with these low-level APIs is the right approach, which means requiring arguments to be anything but conservative. It'll be hard to put the genie back in the bottle.

An advantage worklets can provide is the ability to run on any thread, including real-time threads, but that is not needed for our intended use cases.

What use case is that? "Funny hats" and "Virtual Reality Gaming" both seem real-time to me.

guidou commented 3 years ago

I'm saying that going against the established pattern of exposing in both places requires a stronger argument than "the user might do something wrong in certain circumstances" which is the only argument given so far.

I'm hearing no reason other than pattern, and I suspect those patterns are largely historical. I.e. workers are still a relatively young concept compared to the size of the platform, so that might explain why many APIs were exposed to main thread first. WebIDL lets us specify exposure discretely, and I see no established rationale to back up why exposure must follow any pattern. If you have one, please point me to it, otherwise I see no reason to follow a pattern not backed by rationale.

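The argument is that there are use cases where allowing the main thread is justified IMO:

  • Applications migrating from canvas that have all their logic on the main thread and seek a gradual migration (we've seen this among some participants in our origin trial)
  • Any application handling low loads including demos and toys, but also some production applications.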

I don't think ignoring these legitimate use cases because the main thread is bad in other situations is enough justification.

Also, AudioWorklet is a clear example that breaks that pattern: we had main thread exposure, and removed it on purpose for good reasons, which I gave. That this ended up being a worklet rather than a worker, I don't think had anything to do with any rule that APIs cannot be exposed in workers only. I've also seen no proposals to transfer ScriptProcessorNode.

I also disagree that being conservative requires arguments. I think being cautious with these low-level APIs is the right approach, which means requiring arguments to be anything but conservative. It'll be hard to put the genie back in the bottle.

My point is that there are some legitimate use cases that we shouldn't ignore and the historical precedent backs supporting those use cases. Now, what is actually the issue? The concern originally raised by this spec issue is main thread by default, but we're mainly discussing allowing or not allowing running on the main thread, which is a different discussion.

We can address the concern of main thread by default with an extra parameter in the constructor (e.g., allowStreamOnWindow) that is false by default. Any attempt to use the streams on Window would fail unless the application explicitly passes this parameter as true, therefore disabling processing on main thread by default. Would this address the concern in your opinion?
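A rough sketch of how such an opt-in gate could behave. Only the parameter name `allowStreamOnWindow` comes from the comment above; `GatedProcessor`, `Scope`, `openStream`, and `tryOpen` are hypothetical stand-ins, and the execution scope is passed in explicitly so the behaviour can be shown outside a browser.

```typescript
// The two execution scopes discussed in this thread.
type Scope = "window" | "dedicated-worker";

interface GatedProcessorInit {
  // Opt-in flag named in the proposal; treated as false when omitted.
  allowStreamOnWindow?: boolean;
}

// Hypothetical stand-in for a track processor whose stream is gated on Window.
class GatedProcessor {
  private readonly allowStreamOnWindow: boolean;

  constructor(init: GatedProcessorInit = {}) {
    this.allowStreamOnWindow = init.allowStreamOnWindow ?? false;
  }

  // Stand-in for using the processor's stream from a given scope.
  openStream(scope: Scope): string {
    if (scope === "window" && !this.allowStreamOnWindow) {
      throw new Error("stream access on Window requires allowStreamOnWindow: true");
    }
    return `stream opened on ${scope}`;
  }
}

// Small helper that turns the thrown error into a visible result.
function tryOpen(processor: GatedProcessor, scope: Scope): string {
  try {
    return processor.openStream(scope);
  } catch {
    return "rejected";
  }
}

const byDefault = new GatedProcessor();
const optedIn = new GatedProcessor({ allowStreamOnWindow: true });
```

Under this model, worker usage is unaffected by the flag, main-thread usage fails unless requested, and the opt-in is visible at the construction site.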

An advantage worklets can provide is the ability to run on any thread, including real-time threads, but that is not needed for our intended use cases.

What use case is that? "Funny hats" and "Virtual Reality Gaming" both seem real-time to me.

The fact that it is a "real-time" application does not mean a real-time priority thread at the OS level is a good idea. In fact, it can be a way to shoot yourself in the foot. If you do processing that requires a lot of CPU on a real-time priority thread you run the risk of starving everything else, including UI and other things running on the main thread.

youennf commented 3 years ago

  • Applications migrating from canvas that have all their logic on the main thread and seek a gradual migration (we've seen this among some participants in our origin trial)

My guess is that application complexity mostly lies in the processing of the data, not in how to get the data: WebGL shaders, WASM libraries... I think the requirement is to make sure such processing is feasible and easy to deploy in the environments we are envisioning.

  • Any application handling low loads including demos and toys, but also some production applications.

Demos should show the preferred way of using the API. Toys are product applications. In general, this is a difficult choice and it highly depends on the application and the device running the application. Say a device with a buffer pool of 3 frames, application doing 10fps and GC triggering random hundred millisecond freezes.
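The buffer-pool scenario can be made concrete with a deliberately simplified model. The assumptions here are mine, not part of the original comment: frames arrive at a fixed rate, the application releases nothing while it is stalled, and any frame that arrives when no buffer is free is dropped. `StallScenario` and `framesDroppedDuringStall` are hypothetical names.

```typescript
interface StallScenario {
  poolSize: number;        // total buffers in the capture pool
  framesHeldByApp: number; // buffers the application has not closed yet
  captureFps: number;      // frames arriving per second
  stallMs: number;         // length of the freeze (e.g. a GC pause)
}

// Counts frames that find no free buffer while the application is stalled.
function framesDroppedDuringStall(s: StallScenario): number {
  // Frames that arrive during the stall, using integer-friendly arithmetic.
  const arrivals = Math.floor((s.stallMs * s.captureFps) / 1000);
  // Buffers still available when the stall begins.
  const freeBuffers = Math.max(0, s.poolSize - s.framesHeldByApp);
  return Math.max(0, arrivals - freeBuffers);
}

// A single 100 ms freeze with an empty pool: one arrival, three free buffers.
const idleApp = framesDroppedDuringStall({
  poolSize: 3,
  framesHeldByApp: 0,
  captureFps: 10,
  stallMs: 100,
});

// Freezes adding up to 300 ms while the app already holds two frames:
// three arrivals, one free buffer.
const busyApp = framesDroppedDuringStall({
  poolSize: 3,
  framesHeldByApp: 2,
  captureFps: 10,
  stallMs: 300,
});
```

In this model the first case drops nothing and the second drops two frames, which shows how the outcome depends on how many buffers the application is already holding when the freeze hits.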

As an example, ScriptProcessorNode is probably fine for some applications on some devices.

In fact, it can be a way to shoot yourself in the foot. If you do processing that requires a lot of CPU on a real-time priority thread you run the risk of starving everything else, including UI and other things running on the main thread.

Agreed and that is precisely the point: isolate potentially perf sensitive tasks in their own context to let the UA do the prioritisation, potentially dynamically.

guidou commented 3 years ago

  • Applications migrating from canvas that have all their logic on the main thread and seek a gradual migration (we've seen this among some participants in our origin trial)

My guess is that application complexity mostly lies in the processing of the data, not in how to get the data: WebGL shaders, WASM libraries...

My guess is that in a complex codebase with a lot of interconnected layers, introducing off-main-thread processing can be a challenging architecture change, and having the possibility of a migration in multiple stages can be very important for such a project. For example, they could first migrate the main-thread logic from canvas to the new API and evaluate the correctness of that change; in a second stage they could better isolate the part of the processing that will move off the main thread; and in a third stage they could actually move the logic off the main thread. Allowing reasonably easy main-thread usage, even if it's not the default, is important in these kinds of scenarios, which exist in the real world.

I think the requirement is to make sure such processing is feasible and easy to deploy in the environments we are envisioning.

  • Any application handling low loads including demos and toys, but also some production applications.

Demos should show the preferred way of using the API. Toys are product applications. In general, this is a difficult choice and it highly depends on the application and the device running the application. Say a device with a buffer pool of 3 frames, application doing 10fps and GC triggering random hundred millisecond freezes.

As an example, ScriptProcessorNode is probably fine for some applications on some devices.

In fact, it can be a way to shoot yourself in the foot. If you do processing that requires a lot of CPU on a real-time priority thread you run the risk of starving everything else, including UI and other things running on the main thread.

Agreed and that is precisely the point: isolate potentially perf sensitive tasks in their own context to let the UA do the prioritisation, potentially dynamically.

I presented a proposal to have main-thread processing disabled by default, but to make it easily available if the application explicitly needs it. Do you think it addresses the concern originally raised by this spec issue, even if you disagree with the general approach taken by the spec?

jan-ivar commented 3 years ago

@guidou In my experience (Plan-B, getDisplayMedia etc.) migration is never helped, only hurt, by making it optional.

Those same arguments ("challenging architecture change") are reasons not to support main-thread access, because sites may otherwise never move away from main thread, leaving us in an undesirable place where sites suck and users blame browsers.

I believe good API design makes desirable things easy and undesirable things hard, leading users down the right path.

guidou commented 3 years ago

@guidou In my experience (Plan-B, getDisplayMedia etc.) migration is never helped, only hurt, by making it optional.

Those same arguments ("challenging architecture change") are reasons not to support main-thread access, because sites may otherwise never move away from main thread, leaving us in an undesirable place where sites suck and users blame browsers.

I believe good API design makes desirable things easy and undesirable things hard, leading users down the right path.

Unless we plan to remove canvas capture, etc., migration will always be optional. The way I see it, our options are to make it optional and easy, or optional and difficult. I think optional and easy is more likely to result in migration than optional and difficult.

jan-ivar commented 3 years ago

If people were happy with canvas capture we wouldn't be here, so I don't feel we have to compete with status quo. Sites want something better, we have their attention, so we're at a crossroads of what to offer next. Is it transferable ScriptProcessorNode, AudioWorklet, or something else?

alvestrand commented 3 years ago

1) I checked the Chrome IDL files. There is not one single instance that I could find of an API that is exposed on Worker only except for APIs that specifically deal with managing workers (including AudioWorklet). We will be breaking new ground if we add a new API that can't be prototyped, debugged or used on the main thread.

2) "If people were happy with canvas capture we wouldn't be here". "Here" is four people arguing, two of whom are saying that we should expose the API on the main thread, two of whom are saying that we shouldn't. All four are affiliated with browser vendors.

We have not heard from anyone who is not affiliated with a browser vendor arguing this point one way or the other.

jan-ivar commented 3 years ago

We will be breaking new ground if we add a new API that can't be prototyped, debugged or used on the main thread.

Not really. AudioWorkletProcessor is an interface that cannot be instantiated on main thread, yet has full WPT coverage.

Debugging web workers is trivial these days. People also don't seem to have complaints debugging and testing AudioWorklets, even though I assume that gets a little trickier.

@alvestrand if your argument is we haven't heard from enough people that think the status quo for capturing raw video data is fine, I'm willing to listen.

youennf commented 3 years ago

Some additional API examples:

youennf commented 3 years ago

Additional thoughts on existing API vs. what we are trying to achieve:

jan-ivar commented 2 months ago

https://github.com/w3ctag/design-principles/issues/360 which was opened in response to this was closed by https://github.com/w3ctag/design-principles/pull/404 a year ago.

The TAG's guidance here is now enshrined in § 10.2.1. Some APIs should only be exposed to dedicated workers. It seems to support the spec's existing decision to expose in DedicatedWorker only.

Out-of-main-thread processing has been the default since the adoption of https://github.com/w3c/mediacapture-transform/pull/66.

Can this be closed, or is there more to discuss here?