w3c / mediacapture-region

This document introduces an API for cropping a video track derived from display-capture of the current tab.
http://w3c.github.io/mediacapture-region/

What makes CropTarget special to require an asynchronous creation? #17

Open · youennf opened this issue 2 years ago

youennf commented 2 years ago

We started discussing this topic (whether CropTarget creation should use promise or not) as part of https://github.com/w3c/mediacapture-region/issues/11 and it seems best to use a dedicated thread for this issue.

@eladalon1983 mentioned implementation issues that make using promises desirable, in particular security issues and potential race conditions when creating a CropTarget synchronously, since a CropTarget has to keep a link to the process it was created in. The following case was made:

Existing objects like MessagePort, WHATWG Streams, RTCDataChannel or MediaStreamTrack can be created-then-postMessaged synchronously, and UAs implement this today, hopefully without security/race-condition issues. AIUI, it seems consistent to take the same approach for CropTarget (synchronous creation), unless we find something specific to CropTarget that actually prevents this existing pattern.
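
For concreteness, here is a minimal sketch of that created-then-postMessaged pattern using MessagePort, where creation and transfer happen within a single synchronous run of script (the iframe lookup is illustrative):

  // Create a channel and hand one port to a (possibly cross-origin)
  // iframe, all before yielding back to the event loop.
  const { port1, port2 } = new MessageChannel();
  const iframe = document.querySelector('iframe'); // illustrative
  iframe.contentWindow.postMessage('port incoming', '*', [port2]);
  port1.onmessage = ({ data }) => console.log('reply:', data);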

youennf commented 2 years ago

I don't understand what this means. Each video frame is composed of many pixels from multiple processes in different iframes.

I do not know how Chrome is implemented, but I would guess that there is a privileged process that does the compositing, might have access to the window server, and so on; that is the process I am talking about. Basically, it is the process where the getDisplayMedia track source lives. Isn't there such a thing in the Chrome implementation? The browser process, maybe?

Using this scheme means that some tokens are zombie-tokens, meaningless but undetectable except via cropTo().

There are no zombie tokens. If a CropTarget is kicked out of the capturer process's CropTarget cache, it will still exist in the renderer process, and the capturer process will get it back through an IPC exchange. The size of the CropTarget cache can be tailored at will, since it is just an optimization and has no impact on JS-visible functionality.

eladalon1983 commented 2 years ago

I do not know how Chrome is implemented, but I would guess that there is a privileged process that does the compositing, might have access to the window server, and so on; that is the process I am talking about. Basically, it is the process where the getDisplayMedia track source lives. Isn't there such a thing in the Chrome implementation? The browser process, maybe?

There's a non-trivial system of interactions between the renderer processes, the browser process and the GPU process. It's not clear to me how your proposal maps onto our model; I am very confused by your proposal and its references to such things as "synchronous IPC".

Equally important, it's not clear to me why we can't go with a general, flexible solution that can be easily accommodated by any reasonable user-agent architecture. I am still waiting for an explanation of how this complexity provides a service to higher constituencies. Barring such compelling motivation, what's the use?

There are no zombie tokens. If a CropTarget is kicked out of the capturer process's CropTarget cache, it will still exist in the renderer process, and the capturer process will get it back through an IPC exchange. The size of the CropTarget cache can be tailored at will, since it is just an optimization and has no impact on JS-visible functionality.

So now we have a round-robin cache of CropTargets that can be resolved immediately, but if one is ejected we do renderer-to-renderer communication to figure out whether it's a legitimate CropTarget or not? And we set up this potential channel for communication ahead of time, just in case something gets ejected? And this channel of communication follows the capturer around, and supposedly we only have one, and supposedly it's stationary? Am I really reading you correctly here? And all of this wonderful complexity is in the service of...?

jan-ivar commented 2 years ago

A test in Chrome with https://jsfiddle.net/jib1/yunghz8f/ (on macOS in my case) reveals the following on a fresh restart:

Time to crop: 83.00
Time to play: 537.20

...and on repeat calls more like:

Time to crop: 3.30
Time to play: 486.60

Numbers fluctuate a bit, but cropping always succeeds with a substantial margin ahead of the video playing.

I then did an extremely rough polyfill of element.cropTarget in https://jsfiddle.net/jib1/wfnv9o3q/ solely for the purpose of timing things, and got numbers like this:

Time to crop: 2.90
Time to play: 509.60

I hope this puts to bed any performance concerns, but if someone wants to do a cross-origin version feel free.

The fiddles also show some bugs in Chrome you may wish to iron out (the blue outline is way off)

eladalon1983 commented 2 years ago

@jan-ivar, I don't understand your methodology for measuring performance.

The fiddles also show some bugs in Chrome you may wish to iron out (the blue outline is way off)

Thanks for pointing that out. We'll definitely look into it.

yoavweiss commented 2 years ago

Asynchronous APIs come at a cost to web developers, as they turn JavaScript into a pre-emptive language.

How so?

I recommend Why coroutines won’t work on the web as required reading here.

Reading through this, I'm failing to understand the relevance. Are you suggesting that Promises are the equivalent of co-routines?

beaufortfrancois commented 2 years ago

The fiddles also show some bugs in Chrome you may wish to iron out (the blue outline is way off)

FYI I've filed a Chromium bug at https://bugs.chromium.org/p/chromium/issues/detail?id=1327560 to track this. Thank you @jan-ivar

youennf commented 2 years ago

And all of this wonderful complexity is in the service of...?

@eladalon1983, when looking at this API without any particular implementation in mind, the simplest approach I can think of is this one (with A the getDisplayMedia caller, B the CropTarget creator, and C the cropper): B serializes the CropTarget information as bytes and sends them to A by calling postMessage; A sends them to C by calling cropTo; C uses the provided bytes to do the cropping.
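
In script terms, a rough sketch of that flow, using the CropTarget.fromElement()/cropTo() shape discussed in this thread (the window references and message plumbing are illustrative):

  // In document B (the CropTarget creator):
  const cropTarget = await CropTarget.fromElement(element);
  windowA.postMessage({ cropTarget }, originA); // B -> A

  // In document A (the getDisplayMedia caller):
  window.addEventListener('message', async ({ data }) => {
    const [track] = stream.getVideoTracks(); // stream from getDisplayMedia
    await track.cropTo(data.cropTarget);     // A hands the target to C
  });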

You are bringing security concerns to the discussion. It would be good to understand these concerns more precisely; after all, A is only removing info by calling cropTo. One approach is for C to validate/enrich the bytes with the help of B through an IPC exchange (some APIs do that A -> C -> B dance). An alternative is to hash/encrypt the bytes so that C can validate them without B's help.

You are then bringing performance concerns to the discussion. The first question is of course understanding what these concerns are. Is it losing 1 or 2 frames, or one second of frames? The typical use case I see is a user clicking on UI to trigger cropping. A latency of a few frames seems fine; 1 second is not. One approach to these concerns is to send the bytes from B to C at CropTarget creation time, start cropping, and store them in a round-robin cache to optimise the regular cases. Caching is a well-known optimisation tool, so I think we can qualify this as a simple approach.

These two mitigations honestly seem simpler than your proposed approach and the "wonderful complexity" it adds.

To make progress here, I think we should try to focus on https://github.com/w3c/mediacapture-region/issues/48 first.

eladalon1983 commented 2 years ago

The fiddles also show some bugs in Chrome you may wish to iron out (the blue outline is way off)

FYI I've filed a Chromium bug at https://bugs.chromium.org/p/chromium/issues/detail?id=1327560 to track this. Thank you @jan-ivar

Thank you too, @beaufortfrancois. It was past midnight in my timezone, so I left filing this formally for today. It was a very pleasant surprise to see you've done this for me.

eladalon1983 commented 2 years ago

And all of this wonderful complexity is in the service of...?

@eladalon1983, when looking at this API...

Thank you for this message, @youennf. I know all messages take effort, and I value your time. Alas, having read this message three times, I still cannot find in it an answer to my original question, namely: what is this incredible complexity serving? I continue to await an answer. I shall now answer your own questions, but I'd appreciate it if you did not forget to answer mine.

Is it losing 1 or 2 frames, or one second of frames?

Being slower to start, stop, or change the crop target is unfortunate. An application that decides to change the crop target will often pause the track until it is informed that all subsequent frames are cropped to the new target. This can be noticeable to both the local user and remote users. This visible disruption should be minimized.

One approach to these concerns is to send the bytes from B to C at CropTarget creation time, start cropping, and store them in a round-robin cache to optimise the regular cases. Caching is a well-known optimisation tool, so I think we can qualify this as a simple approach.

A renderer-to-renderer communication channel would have to be produced to support the case of a cache miss, a case you expect to happen 0.01% of the time. But this backup would have to be set up every time. That complexity is not warranted. (And please do NOT read this message as an acknowledgement that any of the rest of your proposed solution works.)

eladalon1983 commented 2 years ago

Gentlemen, I have a simple proposal.

  Promise<void> cropTo(Element);
  Promise<void> cropTo(CropTarget);

  1. Same document:

     // Note that *only* cropTo() is async here, which is
     // within consensus.
     await track.cropTo(document.getElementById('id'));

  2. Multiple documents:

     function totallySyncFunction() {
       setTimeout(async () => {
         await getCropTargetAndSendItToCapturer(element);
       }, 0);
     }

     async function getCropTargetAndSendItToCapturer(element) {
       const cropTarget = await CropTarget.fromElement(element);
       otherDocument.postMessage(...cropTarget...);
     }

Note that, because receiving the message is asynchronous, the receiver would not observe a difference. This seems completely palatable for a generally synchronous sender.

Wdyt?

jan-ivar commented 2 years ago

... I have a simple proposal.

@eladalon1983 Isn't that a rehash of https://github.com/w3c/mediacapture-region/issues/19, and already answered there (special-casing != KISS)? Let's keep that there.

  setTimeout(async () => {
    await getCropTargetAndSendItToCapturer(element);
  }, 0);

As a side-comment: I don't know how you can credibly argue about performance and add 4 milliseconds of delay for no reason whatsoever. Check out postMessage is ~100 times more frequent than setTimeout(0).

jan-ivar commented 2 years ago

Asynchronous APIs come at a cost to web developers, as they turn JavaScript into a pre-emptive language.

How so?

@yoavweiss Everywhere a web developer has written await, they're preempted, which means all the state assumptions made right before need to be reevaluated. This is a concrete cost to web developers, and can be a source of hard-to-diagnose races and bugs.

Additionally, once a function or method is made asynchronous, everything that calls it is forced into being asynchronous as well, so the problem spreads like wildfire.
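
A toy illustration of that cost (hypothetical names; the counter stands in for any shared main-thread state):

  let pendingCrops = 0; // shared mutable state on the main thread

  async function applyCrop(track, element) {
    pendingCrops++;
    await track.cropTo(await CropTarget.fromElement(element));
    // Two preemption points above: other tasks may have run and changed
    // pendingCrops, stopped the track, or started a competing crop, so
    // every assumption made before the awaits must be rechecked here.
    pendingCrops--;
  }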

The article I linked to explains this problem in some detail with examples, so I'm not sure what's unclear. Hope this helps. If not, I'm sure there are Google colleagues that can explain this better than I can.

I recommend Why coroutines won’t work on the web as required reading here.

Reading through this, I'm failing to understand the relevance. Are you suggesting that Promises are the equivalent of co-routines?

No, the article was written in 2011, when Promises existed but async/await didn't. At the time there was pressure from web developers to solve what async/await ultimately ended up solving a few years later. But how to do it wasn't yet clear.

Ever questioned why the await keyword is needed, or why only async functions can use it?

This is that article. At the time it wasn't clear why those restrictions were needed. But take away either of those requirements, and you have "co-routines" (the author's term for having to assume every function you call is able to preempt you in a single-threaded environment).

The gem of the article (besides describing the preemption problem and why JS doesn't need locks) was recognizing that generators didn't have this problem, because the yield keyword was explicit. This meant JS developers could tell where preemption would happen and where it couldn't, so they could take appropriate steps. As you may know, async/await in JS ended up being built on generators.

The misguided make-everything-async-there's-no-downside crowd don't seem to understand this or care, that the logical end of what they're advocating (await everything!¹) quickly degrades JS into this preemptive (co-routine) language again. I'm sure they would try to invent locks in JS to solve it. Thankfully, these views are more rare these days than they were in 2011, which is why I had to go back that far for sourcing.


1. I'm contrasting extremes to aid explanation here. Clearly one more async method isn't the problem, nor does the problem only arise at the extreme. But my aim here was to clarify why the design rules we have matter, and mustn't be violated except for reasons we can all agree are sufficient.

youennf commented 2 years ago

Namely - what is this incredible complexity serving?

What I tried to say was that I do not see a big difference in UA implementation complexity or achievable performance between what Chrome implements and the proposed alternatives. The big difference I see is that Chrome's current implementation puts some of the complexity on web pages in the name of this optimisation. In general, I think we should favour optimisations that are transparent to web pages.

An application that decides to change the crop target will often pause the track until it is informed that all subsequent frames are cropped to the new target. This can be noticeable to both the local user and remote users. This visible disruption should be minimized.

I do not really understand under which conditions the application would pause the track, which I interpret as stopping displaying/sharing it between the time cropTo is called and the time the cropTo promise is resolved.

I can see a few related scenarios:

  1. The web page wants to call getDisplayMedia to capture itself, apply cropping, then start displaying/sending the cropped track. One natural approach here is to make just one getDisplayMedia call (something like getDisplayMedia({ selfBrowserSurface: "include", video: { cropTarget: myCropTarget } }); see the sketch after this list). This should simplify the web page's workflow and allow some UA optimisations to get the smallest achievable latency. Is it something that we should push to a spec as a way to reach consensus?

  2. The web page is already capturing the tab and is displaying/sharing it. The user decides to apply cropping through the web page UI. In that case, calling cropTo after getDisplayMedia is the only way to implement this scenario. This seems similar to the user requesting a camera change (going from the front to the back camera, say). I do not think web pages will pause the track in that scenario. There is of course a need for cropTo to be fast enough that the UI stays responsive and there is no excessive latency between the user's click and actual cropped frames appearing on screen. But in that case, it does not make sense to me for the web page to pause the track.
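
For illustration, point 1 above would look something like this (a sketch: cropTarget as a getDisplayMedia constraint is a proposal in this thread, not a specced option):

  // Hypothetical one-shot capture-and-crop, per point 1 above.
  const stream = await navigator.mediaDevices.getDisplayMedia({
    selfBrowserSurface: 'include',       // offer the current tab
    video: { cropTarget: myCropTarget }, // proposed here, not in the spec
  });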

eladalon1983 commented 2 years ago

As a side-comment: I don't know how you can credibly argue about performance and add 4 milliseconds of delay for no reason whatsoever. Check out postMessage is ~100 times more frequent than setTimeout(0).

The token can be posted while the user is interacting with getDisplayMedia's dialog. Or even before getDisplayMedia is called. Any latency introduced by minting OR posting a token therefore becomes zero. This wonderful fact is courtesy of the separation of produceCropTarget/cropTo into two separate async APIs, and the implementation-specific decision[*] to deposit the token along the codepath which cropTo() has to take. (For Chrome - in the browser process).
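
A sketch of that ordering (the window reference and element are illustrative):

  // In the captured document, well before capture starts:
  const cropTarget = await CropTarget.fromElement(element);
  capturerWindow.postMessage({ cropTarget }, capturerOrigin);

  // In the capturing document: by the time the user has dealt with the
  // getDisplayMedia dialog, the token has long since arrived.
  const stream = await navigator.mediaDevices.getDisplayMedia();
  await stream.getVideoTracks()[0].cropTo(receivedCropTarget);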

The misguided make-everything-async-there's-no-downside crowd don't seem to understand this or care, that the logical end of what they're advocating (await everything!¹) quickly degrades JS into this preemptive (co-routine) language again. I'm sure they would try to invent locks in JS to solve it. Thankfully, these views are more rare these days than they were in 2011, which is why I had to go back that far for sourcing.

Nobody here has ever advocated for making everything async, or re-inventing locks. Let's stick to the arguments presented.

-- [*] You're welcome to adopt or disregard this approach in your own implementation.

eladalon1983 commented 2 years ago

What I tried to say was that I do not see a big difference in UA implementation complexity or achievable performance between what Chrome implements and the proposed alternatives.

I humbly disagree. I believe I have argued the point.

I do not really understand under which conditions the application would pause the track, which I interpret as stopping displaying/sharing it between the time cropTo is called and the time the cropTo promise is resolved.

You gave two examples. I present to you a third - multiple targets. Assume the application has two sub-sections, and the user can switch which one is shared by clicking an in-content button. If I were to implement this app, I'd see a click from the user as telling me: (i) stop capturing (and/or transmitting remotely) the old target's content and (ii) start capturing the new target's content. I'd pause the track until (ii) is completed. This depends on the app; it would only matter to some.
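
One way such an app might express this, pausing via track.enabled until the crop switch completes (a sketch; names are illustrative):

  async function switchSharedSection(track, newCropTarget) {
    track.enabled = false;             // (i) stop showing the old content
    await track.cropTo(newCropTarget); // resolves once frames are cropped
    track.enabled = true;              // (ii) resume with the new content
  }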

youennf commented 2 years ago

I humbly disagree. I believe I have argued the point.

It seems we are stuck on this complexity assessment.

I hope we agree, though, that Chrome's current implementation is putting additional complexity on web pages' shoulders for the sake of this optimisation.

You gave two examples.

I gave examples where pausing is not required. Your third example is a generalisation of the second example, where pausing is not required either, at least for the apps I can think of.

If I were to implement this app, I'd see a click from the user as telling me: (i) stop capturing (and/or transmitting remotely) the old target's content and (ii) start capturing the new target's content. I'd pause the track until (ii) is completed.

I do not see what would drive users to expect pausing here. I would think VC apps tend not to follow that pattern; they prefer a seamless transition. replaceTrack is suited for that seamless transition from the old track to the new track. Also, replaceTrack is a chained operation: it was thought fine to delay switching sources a bit. This probably means that some limited switch latency is fine for most WebRTC applications.
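
For reference, the replaceTrack pattern alluded to here swaps the source on an RTCRtpSender without renegotiation; a minimal sketch, assuming pc is an existing RTCPeerConnection and the new (already cropped) track exists:

  // Seamless switch: keep sending the old track until the new one is ready.
  const sender = pc.getSenders().find((s) => s.track === oldTrack);
  await sender.replaceTrack(newCroppedTrack); // no renegotiation needed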

If an app is so keen on that kind of optimisation, it can do it on its own side, faster than any UA-side optimisation, for instance by preparing cropped tracks in advance and switching between them.

This depends on the app; it would only matter to some.

I still do not understand which apps would actually want to do what you described, and for which reasons.

eladalon1983 commented 2 years ago

Holding multiple tracks but only using one at any given time may incur performance costs, or be subject to implementation-specific practical limitations. However, I don't think this is core to our discussion, so I suggest we drop this particular sub-topic. It should be pretty clear that applications calling cropTo() prefer the call to take effect sooner rather than later.

youennf commented 2 years ago

It should be pretty clear that applications calling cropTo() prefer the call to take effect sooner rather than later.

Not at all costs; see the replaceTrack precedent, for instance. All transparent optimisations are fine. Non-transparent optimisations like the one we are discussing here should be heavily justified.

jan-ivar commented 2 years ago

As a side-comment: I don't know how you can credibly argue about performance and add 4 milliseconds of delay for no reason whatsoever. Check out postMessage is ~100 times more frequent than setTimeout(0).

The token can be posted while the user is interacting with getDisplayMedia's dialog. Or even before getDisplayMedia is called. Any latency introduced by minting OR posting a token therefore becomes zero.

@eladalon1983 I'm sorry, are you still arguing for setTimeout? Do you understand its purpose is specifically to induce delay? If you meant to simply not wait for completion, just remove the await instead (but let's move this idea to #19):

  function totallySyncFunction() {
    getCropTargetAndSendItToCapturer(element).then(() => {});
  }

Back to performance. I've shown that a sync minting API in JS is actually faster, because otherwise you're essentially serializing the two steps of generating the key and postMessaging it. This seems to undermine your hypothesis that your design is faster. Do you disagree?
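
To spell out the serialization claim, contrast the two shapes (sketches; new CropTarget(element) is the sync shape proposed in #11):

  // Async minting: the postMessage is serialized behind the promise.
  const target = await CropTarget.fromElement(element); // wait on the UA...
  otherWindow.postMessage({ target }, origin);          // ...then send.

  // Sync minting: mint and post within the same turn.
  otherWindow.postMessage({ target: new CropTarget(element) }, origin);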

@yoavweiss's response suggests your design only eliminates the “token hasn’t yet arrived” case, eliminating the need for timeouts and lengthy failures in case an invalid token is used by the VC app. So your design sacrifices the optimal success path of cropTo in order to optimize its failure path? Am I missing something?

And in your response here you're finally claiming performance doesn't matter? Color me confused. Or are you criticizing my measurement fiddle for not demonstrating when the performance I measure would matter? If so, it's true I struggled to imagine an example where this would matter, but producing one would seem to be on you. You've mentioned multiple targets... please show how the shape of produceCropTarget would have any bearing in that case. It would presumably have to start with IPC from a decision point, just like my measurement fiddle (which only starts measuring post-gDM success, btw), and presumably be open to the same criticisms that decisions could have started even earlier. Let's see it.

The misguided make-everything-async-there's-no-downside crowd don't seem to understand this or care, that the logical end of what they're advocating (await everything!¹) quickly degrades JS into this preemptive (co-routine) language again. I'm sure they would try to invent locks in JS to solve it. Thankfully, these views are more rare these days than they were in 2011, which is why I had to go back that far for sourcing.

Nobody here has ever advocated for making everything async, or re-inventing locks. Let's stick to the arguments presented.

You seem to be in the "no-downside crowd", claiming there are no downsides in the face of me clearly explaining them (having to reevaluate state assumptions, fueling the proliferation of unnecessary async, non-idiomatic JS confusing web devs into thinking the API does more than it does, and finally, performance).

Let me blow up my footnote that spelled out the relevant point it seems you flat-out missed:

  1. I'm contrasting extremes to aid explanation here. Clearly one more async method isn't the problem, nor does the problem only arise at the extreme. But my aim here was to clarify why the design rules we have matter, and mustn't be violated except for reasons we can all agree are sufficient.

eladalon1983 commented 2 years ago

@eladalon1983 I'm sorry, are you still arguing for setTimeout? Do you understand its purpose is specifically to induce delay? If you meant to simply not wait for completion, just remove the await instead (but let's move this idea to https://github.com/w3c/mediacapture-region/issues/19).

Messages only make sense in their proper context.

Back to performance. I've shown that a sync minting API in JS is actually faster, because otherwise you're essentially serializing the two steps of generating the key and postMessaging it. This seems to undermine your hypothesis that your design is faster. Do you disagree?

Did you read my response?

Let me blow up my footnote that spelled out the relevant point it seems you flat-out missed:

I don't think this exaggeration promotes mutual understanding. Let's please keep to the arguments raised and not devolve into misrepresenting each other's arguments, even as temporary polemic tools.

eladalon1983 commented 2 years ago

in the face of me clearly explaining them (having to reevaluate state assumptions, fueling the proliferation of unnecessary async, non-idiomatic JS confusing web devs into thinking the API does more than it does, and finally, performance)

I've asked you for the downsides. To your mention of these specific downsides, I answered:

  1. Let's use cropTo(Element) in the same document - no new asynchronicity.
  2. Let's use the other API when posting to another document - posting a message is asynchronous anyway. And if your sender is synchronous - which is unlikely - let it just painlessly use setTimeout(async...). It's barely noticeable to the sender, and completely invisible to the receiver.

azizj1 commented 2 years ago

Hello there. Aziz Javed from Google Meet here, working with Lindsay from Google Editors. We've already built a very significant feature based on this API's origin trial. I'm happy to inform y'all that we were not at all inconvenienced by the async nature of produceCropId. Exposure on MediaDevices vs. on CropTarget is likewise not at all a concern for us; either approach works. Generally speaking, all alternatives debated here are perfectly reasonable for us, so long as they're performant. We'd favor whichever API shape would allow this feature to be shipped in as many browsers as possible, as early as possible.

jan-ivar commented 2 years ago

We'd favor whichever API shape would allow this feature to be shipped in as many browsers as possible, as early as possible.

@azizj1 glad to hear this! I believe we've already resolved exposure on MediaDevices, so for me that API shape is what @youennf proposed in https://github.com/w3c/mediacapture-region/issues/11#issuecomment-1129883808:

new CropTarget(element)

Sorry this is taking so long to get to the bottom of.

eladalon1983 commented 2 years ago

@jan-ivar, have you read this part of Aziz's message?

I'm happy to inform y'all that we were not at all inconvenienced by the async nature of produceCropId.

youennf commented 2 years ago

Quickly discussed at today's editors' call; hopefully we can discuss and resolve this issue at the next WG interim (next Tuesday).

eladalon1983 commented 2 years ago

Repeating from elsewhere for better visibility: I am not sure I'll be able to attend. I regret that the intention to discuss https://github.com/w3c/mediacapture-region/issues/17 was announced only today, or I would have cleared my schedule ahead of time. (As of earlier today, no agenda items were announced.) I'll try to shuffle around my other commitments and make it. But if I cannot, I suggest that it might be more productive to discuss Region Capture during an interim meeting which I can attend. For your consideration.

CC @aboba and @dontcallmedom

eladalon1983 commented 2 years ago

For those interested, the rescheduled meeting will take place on 2022-06-23. See more details here.

aboba commented 2 years ago

After looking over the slides and listening to the arguments, my take is the following:

  1. The W3C design principles cite interprocess communications (IPC) as a potential reason to choose async over sync, so they don't provide much in the way of definitive guidance here.
  2. There is a good argument that cropTo() needs to be async.
  3. The concerns about resource exhaustion attacks against CropTarget() relate more to a potential implementation issue than to the async vs. sync question. That is, resource exhaustion attacks appear addressable even if CropTarget() remains async. For example, resource allocation could be moved from CropTarget() to cropTo(), since the latter is async.
  4. As Tim Panton pointed out, the only existing implementation uses async and the developers do not believe that a sync approach would be feasible for them. On the other hand, those who favor sync could adopt the async approach and, if desired, move resource allocation and other potentially blocking operations to cropTo(), returning much sooner. So it would appear that all implementers should be able to "live with" async, whereas some implementers claim they cannot live with the sync approach.

jan-ivar commented 2 years ago

The W3C design principles cite interprocess communications (IPC) as a potential reason to choose async over sync, so they don't provide much in the way of definitive guidance here.

It says to use sync if the "API ... will not be blocked by ... inter-process communication." (emphasis on blocked)

I hope I showed that any IPC in fromElement is a (premature) optimization, that there's no reason for JS to block on it since it cannot fail, and that doing so just slows things down. Needs for failure are discussed and disputed in #48.

whereas some implementers claim they cannot live with the sync approach.

We talked about this on the call, and my recollection was this argument reduced to convenience of implementation, i.e. not a need.

For example, resource allocation could be moved from CropTarget() to cropTo(), since the latter is async.

Exactly. [in Chrome: mint a dud CropTarget that always fails in cropTo instead]

yoavweiss commented 2 years ago

It says to use sync if the "API ... will not be blocked by ... inter-process communication." (emphasis on blocked)

You missed "locks" inside the ellipsis.

there's no reason for JS to block on it since it cannot fail

It can, though.

Needs for failure are discussed and disputed in #48.

"disputed" != "settled". There's still an ongoing discussion.

Exactly. [in Chrome: mint a dud CropTarget that always fails in cropTo instead]

That would create a significantly worse developer experience, as failures would be disconnected from the code/origin/party that causes them.

jan-ivar commented 2 years ago

Are there "locks" involved other than IPC? The FPWD says it cannot fail is what I meant. I agree about settling #48 first.

Who "caused" the (global?) resource error seems important only if they can do anything about it. We have no API for relinquishing CropTargets, only minting more, so any recovery would violate § 5.3. Don’t expose garbage collection.

yoavweiss commented 2 years ago

Are there "locks" involved other than IPC?

Locks were suggested as an alternative for renderer->browser IPCs, for off-main-thread renderer<=>renderer communication. I was suggesting that such a solution also requires an async API, to avoid the lock blocking the main thread.

jan-ivar commented 2 years ago

I was suggesting that such a solution also requires an async API, to avoid the lock blocking the main thread.

I don't think it does. Any API can trigger in parallel work. Returning a promise is only valuable if the caller needs the result (success being a result).

If we "avoid blocking" then it says to use sync (if the "API ... will not be blocked by ... a lock.")

aboba commented 2 years ago

If CropTarget() is vulnerable to resource exhaustion attacks, wouldn't that imply that CropTarget() could fail (e.g. due to lack of resources)?

Moving some or all resource allocation to cropTo() could result in the resource allocation arising later (in cropTo() instead of CropTarget()). So depending on where the resource allocation is done, couldn't either or both of CropTarget() and cropTo() fail, for similar reasons?

Recently I was asked to investigate a bug in a sync API that is supposed to always return immediately. In the most common use cases, it did return immediately (< 0.1 ms), no matter how much load was placed on it. However, in a particular (reproducible) set of circumstances, it blocked for 30-50 ms. The cause appears to be a mixture of resource (de)allocation and blocking IPC. It's probably a bug, not a feature, but sometimes sync APIs might not be able to return immediately for unanticipated reasons.

I've also seen sync APIs where the "in parallel work" turned out to be more complicated than expected. For example, in ORTC the RTCRtpSender.send() method was originally sync. However, we discovered that the parameters to the method required fairly extensive validation, which could result in issues being discovered after the method had returned, so we had to switch to async. Similarly with RTCRtpSender.getCapabilities(), we thought it would always return immediately because the result would be pre-computed, but that turned out not to be the case for codecs requiring hardware acceleration (e.g. discovering the hardware capabilities could take a while).

jan-ivar commented 2 years ago

If CropTarget() is vulnerable to resource exhaustion attacks, wouldn't that imply that CropTarget() could fail (e.g. due to lack of resources)?

Causality runs the other way: letting CropTargets fail allows for implementations vulnerable to exhaustion attacks. Not doing so doesn't.

A sensible implementation should be invulnerable to resource exhaustion attacks, by simply not tying resources to a token so easily created by anyone.

Moving some or all resource allocation to cropTo() could result in the resource allocation arising later (in cropTo() instead of CropTarget()). So depending on where the resource allocation is done, couldn't either or both of CropTarget() and cropTo() fail, for similar reasons?

What resource allocation is needed? A sensible cropTo implementation can use IPC to find the element it's supposed to crop to without consuming any finite resources. Also, cropTo is behind getDisplayMedia permission.

Chrome has implemented a neat but premature optimization, and refuses to implement the fallback needed to hide the resource exhaustion they've exposed themselves to.

I'm not convinced that creating a {} crop target will ever take appreciable time.

eladalon1983 commented 2 years ago

Chrome has implemented a neat but premature optimization, and refuses to implement the fallback needed to hide the resource exhaustion they've exposed themselves to.

The spec does not compel anyone to replicate our alleged neat mistakes.

youennf commented 2 years ago

The issue is not really about other UA implementations but about web developer impact. For instance, given that Chrome is globally capping the number of CropTargets due to resource exhaustion risks, web developers might want to create CropTargets cautiously, typically only once they know the web page is being captured. This goes against the idea of a web site proactively creating some CropTargets. This might also have an impact on future additions to this API.
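
For instance, such a cautious page might mint lazily, only after learning it is being captured (a sketch; the 'capture-started' signal is assumed to arrive from the capturing page over app-level messaging):

  // Capturee: create CropTargets only once capture is known to be underway.
  window.addEventListener('message', async ({ data, source, origin }) => {
    if (data?.type !== 'capture-started') return;
    const element = document.getElementById('main-content'); // illustrative
    const cropTarget = await CropTarget.fromElement(element);
    source.postMessage({ type: 'crop-target', cropTarget }, origin);
  });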

It would be good to understand whether Chrome has a plan to fix this global resource exhaustion issue. As I understand it, the main problem seems to be implementation cost.

I already said this, but working on https://github.com/w3c/mediacapture-region/issues/48 may allow us to make progress on this particular issue.

eladalon1983 commented 2 years ago

The issue is not really about other UA implementations but about web developer impact. For instance, given that Chrome is globally capping the number of CropTargets due to resource exhaustion risks, web developers might want to create CropTargets cautiously, typically only once they know the web page is being captured. This goes against the idea of a web site proactively creating some CropTargets. This might also have an impact on future additions to this API.

As previously explained, the global limit is an artifact of the current implementation, and can be changed in a number of ways. The minimal step forward is to make the limitation per-iframe rather than per-tab; that much is likely quite simple, and I doubt more would be needed.

It would be good to understand whether Chrome has a plan to fix this global resource exhaustion issue. As I understand it, the main problem seems to be implementation cost.

I'll be happy to share more about my plans for prioritizing work in Chrome if you would be interested in likewise sharing Apple's timelines for implementing this feature. Given the high engagement, may I take it that both Mozilla and Apple are prioritizing this work highly and intend to accomplish it in the near future?