w3c / mediacapture-region

This document introduces an API for cropping a video track derived from display-capture of the current tab.
http://w3c.github.io/mediacapture-region/

What makes CropTarget so special that it requires asynchronous creation? #17

Open youennf opened 2 years ago

youennf commented 2 years ago

We started discussing this topic (whether CropTarget creation should use promise or not) as part of https://github.com/w3c/mediacapture-region/issues/11 and it seems best to use a dedicated thread for this issue.

@eladalon1983 mentioned implementation issues that make using promises desirable, in particular security and potential race conditions if a CropTarget is created synchronously, since a CropTarget has to keep a link to the process it was created in. The following case was provided:

Existing objects like MessagePort, WHATWG Streams, RTCDataChannel or MediaStreamTrack can be created-then-postMessaged synchronously, and UAs are implementing this today, hopefully without the security/race-condition issues. AIUI, it seems consistent to use the same approach for CropTarget (synchronous creation), unless we find something specific to CropTarget that actually prevents this existing pattern.

eladalon1983 commented 2 years ago

I think it's easiest to answer with Chrome as a concrete example, to keep the discussion simple. This generalizes to other browsers.

Chrome has a central "browser process," and documents are hosted in "render processes." (For simplicity, let's pretend every document has a dedicated render process.) Let's examine multiple documents embedded in another document, all living together in the same tab.

Again, for simplicity, we'll call the document where the crop-target lives SLIDE, and the document which holds the track VC. I find this easier than talking about D1, D2, etc., as we can have a practical example in our mind's eye. If necessary, map (SLIDE, VC) to (D1, D2).

CropTarget is essentially a token. That token is produced in SLIDE and passed elsewhere. It may be passed to VC directly or indirectly. A design that allows it to be safely passed through other documents is preferable, as it requires less care from developers. To be safely passed through other documents (and therefore processes), it should encode the minimum amount of information. This is mostly true for JS-exposed information, but non-JS-exposed information that lives in the render process holding the token is also theoretically accessible to malicious documents under certain conditions.

So, to keep information to a minimum, the token should not actually encode the fact that it originates in SLIDE. Instead, this knowledge is recorded in the trusted browser process as a mapping of T<->SLIDE.

When the token is minted, this mapping has to be recorded in the browser process, which requires IPC, which means that minting the token should be asynchronous. (Minting can fail if the browser process refuses to record more mappings.)

To generalize away from Chrome, other UA-implementers will either run into similar implementation constraints, or else they can just return a pre-resolved Promise and not worry about it.
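
To make this concrete, here is a minimal sketch of the flow under discussion, from the web developer's point of view. (The element ID is illustrative, and the `as any` casts are only there because these interfaces are not yet in TypeScript's DOM typings.)

```ts
// In SLIDE: mint a token for the element that may later be cropped to.
// Minting is async because, in this design, the browser process must
// first record the token<->SLIDE mapping.
const slideElement = document.getElementById('slide')!;
const cropTarget = await (navigator.mediaDevices as any)
    .produceCropTarget(slideElement);

// The token is opaque and serializable; pass it (directly, or through
// intermediate documents) to VC.
window.parent.postMessage({ cropTarget }, '*');

// In VC: apply the token to a track captured from the current tab.
const stream = await navigator.mediaDevices.getDisplayMedia(
    { preferCurrentTab: true } as any);
const [track] = stream.getVideoTracks();
await (track as any).cropTo(cropTarget);
```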

youennf commented 2 years ago

When transferring a MessagePort or a RTCDataChannel from SLIDE to VC, there is a need to identify that the transferred object is originating from SLIDE, just like CropTarget. How is it possible to be synchronous for those objects but not for CropTarget?

eladalon1983 commented 2 years ago

There could be multiple documents in between SLIDE and VC. Each hop only exposes its own origin. So if the token is sent SLIDE->TOP_LEVEL->VC, then VC would not know the origin of SLIDE, only of TOP_LEVEL.

youennf commented 2 years ago

Again, the same could be said about transferring an RTCDataChannel from SLIDE->TOP_LEVEL->VC. Either this is a new security-hardening rule that we cannot enforce in old APIs but should in new ones, or there is something specific to CropTarget. It is unclear to me which one it is from your explanations.

eladalon1983 commented 2 years ago

I don't know how common it is to transfer RTCDataChannel objects. (I suspect somewhat uncommon...?) CropTarget is designed for transfer, and would have been wholly unnecessary otherwise. Given the cheap cost of ruggedizing CropTarget against being utilized by an attacker, I see it as desirable. I don't know enough about RTCDataChannel to say whether the same hardening was necessary there too but prohibitively complicated, or whether there was some other reason.

youennf commented 2 years ago

What about MessagePort then? MessagePorts are very common and designed specifically for being transferred. From what I can see so far, the problems you are describing have been solved for those objects in a secure and efficient way without using promises.

eladalon1983 commented 2 years ago

Does MessagePort remain bound in any way, or encode to some degree, the document in which it was originally instantiated? If I create a MessagePort in D1, then transfer it to D2, then to D3, will D3 know that this MessagePort is originally from D1?

youennf commented 2 years ago

Does MessagePort remain bound in any way, or encode to some degree, the document in which it was originally instantiated?

It remains bound to the MessagePort it was created jointly with through MessageChannel.

If I create a MessagePort in D1, then transfer it to D2, then to D3, will D3 know that this MessagePort is originally from D1?

For RTCDataChannel, WHATWG Streams and MediaStreamTrack: yes, always; the 'source' remains in D1.

MessagePorts are a bit specific in that they are created as a pair through MessageChannel and can keep communicating with each other. MessageChannel.port1 can be synchronously transferred to another process, as can MessageChannel.port2. port1 and port2 remain bound together in any case.
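
A sketch of that synchronous create-then-transfer pattern (`frame` here is just whichever document we want to hand a port to):

```ts
const frame = document.querySelector('iframe')!;

// Both ports are minted synchronously, as a pair.
const { port1, port2 } = new MessageChannel();

// port2 can be transferred in the very same task...
frame.contentWindow!.postMessage({ port: port2 }, '*', [port2]);

// ...and the pair stays bound wherever port2 ends up.
port1.postMessage('hello');
```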

youennf commented 2 years ago

Another example that might be closer to CropTarget is OffscreenCanvas. An OffscreenCanvas can be created from a canvas element, in which case it remains tied to the canvas element like a CropTarget would be to its element. Such an OffscreenCanvas can be created-then-transferred synchronously.
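
A sketch of that pattern (`render-worker.js` is a hypothetical worker script):

```ts
const canvas = document.querySelector('canvas')!;

// Created synchronously, and remains tied to the canvas element...
const offscreen = canvas.transferControlToOffscreen();

// ...yet can be transferred in the same task.
const worker = new Worker('render-worker.js');
worker.postMessage({ offscreen }, [offscreen]);
```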

jan-ivar commented 2 years ago

@eladalon1983 said in https://github.com/w3c/mediacapture-region/issues/11#issuecomment-1023100726:

  • When D2 calls cropTo(X), the UA has to validate that X is a valid CropTarget.
  • It is undesirable to query all documents and check if any of them have produced X

cropTo already returns a promise, so querying "all" documents in the single viewport captured to identify the crop target seems reasonable to me.

Cost to implementers is low on the priority of constituencies when it comes to shaping APIs.

eladalon1983 commented 2 years ago

cropTo already returns a promise, so querying "all" documents in the single viewport captured to identify the crop target, seems reasonable to me.

I think you mean "browsing context" where you said "viewport". Do you have a reasonable limit on how many different documents could be embedded in that browsing context?

Cost to implementers is low on the priority of constituencies when it comes to shaping APIs.

Cost to implementers is low priority, but non-zero. It's only a problem if accommodating implementers comes at a non-trivial cost to a higher-priority constituency. Does it?

jan-ivar commented 2 years ago

I think you mean "browsing context" where you said "viewport".

No, each iframe has its own browsing context, which is nested (immediately, or multiple levels down) under the top-level browsing context.

I loosely mean all documents in this capture, which maybe translates to the top-level browsing context's document and all documents in its nested browsing contexts of iframes that intersect the viewport.

Do you have a reasonable limit on how many different documents could be embedded in that browsing context?

Looking up the CropTarget shouldn't be the bottleneck in extreme cases, so this should scale fine.

eladalon1983 commented 2 years ago

I think you mean "browsing context" where you said "viewport".

No, each iframe has its own browsing context, which is nested (immediately, or multiple levels down) under the top-level browsing context.

I loosely mean all documents in this capture, which maybe translates to the top-level browsing context's document and all documents in its nested browsing contexts of iframes that intersect the viewport.

I've been carrying that mistake around for a while. Thanks for enlightening me.

Do you have a reasonable limit on how many different documents could be embedded in that browsing context?

Looking up the CropTarget shouldn't be the bottleneck in extreme cases, so this should scale fine.

IPC with multiple processes is neither simple, nor performant, nor robust. The cost to implementers is greatly reduced when avoiding this. What's the downside to any other constituency?

youennf commented 2 years ago

IPC with multiple processes is neither simple, nor performant, nor robust. The cost to implementers is greatly reduced when avoiding this.

This is a known problem that is solved in modern browsers. A transferred WritableStream should not need multiple IPC round trips to locate the process of its sink when writing new values. As I said before, I'd like to understand why it would be more difficult with CropTarget than with all these other existing APIs.

What's the downside to any other constituency?

It is more costly to both web developers and web engines. It is not consistent with existing Web APIs AFAIK.

youennf commented 2 years ago

Given this is a solved problem for other APIs and given these solutions are applicable to CropTarget as well, can we converge on moving away from Promises?

eladalon1983 commented 2 years ago

IPC with multiple processes is neither simple, nor performant, nor robust. The cost to implementers is greatly reduced when avoiding this.

This is a known problem that is solved in modern browsers. A transferred WritableStream should not need multiple IPC round trips to locate the process of its sink when writing new values. As I said before, I'd like to understand why it would be more difficult with CropTarget than with all these other existing APIs.

I believe I have explained why we have implemented things this way in Chrome. This is a real issue.

What's the downside to any other constituency?

It is more costly to both web developers and web engines. It is not consistent with existing Web APIs AFAIK.

The cost to Web developers is negligible. Crop-target production is a rare occurrence; it does not matter to the Web developer whether it completes asynchronously. I can pull in Web developers currently using Region Capture (in origin trial) for major products with a high level of polish, and they could confirm as much. Would you find that convincing? (If not, please pull in a similarly qualified Web developer who could comment to the contrary.)

Given this is a solved problem for other APIs and given these solutions are applicable to CropTarget as well, can we converge on moving away from Promises?

Let's converge towards Promises, given that it's an important implementation issue for Chrome. (And I believe that when the time comes for Safari and Firefox to implement this, they'll find it equally problematic.)

youennf commented 2 years ago

I believe I have explained why we have implemented things this way in Chrome. This is a real issue.

You explained a real issue, which I would classify as an optimization problem (though at some point you alluded to security concerns as well). Is that correct?

The argument is that the same optimization issue exists for already deployed APIs, and was solved without making use of promises. If we do not want to follow this preexisting pattern, we need a clear justification.

To move forward, let's try some more narrowly focused questions:

As a recap, here are some APIs that I think face the same issue:

I believe that when the time comes for Safari and Firefox to implement this, they'll find it equally problematic.

I implemented in WebKit some of the APIs that I think face the issue you are describing. For that purpose, existing techniques were reused so that we do not introduce delays in creating the objects, delays in transferring the objects, delays in the object algorithms, or race conditions.

Answering the above questions might help figure out which problems I might be overlooking.

alvestrand commented 2 years ago

Note: When we started trying to transfer MediaStreamTracks in Chrome, the synchronous nature of the transfer gave us major problems in implementation. So the idea that synchronous = solved problem is not universally true.

youennf commented 2 years ago

It is great to hear Chrome's implementation of transferring tracks is making progress. It is also great to hear Chrome implemented the transfer of synchronously-created MediaStreamTracks in a secure and efficient manner. Can you validate whether the same approach can be used for CropTarget?

The idea that synchronous = solved problem is not universally true.

The point is more that create-then-transfer-synchronously is a solvable problem (do we all agree?); it has been solved multiple times already. And the web platform has never let this particular implementation difficulty drive the design of new APIs.

To break this existing pattern, compelling motivations seem necessary.

Another reason not to use promises: what happens if the element is moved to another document, which then gets destroyed (and the element gets reattached to yet another document), all of this during creation of the CropTarget? Should we reject the promise? A synchronous API is simpler for edge cases as well as for web developers.

eladalon1983 commented 2 years ago

what happens in case the element goes transferred to another document

I had the same concern. I lost that concern when I learned that Elements are not transferable. (But do correct me if I am wrong.)

which then gets destroyed

It is generally possible for a CropTarget to outlive its Element, and that's OK. The document discusses what happens then. The summary is:

youennf commented 2 years ago

I had the same concern. I lost that concern when I learned that Elements are not transferable.

The point I am making is about what happens during the creation of the CropTarget, i.e. while the promise is not yet settled.

eladalon1983 commented 2 years ago

I had the same concern. I lost that concern when I learned that Elements are not transferable.

The point I am making is about what happens during the creation of the CropTarget, i.e. while the promise is not yet settled.

The thing I am still not getting is - what's going to happen to the Element during that time? The worst that could happen is that it gets garbage collected. I don't think that's a problem. It doesn't seem to matter if the Element is GCed before/after its CropTarget is produced. (And getting GCed after CropTarget-production should be a normal occurrence.)

eladalon1983 commented 2 years ago

I want to settle this discussion about Promises. And I don't want to leave your message unanswered. Let's briefly examine the three other APIs you've brought up:

I hope we can proceed without trifurcating the discussion. I did not want to leave your points unanswered, but deep-diving into these three examples would be unwise. We have an independent engineering question here, and it can be resolved on its own merits. These precedents do not seem applicable, nor should we assume that mistakes and compromises were not made in the design of those other APIs. Let's discuss our own case on its own merits.

I believe I've made a compelling case for why produceCropTarget() should be asynchronous.

Let's go with an asynchronous produceCropTarget().

jan-ivar commented 2 years ago

I should have posted https://github.com/w3c/mediacapture-region/issues/11#issuecomment-1126528615 here. To summarize it: At least two highly skilled technical people were confused by the current API into thinking it does more than it does.

That's a cost to web developers that we should and do avoid on the regular, as @youennf shows.

jan-ivar commented 2 years ago

(Minting can fail if the browser process refuses to record more mappings.)

This is an incorrect implementation since produceCropTarget is infallible.

eladalon1983 commented 2 years ago

I should have posted #11 (comment) here. To summarize it: At least two highly skilled technical people were confused by the current API into thinking it does more than it does.

It's unclear to me who was confused and in what way, and how making the API synchronous would solve their confusion.

youennf commented 2 years ago

I would first like to get consensus on one API design point. Hopefully, we can all agree on something like:

  1. For any new API, we try to make it synchronous if we can.
  2. If the algorithm requires some hopping to other threads/processes/environments, we switch to async.
  3. If synchronous implementations are overly complex, we switch to async.

AIUI, Chrome's current implementation is asynchronous, and @eladalon1983 is stating that a synchronous implementation would be overly complex. @jan-ivar and @youennf think that such a synchronous implementation is not complex. Some implementation strategies and already-implemented APIs that deal with the same issue have been suggested. The rest of this message (sorry for its length) describes them.

  • MessageChannel:

    • If implementing with direct communication between the processes, the risk involved is a necessary evil. This cannot be said for CropTarget. Discoverability in either direction is not a requirement here, and confers little to no benefit.
    • If implementing with mediation via another process, the story gets more complicated. A valid implementation can hide that it is asynchronously minting identifiers behind the moment of posting the MessageChannel to the other document. (Some compromises are required, though.) But I don't want to discuss this because it would lose track of the topic - see below.

@eladalon1983, I think we do agree MessagePort can be created and transferred cross-process synchronously. Can you validate Chrome's implementation of MessagePort creation/transfer, in particular whether Chrome mints identifiers for MessagePorts asynchronously?

FWIW, given we now have such a MessagePort, we can reuse MessagePort to implement CropTarget. The algorithm below assumes that CropTarget is transferable and can be used only once (these limitations can easily be lifted by recreating MessageChannels as required):

  1. At CropTarget creation, create a MessageChannel, set port1 to a slot of the element, and port2 to a CropTarget slot.
  2. When transferring the CropTarget to another environment, transfer CropTarget.port2 and recreate a CropTarget from the transferred port2.
  3. When calling cropTo with the new CropTarget, transfer CropTarget.port2 to the process doing the capture.
  4. In the capture process, use port2 to communicate with the element (through port1) to gather the necessary states to start the actual cropping.

Steps 1 and 2 can be done synchronously. Steps 3 and 4 are asynchronous which is fine since they are run as part of cropTo which returns a promise.
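
A rough sketch of steps 1 and 2, with illustrative names only (this is not proposed API):

```ts
// Step 1: at creation, port1 goes into a slot of the element and
// port2 into a slot of the CropTarget.
const elementPort = new WeakMap<Element, MessagePort>();

class SketchCropTarget {
  constructor(public port2: MessagePort) {}
}

function produceCropTargetSync(element: Element): SketchCropTarget {
  const { port1, port2 } = new MessageChannel();
  elementPort.set(element, port1);
  return new SketchCropTarget(port2);
}

// Step 2: transfer port2 and recreate the CropTarget on the other side.
function sendCropTarget(target: SketchCropTarget, to: Window) {
  to.postMessage({ cropTargetPort: target.port2 }, '*', [target.port2]);
}
```

Both functions run synchronously; the asynchronous work only happens later, inside cropTo.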

  • MediaStreamTrack:

    • These are not currently transferable in Chrome - not synchronously, not at all. To claim it's possible, one needs to present an implementation as proof, not a specification. (Does it work in Safari or Firefox yet? This demo suggests it does not, as of 2022-03-31.)
    • My colleagues working on mst-transferability tell me that they are running into a lot of issues precisely because of the requirement that tracks be "synchronously transferable".

It would be nice to hear about the exact MST issues. I would bet this is due to MST's complex state handling. CropTarget has no changing state, which makes it a much simpler object to transfer. That said, video MSTs are probably already transferable synchronously in Chrome by doing the following:

  1. Create a video MediaStreamTrack from canvas
  2. Get the ReadableStream from MediaStreamTrack using MediaStreamTrackProcessor
  3. Transfer the ReadableStream
  4. Recreate the MediaStreamTrack from the transferred ReadableStream using VideoTrackGenerator

Again, steps 1, 2 and 3 are all synchronous, from creation of a MediaStreamTrack to transferring the MediaStreamTrack via ReadableStream.
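
Sketched below; MediaStreamTrackProcessor and VideoTrackGenerator are not yet in TypeScript's DOM typings (hence the declaration), and their availability and exposure vary by browser, so treat this as illustrative:

```ts
declare class MediaStreamTrackProcessor {
  constructor(init: { track: MediaStreamTrack });
  readable: ReadableStream;
}

const worker = new Worker('transfer-worker.js'); // hypothetical receiver

// Step 1: create a video MediaStreamTrack from a canvas.
const canvas = document.createElement('canvas');
const [track] = canvas.captureStream(30).getVideoTracks();

// Step 2: get its ReadableStream.
const { readable } = new MediaStreamTrackProcessor({ track });

// Step 3: transfer the ReadableStream -- still the same task.
worker.postMessage({ readable }, [readable]);

// Step 4 happens on the receiving side, e.g.:
//   const generator = new VideoTrackGenerator();
//   readable.pipeTo(generator.writable);
//   // generator.track is the recreated MediaStreamTrack
```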

  • RTCDataChannel:

It is specified in https://w3c.github.io/webrtc-extensions/#rtcdatachannel-extensions. It is implemented in Safari by reusing pre-existing mechanisms. One reason why it might not be as difficult to implement as MST is that we restrict when RTCDataChannel can be transferred. This simplifies the state handling a lot. CropTarget is much simpler to transfer than RTCDataChannel since it has no changing state.

The fourth implementation that was brought to the discussion is the following:

  1. CropTarget stores an identifier of the process where the element lives (or the document's environment ID, which is generated for each document beforehand and so can be retrieved synchronously).
  2. CropTarget stores a locally generated identifier of the element.
  3. The element is identified by the pair of these two identifiers.
  4. CropTarget is serialized by serializing these two IDs.
  5. When calling cropTo, the capture process identifies the element's process through the first ID and gathers element information through the second ID.

Steps 1 to 4 can be done synchronously.
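
A sketch of this scheme, with illustrative names only:

```ts
interface CropTargetIds {
  processId: string; // environment ID, pre-generated per document
  elementId: string; // locally generated, unique within the process
}

let nextElementId = 0;
const localElements = new Map<string, Element>(); // per-process registry

// Steps 1-3: synchronous; the token is just the pair of IDs.
function mintCropTargetSync(element: Element, processId: string): CropTargetIds {
  const elementId = String(nextElementId++);
  localElements.set(elementId, element);
  return { processId, elementId };
}

// Step 4: serialization is trivial.
const serializeCropTarget = (t: CropTargetIds) => JSON.stringify(t);

// Step 5 (at cropTo time): route to the process named by processId,
// then look the element up in that process's localElements map.
```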

alvestrand commented 2 years ago

I stopped agreeing at 1.

The tendency in WebRTC that I see is to make new APIs asynchronous, because we've had too many instances where we specified a synchronous interface and found that there were cases where they could not be done reliably in a synchronous manner (getCapabilities being the most recent).

I would only design a synchronous interface if it was obvious that any reasonable implementation would never have a need for a step involving interactions with entities outside the renderer process hosting the API.

alvestrand commented 2 years ago

I don't understand the claim that "transferring is synchronous". The "Message port post message steps" (https://html.spec.whatwg.org/multipage/web-messaging.html#message-port-post-message-steps) contains an "Add a task that runs the following steps to the port message queue of targetPort".

How can an algorithm that posts a task to a different task queue be considered synchronous?

youennf commented 2 years ago

I would only design a synchronous interface if it was obvious that any reasonable implementation would never have a need for a step involving interactions with entities outside the renderer process hosting the API.

I do not think creating a CropTarget requires any "interactions with entities outside the renderer process", depending of course on your definition of interactions - contrary to actually cropping a track, for instance. Hence my conclusion that creating a CropTarget does not need to be asynchronous, while cropTo should be asynchronous.

How can an algorithm that posts a task to a different task queue be considered synchronous?

Sorry, this was misleading. What I tried to say is that the call to postMessage happens synchronously with the creation of the object. The whole postMessage algorithm is of course asynchronous.

@eladalon1983 mentioned several times that it has to be asynchronous between the time the web app wants to create the object and the time the web app wants to transfer it. Performance and potential race conditions were mentioned.

The point I am making is that this issue has been solved multiple times, and for objects more complex than a CropTarget. This makes me think asynchronicity is not needed by reasonable UA implementations; they will just leverage existing UA patterns.

jan-ivar commented 2 years ago
  1. For any new API, we try to make it synchronous if we can.

I stopped agreeing at 1.

@alvestrand please read § 6.8. Use synchronous when appropriate.

eladalon1983 commented 2 years ago

I think there is a big difference between APIs that are designed for cross-render-process communication and ones that aren't. In this particular message I'd like us to think about performance - namely, how long after calling cropTo(target2) the cropping actually changes to target2.

youennf commented 2 years ago
  • That means that cropTo(token) would validate the token through an IPC to that other render process

When we discussed this, your objection was that it would be too costly to send IPC messages to each renderer process. I thought we agreed that a single IPC message was ok.

  • which means the IPC could be delayed by the other render process

This is a solved problem: IPC messages are usually received in a specific run loop, then processed either in a dedicated run loop or the main loop. If main loop blocking is considered a perf issue in this particular context, a dedicated run loop will be used to gather the necessary element information from the provided ID (using a process global map protected by a lock for instance) and send back the information to the capturing process. This can be further optimised in common cases.

eladalon1983 commented 2 years ago

When we discussed this, your objection was that it would be too costly to send IPC messages to each renderer process. I thought we agreed that a single IPC message was ok.

Could you please explain how a single IPC would work?

If main loop blocking is considered a perf issue in this particular context, a dedicated run loop will be used to gather the necessary element information from the provided ID (using a process global map protected by a lock for instance) and send back the information to the capturing process. This can be further optimised in common cases.

That is not a simple solution.

I think performance is yet another argument in favor of solutions that don't involve communication between render processes. Engineering deals with trade-offs. The small loss in elegance involved in making produceCropTarget() async does not seem to me to justify giving up (i) simplicity of implementation, (ii) strong security guarantees and (iii) performance.

youennf commented 2 years ago

Could you please explain how a single IPC would work?

I meant a single IPC exchange.

That is not a simple solution.

This is a widely used pattern. How is it not a simple solution? Also, please bear in mind that this is an optional optimization.

I think performance is yet another argument in favor of solutions that don't involve communication between render processes.

That is a UA implementation choice; it is orthogonal to going with sync or async. Lazy and eager initialisation both have tradeoffs and are both doable.

The small loss in elegance involved in making produceCropTarget() async does not seem to me to justify giving up (i) simplicity of implementation, (ii) strong security guarantees and (iii) performance.

I disagree with reducing the evaluation to a loss in elegance. About simplicity of implementation, this is of course debatable; I think we should assess it against existing UA patterns. About security, there are AFAIU no differences. About performance, this seems orthogonal. UAs will have to decide between a few additional milliseconds vs. the issues related to a central memory store, or use hybrid solutions to get the best of both worlds.

yoavweiss commented 2 years ago

Apologies for the delay in responding here.

To reduce confusion (mine or otherwise), let me try to summarize my current understanding.

Based on @eladalon1983’s previous comment here, minting can fail.

Such failures can happen whether minting is done in the browser process or in the renderer process itself, e.g. if the developer is minting too many such tokens and the implementation wants to impose limits on that. It seems reasonable that the API shape would allow for such failures to happen at minting time, and not just at crop time (where they’d be caught by a separate document, on a potentially separate origin).

Given the above, it seems to me that choosing a sync API shape for minting would significantly restrict implementations' flexibility and force them to implement renderer-to-renderer communication channels, which would require extra care for security purposes and may suffer from performance issues, as pointed out up-thread. Those potential performance issues would then cause user friction and frustration: we either introduce a delay when minting the tokens synchronously (as there’s a lock on the minting container, for off-thread IPC handling), or we introduce a delay when cropping (in case we need to establish communications with the other renderer’s busy main thread).

Reading through the use sync when appropriate Web Platform Design Principle, it calls out the use of locks and IPC as cases that require async designs. I think we all agree that a performant implementation of renderer-to-renderer communication would require a lock, at the very least.

Compounding the above with the TAG’s opinion that an async design here doesn’t significantly decrease developer ergonomics, and the fact that this issue hasn’t come up as a hurdle to adoption as part of the Origin Trial in Chrome, I’m unconvinced that moving the API to be sync is warranted, as far as the priority of constituencies is concerned.

Starting out with an async API also seems more conservative, and a decision we could revert at a later point, if we deem it necessary. Implementations that don’t believe the above issues warrant asynchronicity could always return an immediately resolved Promise.

youennf commented 2 years ago

Based on @eladalon1983’s previous comment here, minting can fail.

Thanks for bringing this up; I missed that bit of information. A few related thoughts:

  1. The produceCropTarget algorithm currently does not describe that it can fail. As written, it cannot fail. If we think this is important, we should change the spec to describe when it fails so that implementations can interoperate.
  2. It is not clear how this 100 restriction kicks in. If it is per renderer process, it can be done inside the renderer process. If it is global to all renderer processes, I agree that some async IPC is the usual way to deal with this.
  3. Can this 100 restriction be used as a cross-origin/cross-storage-partitioning messaging channel, say between two iframes living in the same process (out-of-process iframes are not a thing on all platforms), or between two iframes with the same origin but different top origins, or in case of a same-process-but-cross-origin navigation? If this threat is real, we should probably change the design/implementation of such APIs. Having the failure at cropTo time is of course safe, given the capturer has access to the capturee's content.
  4. Elad mentioned in the past that Chrome's current implementation does some surface-labelling processing at CropTarget creation time and that it might be better to move this to cropTo time. Would that alleviate the need for this restriction? Is Chrome's plan to keep that 100 restriction or to remove it at some point?
  5. In general, APIs that can fail like this in odd cases are not really appealing. I doubt web developers will actually think of handling this error case, and I am not sure what they could do to recover from it. Web applications will be better served by UA implementations that cannot fail. Web developers will have to deal with errors at cropTo time anyway, so it seems best to keep that the single error-handling point.
  6. Does Chrome have a similar restriction for MessagePort creation? Why is Chrome fine with synchronous MessagePort creation?

eladalon1983 commented 2 years ago
  1. The produceCropTarget algorithm currently does not describe that it can fail. As written, it cannot fail. If we think this is important, we should change the spec to describe when it fails so that implementations can interoperate.

Let's add it. We could specify that we allow the UA to fail for an implementation-specific reason. Or if we can't agree on that, we can document that some implementations do that.

  2. It is not clear how this 100 restriction kicks in. If it is per renderer process, it can be done inside the renderer process. If it is global to all renderer processes, I agree that some async IPC is the usual way to deal with this.

Presently, in Chrome's implementation, the limitation is global to all documents embedded within a given tab. Other browsers are free to implement with no limitation at all. Note that it's hard to come up with an app that would legitimately need more than 1-5 such tokens, so I don't think we'll have interop issues here if Chrome limits to 100 and Safari to 200. Should this assertion be proven false in the future, we will have the flexibility to fix those interop issues, either by changing the limit or by finding ways to remove it.

  3. Can this 100 restriction be used as a cross-origin/cross-storage-partitioning messaging channel, say between two iframes living in the same process (out-of-process iframes are not a thing on all platforms), or between two iframes with the same origin but different top origins, or in case of a same-process-but-cross-origin navigation? If this threat is real, we should probably change the design/implementation of such APIs. Having the failure at cropTo time is of course safe, given the capturer has access to the capturee's content.

That's an interesting consideration. I can think of some mitigations, but when the time comes, I might need to tweak the Chrome implementation to address this. Having an async design increases everyone's flexibility in handling newly surfaced concerns such as this.

  4. Elad mentioned in the past that Chrome's current implementation does some surface-labelling processing at CropTarget creation time and that it might be better to move this to cropTo time. Would that alleviate the need for this restriction? Is Chrome's plan to keep that 100 restriction or to remove it at some point?

Tagging at cropTo() time increases the time until the first frames are cropped and therefore reduces app performance - an important consideration for a highly ranked constituency. (IPC, start tagging in the render pipeline, etc.)

Tagging as soon as we call produceCropTarget() - our current implementation - means tagging elements incurs non-zero cost. It is good that we dissuade the application from making excessive calls without good reason.

  5. In general, APIs that can fail like this in odd cases are not really appealing. I doubt web developers will actually think of handling this error case, and I am not sure what they could do to recover from it.

They can modify their application to not tag more elements than strictly necessary.

Web developers will have to deal with errors at cropTo time anyway, so it seems best to keep that the single error-handling point.

That error is in a different document.

  6. Does Chrome have a similar restriction for MessagePort creation? Why is Chrome fine with synchronous MessagePort creation?

I was not around when MessagePort was designed and implemented. I cannot tell you whether compromises were made, or for what reason. I'd guess that MessagePort likely involved direct communication between render processes as a design choice with obvious benefits, and that this allowed an implementation without resources consumed by the browser process. But I am only guessing here; I'd have to dig through the code and discussions to know more. I don't think this would be productive, since we have good reasons to avoid this model for the present API.

--

Consider, btw, that we're likely going to want to expand the API in the future so that it works with cropping a track derived from capturing ANOTHER tab. We don't need to get consensus on that just yet, but it's good to keep doors open. So what will we want then? Would it not be nicer if technical limitations surfaced as errors in the tab that produced them? Would it not be better for produceCropTarget() to fail in the captured tab, rather than cropTo() in the capturing tab?

youennf commented 2 years ago

They can modify their application to not tag more elements than strictly necessary.

That is not true in general, given how the web is authored today. Web pages embed ads and do not control what those ads will do (they might well do fingerprinting, and so will generate CropTargets). Web pages run in processes that also host other, unrelated pages, especially on small devices.

That error is in a different document.

There is a communication channel between the two, otherwise there would be no CropTarget.

Having an async design increases everyone's flexibility in handling newly surfaced concerns such as this.

The requirement for an async mechanism is based on an implementation that is showing big limitations. It seems best to spend some time on improving the implementation as much as possible, then to look at whether async is actually required. I'll be happy to provide feedback to get great performances while keeping API sync. Asking for async just in case it might be useful in the future does not seem like a good way to proceed.

Tagging at cropTo() time increases the time until the first frames are cropped and therefore reduces app performance - an important consideration for a highly ranked constituency. (IPC, start tagging in the render pipeline, etc.)

Tagging as soon as we call produceCropTarget() - our current implementation - means tagging elements incurs non-zero cost. It is good that we dissuade the application from making excessive calls without good reason.

As said before, you can get the best of both worlds (no big repository/privacy issue, no latency in 99.99% cases) without too much issue. There is no need for an async API to achieve this.

Would it not be nicer if technical limitations surfaced as errors in the tab that produced them? Would it not be better for produceCropTarget() to fail in the captured tab, rather than cropTo() in the capturing tab?

I don't think so. First, this creates potential privacy issues that we should avoid, since produceCropTarget can be called without any permission. cropTo requires some permissions, so there is no big deal exposing errors at that point. We usually select designs that avoid these kinds of issues, hence the appeal of cropTo. Second, produceCropTarget will rarely fail. When it does fail, web applications will be broken. Typically a message will not be sent and the application will mysteriously hang. cropTo will sometimes fail, and web applications will probably have an error-recovery mechanism that covers both cases. If we want to allow the web application to understand the issue, all we might need is to reject with an error indicating that the CropTarget is invalid and a new one is needed.

6. Does Chrome have a similar restriction for MessagePort creation? Why is Chrome fine with synchronous MessagePort creation?

I was not around when MessagePort was designed and implemented.

It is always good to look at existing APIs and existing UA patterns. Deviating from these patterns is ok when we see clear shortcomings, but we first need to try applying the way existing APIs/patterns deal with those same issues.

jan-ivar commented 2 years ago

@yoavweiss I'm hearing vastly differing claims about what produceCropTarget needs to do that aren't supported by the spec: The spec says: "Calling produceCropTarget on an Element of a supported type associates that Element with a CropTarget. This CropTarget may be used as input to cropTo."

That's literally all it says: create an association between an interface that cannot be serialized with one that can.

Clicking on CropTarget confirms this: "CropTarget is an intentionally empty, opaque identifier that exposes nothing. Its sole purpose is to be handed to cropTo as input." — Nothing about cropping, preparing for cropping, IPC, render processes, or any failures. It's infallible and serializable, that's all.

There's no reason this needs to be asynchronous. We've had experienced Google folks suggest element.id be used instead — a proposal that fell for other reasons — but being synchronous never came up. I agree with @youennf we need to go back to the drawing board if Chrome has vastly different ideas for this than what they've proposed for standardization.

I can think of no reason why this needs to be async; I could literally polyfill CropTarget by minting a MessagePort, and it would work in 90% of cases (same doc, same origin, same site, which has similar performance characteristics AFAIK, just not true cross-doc).

Asynchronous APIs come at a cost to web developers, as they turn JavaScript into a pre-emptive language. I recommend Why coroutines won’t work on the web as required reading here. I can also talk at length about how async functions have not been great for WebRTC.

jan-ivar commented 2 years ago

I should add (or I'll be called out on it) that it goes on to say: "The user agent MUST resolve p only after it has finished all the necessary internal propagation of state associated with the new CropTarget, at which point the user agent MUST be ready to receive the new CropTarget as a valid parameter to cropTo." — but the sole requirement here seems to be that a newly created target MUST be accepted by cropTo, which is easily accomplished in a number of ways, given that any target must be postMessaged and cropTo is already async. The need to return a promise for this is what has no consensus and is being challenged.

Also, this second part is about how it does what it does, whereas my initial quote is more importantly about what it does, which doesn't match claims made here.

jan-ivar commented 2 years ago

Regarding performance claims, it would seem faster to not block postMessage, and let it happen in parallel with Chrome's implementation of produceCropTarget, and have cropTo deal with it not arriving in time with some backup strategy (try again a few times, or worst case do some IPC) before giving up entirely. This would seem the absolute fastest success path if it arrives in time, and worst case no slower than today where you're essentially serializing the two steps of generating the key and postMessaging it.
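
Roughly, and purely as a sketch of the idea (names and timings are illustrative, not proposed API):

```ts
// cropTo retries briefly if the token hasn't propagated yet, and only
// gives up entirely after a few attempts.
async function cropToWithRetry(track: any, target: unknown, attempts = 3) {
  for (let i = 0; i < attempts; i++) {
    try {
      return await track.cropTo(target);
    } catch (e) {
      if (i === attempts - 1) throw e; // give up entirely
      await new Promise(r => setTimeout(r, 10 * 2 ** i)); // brief backoff
    }
  }
}
```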

yoavweiss commented 2 years ago

@yoavweiss I'm hearing vastly differing claims about what produceCropTarget needs to do that aren't supported by the spec: The spec says: "Calling produceCropTarget on an Element of a supported type associates that Element with a CropTarget. This CropTarget may be used as input to cropTo."

That's literally all it says: create an association between an interface that cannot be serialized with one that can.

Clicking on CropTarget confirms this: "CropTarget is an intentionally empty, opaque identifier that exposes nothing. Its sole purpose is to be handed to cropTo as input." — Nothing about cropping, preparing for cropping, IPC, render processes, or any failures. It's infallible and serializable, that's all.

The current processing model seems indeed insufficient in describing what current implementations are doing (or planning to do). I sent https://github.com/w3c/mediacapture-region/pull/47 to clarify that part and make it (hopefully) more rigorous.

There's no reason this needs to be asynchronous. We've had experienced Google folks suggest element.id be used instead — a proposal that fell for other reasons — but being synchronous never came up.

I beg to differ on that last part.

I agree with @youennf we need to go back to the drawing board if Chrome has vastly different ideas for this than what they've proposed for standardization

I don't believe that's the case. I provided spec clarifications at https://github.com/w3c/mediacapture-region/pull/47 in the hope that they'd help bridge our common understanding.

I can think of no reason why this needs to be async

See my comment above.

Asynchronous APIs come at a cost to web developers, as they turn JavaScript into a pre-emptive language.

Understood. At the same time, there's general agreement that when locks or IPC calls are involved, async APIs are called for. I'd also like to stress (again) that the TAG did not find the burden significant in this particular case.

youennf commented 2 years ago

I sent #47 to clarify that part and make it (hopefully) more rigorous.

Before doing this, I think we should first decide whether failing generation of CropTargets is something we want to expose to the Web and how. I filed https://github.com/w3c/mediacapture-region/issues/48 to dig into that. Can we continue this particular discussion there?

eladalon1983 commented 2 years ago

That is not true in general, given how the web is authored today. Web pages embed ads and do not control what those ads will do (they might well do fingerprinting, and so will generate CropTargets).

They should not embed abusive iframes. IIRC, you have yourself, @youennf, brought up a few objects of which browsers can only instantiate a limited number.

There is a communication channel between the two, otherwise there would be no CropTarget.

no latency in 99.99% cases

I think the design in Chrome reduces latency 100% of the time. I think you claim "no significant latency in 99.99% cases." It's unclear to me if you've measured this.

Second, produceCropTarget will rarely fail. When it will fail, web applications will get broken.

  1. As I understand it, Safari and Firefox plan implementations that impose no limit on the number of tokens that can be minted. It is not mandated that produceCropTarget must fail; any failure would be specific to Chrome's implementation.
  2. I think the exact value of "rarely" is important here. If it's not too rare, it will be handled by applications. If it's exceedingly rare, like SHA-1 collisions, it's not a problem. Which specific value are you aiming at? Why are we worried?

I filed #48 to dig into that. Can we continue this particular discussion there?

Gladly.

yoavweiss commented 2 years ago

Regarding performance claims, it would seem faster to not block postMessage, and let it happen in parallel with Chrome's implementation of produceCropTarget, and have cropTo deal with it not arriving in time with some backup strategy (try again a few times, or worst case do some IPC) before giving up entirely. This would seem the absolute fastest success path if it arrives in time, and worst case no slower than today where you're essentially serializing the two steps of generating the key and postMessaging it.

We have four options: browser-side sync, browser-side async, renderer-side sync and renderer-side async. Let's examine the performance characteristics of each, shall we?

Renderer-side token production

Renderer-side token production would require establishing renderer-to-renderer communication channels. If these channels are established on the main thread, synchronous failure reporting is possible, but the downside is that a busy main thread in SLIDE would mean that cropTo takes a long time to return.

If the communication channels are established on an auxiliary thread, that requires a lock when minting, which means we should go with an async interface to avoid blocking the main thread on that lock.

Browser-side sync token production

A sync API and browser-side minting would not enable developers of SLIDE to know that the minting has failed (e.g. due to excessive minting of tokens on their behalf). The reason for that is that the communication to the browser process is async by nature, so the immediately returned token may or may not be a valid one.

In this case, we’d be relying on the VC side to notice failures when using cropTo and notify SLIDE’s developers in case it’s their fault.

Another issue is that with a sync API, we can have a race condition where cropTo is called by VC before the IPC from SLIDE has successfully completed.

Yet another case is VC calling cropTo with an expired token (e.g. the SLIDE document was detached).

How would an async cropTo with sync token minting respond to each one of those failure cases?

The async cropTo would need in this case to send an IPC to the browser process. If the token is in the token container, then all is well and cropTo is resolved. If it’s not, it could be any one of the above cases:

In order to properly handle the first case, the browser process would have to e.g. add an observer to token minting that would resolve the cropTo promise once the token has arrived. But it would also need to have some sort of timeout that rejects cropTo after X seconds when the token doesn’t arrive.

That means that errors of using a removed or invalid token will take a long time to resolve. While one can argue that we could keep past tokens in memory, it’s unclear for how long that would be required. This would also have the undesired side-effect of increasing the long-lived browser process’ memory footprint (hurting users).

Browser-side async token production

Finally, an async produceCropTarget would tell SLIDE when minting failed, and enable shorter debug cycles.

It would also make sure that SLIDE doesn’t send the token to VC before the browser side is ready for it. That eliminates the “token hasn’t yet arrived” case, eliminating the need for timeouts and lengthy failures in case an invalid token is used by VC.

youennf commented 2 years ago

@yoavweiss, I think there are other approaches that sound better, for instance:

This approach removes the capturer-process memory-capping requirements and privacy issues, removes the need for web pages to deal with CropTarget failures, leads to maximum performance in almost all reasonable cases, and is reasonably simple.

eladalon1983 commented 2 years ago

Send an IPC to the capturer process doing the capture.

You can mint CropTargets before there is a capturer process. By design. And there might be multiple capturers. Either through multiple calls to getDisplayMedia() or through cloning. And MediaStreamTracks are transferable.

youennf commented 2 years ago

I should not have used "capturer process"; it is an ambiguous term. By capturer process, I meant the process that does the actual generation of the track source's video frames, not the renderer process calling getDisplayMedia.

eladalon1983 commented 2 years ago

By capturer process, I meant the process that does the actual generation of the track source's video frames

I don't understand what this means. Each video frame is composed of pixels contributed by multiple processes rendering different iframes.

handled using round robin say

I am firmly opposed to this too. This has been discussed multiple times. Using this scheme means that some tokens are zombie-tokens, meaningless but undetectable except via cropTo().

I think we're rehashing the same discussions multiple times.