w3c / mediacapture-handle

https://w3c.github.io/mediacapture-handle/
Other
14 stars 10 forks source link

[Identity] Don't reinvent postMessage #11

Open jan-ivar opened 2 years ago

jan-ivar commented 2 years ago

The way navigator.mediaDevices.setCaptureHandleConfig and track.oncapturehandlechange are defined, they create an unintentional broadcast messaging channel from capturee to all capturers (and all their track clones, wherever they may have been transferred). And due to https://github.com/WICG/capture-handle/issues/9 it works long after capture ends.

Capturee can send data like this:

async send(dataStr) {
  for (const handle of dataStr.split(/(.{1024})/).filter(x => x)) {
    navigator.mediaDevices.setCaptureHandleConfig({handle});
    await new Promise(r => setTimeout(r, 10));
  }
});

Capturers receive data like this:

track.oncapturehandlechange = ({handle}) => div.innerText += handle;

While the handle field is capped at 2k bytes, permittedOrigins is unbounded (with some munging).

This is awkward and there are ways to design APIs that do the same thing without such side effects, so we should do that. Otherwise someone somewhere will start relying on it, and then we have to support another alternative to postMessage forever.

Possible solution

I see no reason to allow a (top-level) document to call navigator.mediaDevices.setCaptureHandleConfig more than once with non-default values. Maybe throw instead?

alvestrand commented 2 years ago

Does this expose any security risk that PostMessage does not? If not, this is irrelevant.

Again, this does not seem to be an adoption blocker.

eladalon1983 commented 2 years ago
  1. +1 Harald's message on both accounts (comparison to postMessage and assertion that this is non-blocking).
  2. The track itself is a message channel through which 3840 x 2160 pixels can be sent 60 times per second.
  3. I do see a reason to allow multiple calls to setCaptureHandleConfig(). Namely, that a captured tab can switch roles without the top-level being navigated. For example, you might change to sharing a completely different slides deck, creating a new "session" from the application's POV, yet not navigate the top-level document.
jan-ivar commented 2 years ago

Does this expose any security risk that PostMessage does not?

I'm not claiming a security risk here. Merely unfortunate redundant design that undermines the desired usage pattern. This is a mechanism for bootstrapping a message channel, which seems undermined if it's easier to use it as a message itself.

I didn't find a general design principle about needlessly reinventing existing concepts, except for event handlers, but this seems like reasonable feedback to me. cc @annevk to see if there is a design principle that applies here.

The track itself is a message channel through which 3840 x 2160 pixels can be sent 60 times per second.

By this rationale, the Capture Handle API itself seems unnecessary. Again, I'm not making a security claim here.

jan-ivar commented 2 years ago

I would also like to understand what the use case is for calling this method more than once. If this serves no purpose then we shouldn't allow it. I did find https://w3ctag.github.io/design-principles/#simplicity relevant here.

eladalon1983 commented 2 years ago

I would also like to understand what the use case is for calling this method more than once.

Please read point number 3 in this message.

jan-ivar commented 2 years ago

switch roles without the top-level being navigated. For example, you might change to sharing a completely different slides deck

This seems out of scope, and not necessary once a relationship has been established.

annevk commented 2 years ago

I looked into this a bit:

  1. It seems this does not observe the scope of either BroadcastChannel or Window's postMessage(). That seems potentially problematic. It introduces a way for origins to reach each other that previously could not reach each other at all.
  2. Introducing a new way of doing messaging is not necessarily something we want. E.g., storage did that and I think everyone agrees that it would be better if we just had done BroadcastChannel from the start.

I'm not sure what the requirements and tradeoffs are here, but it does seem like some more investigation is warranted.

eladalon1983 commented 2 years ago

Before we proceed with the technical discussion of whether setCaptureHandleConfig() should be callable once or multiple times, let's recall that @jan-ivar has claimed that this issue is "adoption-blocking." This technical question can be resolved either way, and 99.9% of the document would remain unchanged. The W3C exists exactly to facilitate discussions of this type. I believe we should first acknowledge that this issue is not adoption-blocking. This would avoid the impression that a WebRTC Working Group chair might be using coercive measures to extract concessions during technical discussions.

Wdys, @jan-ivar?

eladalon1983 commented 2 years ago

@jan-ivar?

eladalon1983 commented 2 years ago

Resurrecting this issue now that W3C-adoption has been unblocked and completed.

It introduces a way for origins to reach each other that previously could not reach each other at all.

The capturer is already receiving every single pixel the capturee is drawing to the screen. These origins are reaching each other already.

Introducing a new way of doing messaging is not necessarily something we want.

I think that if we look closely, we'll see that a lot of APIs can be creatively used for transporting a message. Since in this case we're carrying a message from capturee to capturer, which is a direction in which communication is already possible, I don't see a problem.

alvestrand commented 2 years ago

With capturee -> capturer communication, we have this flood of data running already. If we're worried about opening new channels of communication, "actions" is worrisome because it goes in the other direction.

yoavweiss commented 2 years ago

A drive-by suggestion: Maybe rate-limiting the calls to setCaptureHandleConfig() can reduce the risk of it being used as a bit-by-bit communication channel, while at the same time, continue to enable the use cases of the captured application changing state and wanting to notify the capturer of that.

eladalon1983 commented 2 years ago

A drive-by suggestion: Maybe rate-limiting the calls to setCaptureHandleConfig() can reduce the risk of it being used as a bit-by-bit communication channel, while at the same time, continue to enable the use cases of the captured application changing state and wanting to notify the capturer of that.

If it helps things along (@jan-ivar?), I am OK with adding rate-limiting initially (as a raised exception), but I'd then like to continue the discussion for removing it. Rationale: Consider a legitimate application that calls setCaptureHandleConfig() very rarely. But not necessarily once. It nows needs to worry about those very rare occasions when it makes two calls in overly rapid succession. For example, if a presentation software calls setCaptureHandleConfig() whenever changing to another deck, then redirections could happen too rapidly. Such rare bugs are too likely to be missed by developers and end up as bugs.

alvestrand commented 2 years ago

If we were doing rate-limiting, I'd rather do it on the notification end - for instance, change the description of setCaptureHandleConfig's event firing to something like:

Queue a task to execute the following steps:

  1. At UA's discretion, insert a short delay to avoid firing too many events
  2. set CurrentCaptureHandle to the current value of [[internal slot on capturee indicating last setCaptureHandleConfig]]
  3. if it is different from [[internal slot on capturer indicating what was last fired]], fire event

(idea courtesy Jan-Ivar's discussion of reflecting muting on mediacapture-transform) But again, this is the direction in which we already have a high capacity path (the captured image), so not much of a worry.

eladalon1983 commented 2 years ago

If we were doing rate-limiting, I'd rather do it on the notification end

Agreed.

A potential counter-point would be that this still allows a capturee to bombard the memory of the capturing application, but:

  1. We could allow the UA to put a hard-cap for those truly rare occasions.
  2. This is a very unlikely attack, as the capturee would be expending a lot of its own CPU resources in order to attack an would-be capturer - an entity the capturee does not know exists, let alone its identity.

So I propose:

  1. Adopt your mechanism of allowing the UA to insert a short delay.
  2. Allow UA to raise an exception if setCaptureHandleConfig() is called exceedingly often, according to the UA's own definition. (But ideally something well over twice-per-second. I'm thinking 5 or 10 times per second would be what I'd want to implement in Chrome.)
eladalon1983 commented 2 years ago

It's common for the top-level to be reloaded when users log in/out, but is it necessary? If not, then a user logging in/out/in is one case where calling setCaptureHandleConfig() multiple times would be reasonable.

jan-ivar commented 2 years ago

If we're worried about opening new channels of communication, "actions" is worrisome because it goes in the other direction.

sendCaptureAction is rate-limited because it requires transient activation and consumes user activation.

A drive-by suggestion: Maybe rate-limiting the calls to setCaptureHandleConfig() can reduce the risk

@yoavweiss Thanks, I think that's a good idea worth considering as a minimum.

But I think the fact that (short of screen-scraping) no other cross-storage message channel exists in the platform today, merits concern. It's even superior to the messaging channel it's supposedly bootstrapping, which I guess would be:

  1. the capturer making RESTFUL calls to the capturee's server, and
  2. the capturee making RESTFUL calls to the capturer's server?

I don't see who'd bother setting up the 2nd. if they can send local cross-storage instructions munged into setCaptureHandleConfig().

I'd prefer single-use to start. We can always loosen it later, which is easier on web compat than tightening up mistakes.

It's common for the top-level to be reloaded when users log in/out, but is it necessary?

That's a good question. I've not been impressed by other use cases mentioned offline, which included getting info to the capturer in time for the proposed "conditional focus" API. That's a legitimate problem, but seems deserving of a proper solution, not a hack like this. Maybe a shared controller object like in https://github.com/w3c/mediacapture-handle/issues/12#issuecomment-1065594878 could bring these things together?

yoavweiss commented 2 years ago

Can you clarify what you mean by "cross-storage"? I'm not familiar with that term..

eladalon1983 commented 2 years ago

no other cross-storage message channel exists in the platform today

If I have missed the part of BroadcastChannel.postMessage's specification that forbids cross-tab communication, as of 2022-03-14, please help inform me.

I don't see who'd bother setting up the 2. if they can send local cross-storage instructions munged into setCaptureHandleConfig().

  1. This communication channel would be single-direction, which is why it would not be as useful, which is why applications would generally need something more. (It's also in the already available direction - that in which pixels flow.)
  2. BroadcastChannel and shared workers are still a thing.
  3. Natural users of REST-like approaches would be those who already have a shared login identity for the user and communicate other state, e.g. "user changed preferences" or "user logged out".

but seems deserving of a proper solution, not a hack like this.

I think this is a proper solution, and NOT a "hack".

I've not been impressed by other use cases mentioned offline, which included getting info to the capturer in time for the proposed "conditional focus" API.

I am sorry to hear that you were unimpressed. But I maintain that these are important use-cases, and I know Web-developers who would attest as much.

Maybe a shared controller object like in #12 (comment) could bring these things together?

I don't see how that would work. (Also, controller does not have support in the WG at the moment.)

I'd prefer single-use to start.

What information could theoretically change your mind?

eladalon1983 commented 2 years ago

Issue #35 discusses another use-case for calling setCaptureHandleConfig() multiple times - hinting about encoding. (Note that captured pages can switch between text and video on the fly; for example, if PowerPoint is used and then a fullscreen video is shown, then back to text, then rinse and repeat.)

@aboba, is my Microsoft interested in this use-case?

miketaylr commented 2 years ago
  • BroadcastChannel and shared workers are still a thing.

It's worth noting that these are moving towards communicating across StorageKeys, not origins. See https://github.com/wanderview/quota-storage-partitioning/blob/main/explainer.md#communication-apis, https://github.com/whatwg/html/issues/5803#issuecomment-1040699222, etc.

eladalon1983 commented 2 years ago
  • BroadcastChannel and shared workers are still a thing.

It's worth noting that these are moving towards communicating across StorageKeys, not origins. See https://github.com/wanderview/quota-storage-partitioning/blob/main/explainer.md#communication-apis, whatwg/html#5803 (comment), etc.

I have some concerns about that. I'll reach out internally to discuss.

jan-ivar commented 2 years ago

Can you clarify what you mean by "cross-storage"? I'm not familiar with that term..

@yoavweiss Sorry I meant cross-origin in the brave new world of storage partitioning (chrome, firefox, safari) which @miketaylr later linked to (thanks!) This will break the mailman iframe pattern.

If I have missed the part of BroadcastChannel.postMessage's specification that forbids cross-tab communication, as of 2022-03-14, please help inform me.

Not cross-tab, but cross-origin. The opening paragraph on BroadcastChannel: "Pages on a single origin opened by the same user in the same user agent but in different unrelated browsing contexts sometimes need to send notifications to each other"

eladalon1983 commented 2 years ago

Not cross-tab, but cross-origin. The opening paragraph on BroadcastChannel: "Pages on a single origin opened by the same user in the same user agent but in different unrelated browsing contexts sometimes need to send notifications to each other"

We have discussed before how embedding a cooperating cross-origin "mailman" iframe could bypass the cross-origin restriction. (And in this context Storage Partitioning was brought up.)

jan-ivar commented 2 years ago

... This will break the mailman iframe pattern.