Open jan-ivar opened 2 years ago
Does this expose any security risk that PostMessage does not? If not, this is irrelevant.
Again, this does not seem to be an adoption blocker.
Does this expose any security risk that PostMessage does not?
I'm not claiming a security risk here. Merely unfortunate redundant design that undermines the desired usage pattern. This is a mechanism for bootstrapping a message channel, which seems undermined if it's easier to use it as a message itself.
I didn't find a general design principle about needlessly reinventing existing concepts, except for event handlers, but this seems like reasonable feedback to me. cc @annevk to see if there is a design principle that applies here.
The track itself is a message channel through which 3840 x 2160 pixels can be sent 60 times per second.
By this rationale, the Capture Handle API itself seems unnecessary. Again, I'm not making a security claim here.
I would also like to understand what the use case is for calling this method more than once. If this serves no purpose then we shouldn't allow it. I did find https://w3ctag.github.io/design-principles/#simplicity relevant here.
I would also like to understand what the use case is for calling this method more than once.
Please read point number 3 in this message.
switch roles without the top-level being navigated. For example, you might change to sharing a completely different slides deck
This seems out of scope, and not necessary once a relationship has been established.
I looked into this a bit:
BroadcastChannel or Window's postMessage(). That seems potentially problematic. It introduces a way for origins to reach each other that previously could not reach each other at all.
BroadcastChannel from the start.
I'm not sure what the requirements and tradeoffs are here, but it does seem like some more investigation is warranted.
Before we proceed with the technical discussion of whether setCaptureHandleConfig() should be callable once or multiple times, let's recall that @jan-ivar has claimed that this issue is "adoption-blocking." This technical question can be resolved either way, and 99.9% of the document would remain unchanged. The W3C exists exactly to facilitate discussions of this type. I believe we should first acknowledge that this issue is not adoption-blocking. This would avoid the impression that a WebRTC Working Group chair might be using coercive measures to extract concessions during technical discussions.
Wdys, @jan-ivar?
@jan-ivar?
Resurrecting this issue now that W3C-adoption has been unblocked and completed.
It introduces a way for origins to reach each other that previously could not reach each other at all.
The capturer is already receiving every single pixel the capturee is drawing to the screen. These origins are reaching each other already.
Introducing a new way of doing messaging is not necessarily something we want.
I think that if we look closely, we'll see that a lot of APIs can be creatively used for transporting a message. Since in this case we're carrying a message from capturee to capturer, which is a direction in which communication is already possible, I don't see a problem.
With capturee -> capturer communication, we have this flood of data running already. If we're worried about opening new channels of communication, "actions" is worrisome because it goes in the other direction.
A drive-by suggestion: Maybe rate-limiting the calls to setCaptureHandleConfig() can reduce the risk of it being used as a bit-by-bit communication channel, while at the same time, continue to enable the use cases of the captured application changing state and wanting to notify the capturer of that.
A drive-by suggestion: Maybe rate-limiting the calls to setCaptureHandleConfig() can reduce the risk of it being used as a bit-by-bit communication channel, while at the same time, continue to enable the use cases of the captured application changing state and wanting to notify the capturer of that.
If it helps things along (@jan-ivar?), I am OK with adding rate-limiting initially (as a raised exception), but I'd then like to continue the discussion for removing it. Rationale:
Consider a legitimate application that calls setCaptureHandleConfig() very rarely, but not necessarily once. It now needs to worry about those very rare occasions when it makes two calls in overly rapid succession. For example, if a presentation software calls setCaptureHandleConfig() whenever changing to another deck, then redirections could happen too rapidly. Such rare cases are too likely to be missed by developers and end up as bugs.
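To make the concern concrete, here is a minimal sketch of rate-limiting on the calling end, where an overly rapid call raises an exception. `ThrowingRateLimiter`, `callSucceeds`, and the 200 ms interval are hypothetical names and values chosen for illustration; nothing here is taken from the spec or from any browser implementation.

```typescript
// Hypothetical model of call-side rate limiting: a call that arrives too soon
// after the last accepted call throws instead of being applied.
class ThrowingRateLimiter {
  private lastAcceptedMs: number | null = null;
  private readonly minIntervalMs: number;

  constructor(minIntervalMs: number) {
    this.minIntervalMs = minIntervalMs;
  }

  // Throws if less than minIntervalMs has passed since the last accepted call.
  check(nowMs: number): void {
    if (this.lastAcceptedMs !== null && nowMs - this.lastAcceptedMs < this.minIntervalMs) {
      throw new Error("rate limit exceeded");
    }
    this.lastAcceptedMs = nowMs;
  }
}

// Helper for the example: reports whether a call at nowMs would be accepted.
function callSucceeds(limiter: ThrowingRateLimiter, nowMs: number): boolean {
  try {
    limiter.check(nowMs);
    return true;
  } catch {
    return false;
  }
}

// A presentation app that switches decks at 0 ms, 50 ms (a quick redirect) and 5000 ms.
const deckSwitches = new ThrowingRateLimiter(200);
const results = [0, 50, 5000].map((t) => callSucceeds(deckSwitches, t));
// results is [true, false, true]: the rare rapid call fails and its state is lost.
```

In this model the application has to notice and retry the rejected call itself, which is the kind of rarely exercised path described above.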
If we were doing rate-limiting, I'd rather do it on the notification end - for instance, change the description of setCaptureHandleConfig's event firing to something like:
Queue a task to execute the following steps:
(idea courtesy Jan-Ivar's discussion of reflecting muting on mediacapture-transform) But again, this is the direction in which we already have a high capacity path (the captured image), so not much of a worry.
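One way to picture rate-limiting on the notification end is a coalescing notifier: the capturee may update as often as it likes, and the capturer is told about the latest state at most once per interval. The sketch below is only an illustration under that assumption. `CoalescingNotifier`, `update`, `flush`, and the 100 ms interval are hypothetical, and this is not the list of steps proposed in the comment above.

```typescript
// Hypothetical model of notification-side rate limiting: updates are never
// rejected, but intermediate values may be skipped in favour of the latest one.
class CoalescingNotifier<T> {
  private pending: T | undefined = undefined;
  private hasPending = false;
  private lastFiredMs = -Infinity;
  private readonly minIntervalMs: number;
  private readonly fire: (value: T) => void;

  constructor(minIntervalMs: number, fire: (value: T) => void) {
    this.minIntervalMs = minIntervalMs;
    this.fire = fire;
  }

  // Called whenever the capturee's state changes; never throws.
  update(value: T, nowMs: number): void {
    this.pending = value;
    this.hasPending = true;
    this.flush(nowMs);
  }

  // Called from a timer; delivers the latest pending value once the interval has elapsed.
  flush(nowMs: number): void {
    if (!this.hasPending || nowMs - this.lastFiredMs < this.minIntervalMs) return;
    const value = this.pending as T;
    this.pending = undefined;
    this.hasPending = false;
    this.lastFiredMs = nowMs;
    this.fire(value);
  }
}

const seen: string[] = [];
const notifier = new CoalescingNotifier<string>(100, (handle) => seen.push(handle));
notifier.update("deck-1", 0);  // delivered immediately
notifier.update("deck-2", 10); // held back
notifier.update("deck-3", 20); // replaces deck-2
notifier.flush(100);           // delivers deck-3
// seen is ["deck-1", "deck-3"]
```

With this shape the capturee never has to handle an exception, and the capturer still ends up with the most recent state.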
If we were doing rate-limiting, I'd rather do it on the notification end
Agreed.
A potential counter-point would be that this still allows a capturee to bombard the memory of the capturing application, but:
So I propose:
setCaptureHandleConfig() is called exceedingly often, according to the UA's own definition. (But ideally something well over twice-per-second. I'm thinking 5 or 10 times per second would be what I'd want to implement in Chrome.)
It's common for the top-level to be reloaded when users log in/out, but is it necessary? If not, then a user logging in/out/in is one case where calling setCaptureHandleConfig() multiple times would be reasonable.
If we're worried about opening new channels of communication, "actions" is worrisome because it goes in the other direction.
sendCaptureAction is rate-limited because it requires transient activation and consumes user activation.
A drive-by suggestion: Maybe rate-limiting the calls to setCaptureHandleConfig() can reduce the risk
@yoavweiss Thanks, I think that's a good idea worth considering as a minimum.
But I think the fact that (short of screen-scraping) no other cross-storage message channel exists in the platform today, merits concern. It's even superior to the messaging channel it's supposedly bootstrapping, which I guess would be:
I don't see who'd bother setting up the 2nd if they can send local cross-storage instructions munged into setCaptureHandleConfig().
I'd prefer single-use to start. We can always loosen it later, which is easier on web compat than tightening up mistakes.
It's common for the top-level to be reloaded when users log in/out, but is it necessary?
That's a good question. I've not been impressed by other use cases mentioned offline, which included getting info to the capturer in time for the proposed "conditional focus" API. That's a legitimate problem, but seems deserving of a proper solution, not a hack like this. Maybe a shared controller object like in https://github.com/w3c/mediacapture-handle/issues/12#issuecomment-1065594878 could bring these things together?
Can you clarify what you mean by "cross-storage"? I'm not familiar with that term.
no other cross-storage message channel exists in the platform today
If I have missed the part of BroadcastChannel.postMessage's specification that forbids cross-tab communication, as of 2022-03-14, please help inform me.
I don't see who'd bother setting up the 2. if they can send local cross-storage instructions munged into setCaptureHandleConfig().
but seems deserving of a proper solution, not a hack like this.
I think this is a proper solution, and NOT a "hack".
I've not been impressed by other use cases mentioned offline, which included getting info to the capturer in time for the proposed "conditional focus" API.
I am sorry to hear that you were unimpressed. But I maintain that these are important use-cases, and I know Web-developers who would attest as much.
Maybe a shared controller object like in #12 (comment) could bring these things together?
I don't see how that would work. (Also, controller does not have support in the WG at the moment.)
I'd prefer single-use to start.
What information could theoretically change your mind?
Issue #35 discusses another use-case for calling setCaptureHandleConfig() multiple times - hinting about encoding. (Note that captured pages can switch between text and video on the fly; for example, if PowerPoint is used and then a fullscreen video is shown, then back to text, then rinse and repeat.)
@aboba, is Microsoft interested in this use-case?
- BroadcastChannel and shared workers are still a thing.
It's worth noting that these are moving towards communicating across StorageKeys, not origins. See https://github.com/wanderview/quota-storage-partitioning/blob/main/explainer.md#communication-apis, https://github.com/whatwg/html/issues/5803#issuecomment-1040699222, etc.
- BroadcastChannel and shared workers are still a thing.
It's worth noting that these are moving towards communicating across StorageKeys, not origins. See https://github.com/wanderview/quota-storage-partitioning/blob/main/explainer.md#communication-apis, whatwg/html#5803 (comment), etc.
I have some concerns about that. I'll reach out internally to discuss.
Can you clarify what you mean by "cross-storage"? I'm not familiar with that term.
@yoavweiss Sorry, I meant cross-origin in the brave new world of storage partitioning (Chrome, Firefox, Safari), which @miketaylr later linked to (thanks!). This will break the mailman iframe pattern.
If I have missed the part of BroadcastChannel.postMessage's specification that forbids cross-tab communication, as of 2022-03-14, please help inform me.
Not cross-tab, but cross-origin. The opening paragraph on BroadcastChannel: "Pages on a single origin opened by the same user in the same user agent but in different unrelated browsing contexts sometimes need to send notifications to each other"
Not cross-tab, but cross-origin. The opening paragraph on BroadcastChannel: "Pages on a single origin opened by the same user in the same user agent but in different unrelated browsing contexts sometimes need to send notifications to each other"
We have discussed before how embedding a cooperating cross-origin "mailman" iframe could bypass the cross-origin restriction. (And in this context Storage Partitioning was brought up.)
... This will break the mailman iframe pattern.
The way navigator.mediaDevices.setCaptureHandleConfig and track.oncapturehandlechange are defined, they create an unintentional broadcast messaging channel from capturee to all capturers (and all their track clones, wherever they may have been transferred). And due to https://github.com/WICG/capture-handle/issues/9 it works long after capture ends.
Capturee can send data like this:
Capturers receive data like this:
While the handle field is capped at 2k bytes, permittedOrigins is unbounded (with some munging).
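As a rough sketch of the pattern described above (not the original snippets), the functions below show how a message could be split across successive configs and put back together from the handles a capturer observes. Everything here is an assumption made for illustration: the `HandleConfig` shape is modelled loosely on the config discussed in this thread, the 1024-character limit stands in for the handle cap, and `toHandleConfigs`, `HandleReassembler`, and the "index/total:" framing are hypothetical. In a browser, the capturee would pass each config to navigator.mediaDevices.setCaptureHandleConfig and the capturer would read the handle on each capturehandlechange event; the sketch models only the encoding so it can run anywhere.

```typescript
// Assumed shape of the config dictionary (illustrative only).
interface HandleConfig {
  exposeOrigin: boolean;
  handle: string;
  permittedOrigins: string[];
}

const MAX_HANDLE_LENGTH = 1024; // assumed cap on the handle, in UTF-16 code units
const HEADER_ROOM = 16;         // space reserved for the "index/total:" prefix

// Capturee side: split a message into configs whose handles each carry one chunk.
function toHandleConfigs(message: string, maxLen: number = MAX_HANDLE_LENGTH): HandleConfig[] {
  const size = maxLen - HEADER_ROOM;
  if (size <= 0) throw new Error("maxLen too small");
  const total = Math.max(1, Math.ceil(message.length / size));
  const configs: HandleConfig[] = [];
  for (let i = 0; i < total; i++) {
    const chunk = message.slice(i * size, (i + 1) * size);
    configs.push({ exposeOrigin: false, handle: `${i}/${total}:${chunk}`, permittedOrigins: ["*"] });
  }
  return configs;
}

// Capturer side: feed in each observed handle; returns the message once all chunks arrived.
class HandleReassembler {
  private parts = new Map<number, string>();

  accept(handle: string): string | null {
    const match = /^(\d+)\/(\d+):/.exec(handle);
    if (match === null) return null; // not one of our framed handles
    const index = Number(match[1]);
    const total = Number(match[2]);
    this.parts.set(index, handle.slice(match[0].length));
    if (this.parts.size < total) return null;
    let message = "";
    for (let i = 0; i < total; i++) message += this.parts.get(i) ?? "";
    this.parts.clear();
    return message;
  }
}

const receiver = new HandleReassembler();
let received: string | null = null;
for (const config of toHandleConfigs("switch to deck 42", 20)) {
  received = receiver.accept(config.handle) ?? received;
}
// received is "switch to deck 42"
```

The sketch uses only the handle field; it does not attempt the permittedOrigins munging mentioned above.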
This is awkward and there are ways to design APIs that do the same thing without such side effects, so we should do that. Otherwise someone somewhere will start relying on it, and then we have to support another alternative to postMessage forever.
Possible solution
I see no reason to allow a (top-level) document to call navigator.mediaDevices.setCaptureHandleConfig more than once with non-default values. Maybe throw instead?
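A minimal sketch of what "throw instead" could look like, written as a standalone model rather than as browser code. `CaptureConfig`, `isDefaultConfig`, and `SingleUseConfig` are hypothetical names, and the defaults used here (exposeOrigin false, empty handle, empty permittedOrigins) are an assumption. The sketch also makes one choice the proposal leaves open: resetting to defaults is always allowed, but it does not re-arm the single non-default call.

```typescript
// Assumed shape of the config dictionary; all members optional (illustrative only).
interface CaptureConfig {
  exposeOrigin?: boolean;
  handle?: string;
  permittedOrigins?: string[];
}

// True if the config carries nothing beyond the assumed default values.
function isDefaultConfig(config: CaptureConfig): boolean {
  return (
    !(config.exposeOrigin ?? false) &&
    (config.handle ?? "") === "" &&
    (config.permittedOrigins ?? []).length === 0
  );
}

// Hypothetical per-document state that accepts at most one non-default config.
class SingleUseConfig {
  private used = false;
  private current: CaptureConfig = {};

  set(config: CaptureConfig): void {
    if (!isDefaultConfig(config)) {
      if (this.used) {
        throw new Error("InvalidStateError: capture handle config was already set");
      }
      this.used = true;
    }
    this.current = config;
  }

  get(): CaptureConfig {
    return this.current;
  }
}

const documentConfig = new SingleUseConfig();
documentConfig.set({ handle: "deck-1", permittedOrigins: ["*"] }); // accepted
documentConfig.set({});                                            // reset to defaults: accepted
// A further documentConfig.set({ handle: "deck-2" }) would throw.
```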