w3c / mediacapture-main

Media Capture and Streams specification (aka getUserMedia)
https://w3c.github.io/mediacapture-main/

risk model of stored permissions and constraint opportunities #991

Open rockinghelvetica opened 7 months ago

rockinghelvetica commented 7 months ago

The Note on Privacy and Security Considerations describes an onus on "developers of sites" that is muddied by the origin-level nature of stored permissions.

Developers of sites with stored permissions should be careful that these permissions not be abused.

Where the risk is that a webcam or microphone is accessed by an unexpected third party, care for this risk is diffused to millions of third parties on many popular web apps.

What I'm seeing in practice

There are many millions of developers working on Replit.com, Huggingface.co, and web apps like Figma (see the Vimeo plugin running under www.figma.com, for example) where a permission for a camera or microphone applies across the entire surface of user-generated content. Figma has a fairly responsible permissions workflow, but even their marketplace security schema doesn't account for where camera & microphone access might be inherited from prior authorization.

In my personal example, I am using Replit.com to develop a mediapipe app, but I am concerned that stored permissions extend to any other repl on the domain. To validate that fear, I navigated to an unfamiliar Huggingface Space and saw my camera turn on, because I had used the camera a month ago when playing with a Gradio demo in an unrelated Space.

How could these permissions be more security and privacy minded?

With the caveat that I’m just a web civilian offering comment & I’m relying on a ballpark sense of the principles being navigated by this group, these were my first thoughts on the problem:

This standard

The constraint mechanism feels like it might be extended to account for the relationship between origin and iframe, a path component, or some other policy/token specified by the origin.

Roughly: when an iframe calls navigator.mediaDevices.getUserMedia as with a Figma plugin or similar…

Questions raised:

Beyond draft changes, developer education about this risk can be improved and further promoted, based on the examples cited (Figma, Replit, et al).

Related standards

The Permissions Policy (https://www.w3.org/TR/permissions-policy-1/#policy-controlled-feature) draft feels relevant, but I need to get more familiar with it.

The ability for an app in an iframe to revoke its own privilege is a helpful mitigation, and it could be more clear on how this works. I would absolutely use it in my own code on Replit, and at the level of e.g. the Gradio framework it would greatly mitigate this risk for many web users. Relatedly: permissions with timed or session durations.

Browsers

Similar to nuances of third-party cookie management, there can be a role for UX like permission audits and better communication about who has access to the camera.

eladalon1983 commented 7 months ago

IIUC, this issue can be summarized as:

[Not a real quote. Paraphrasing.] When the user grants a permission, that permission is keyed on the top-level, and is shared by all embedded documents that the embedder allowlists. That permission may be delegated to a cross-origin iframe without the user's express permission, possibly even without the user's knowledge.

  1. Did I get that right?
  2. Is this special to mic/camera in any way that justifies special-casing by the Media Capture and Streams spec rather than general treatment by the Permissions Policy spec?

rockinghelvetica commented 6 months ago

You did well, but I would be more pointed:

When the user grants a permission, that permission is keyed on the top-level, and is shared by all embedded documents that the embedder allowlists. Where embedders support user-generated code and plugins, the user will not be protected from unexpected usage of the stored permission.

It's technically possible that a platform will read the note in the spec, understand the responsibility, and elect to develop granular media permissions for each embed, but I've yet to encounter a single example.

My instinct for special-casing is three-part, but subjective:

  1. The mismatch between user expectation ("this app") and implementation (top-level domain) was created by this specification. Not to imply that it was wrong, at all, but the side-effects of hoisting permissions up to the address bar might best be considered as one design.
  2. There are opportunities to improve the communication of the risks in the spec, and maybe provide example mitigations.
  3. There's a significant amount of special-case handling in browser UX for media capture features, which suggests that diffusing the work to Permissions Policy might translate to messy browser differences.

jan-ivar commented 5 months ago

Thanks for your questions. Let me start by answering them, to see if it un-muddies things.

Questions raised:

  • Does permission flow up to the origin (i.e. can Figma.com access my microphone with stored permission because I first granted access to a plugin)?

(I'm unfamiliar with figma parlance, but it sounds like by "plugin" you don't mean a web extension, but instead user-created JS code that figma hosts and runs in an iframe, possibly under a secondary domain, like e.g. https://jsfiddle.net does?)

It sounds like you're asking about permission delegation to iframes, which is mentioned in the permissions spec:

[image: excerpt from the permissions spec on permission delegation to iframes]

I don't know which browser you used, but I'm fairly certain the permission prompt asked you to grant permission to figma.com, not to a specific plugin/iframe. Therefore figma.com has permission, and delegates it as needed.

  • Is this granular permission the default (i.e. does Replit have to update their iframe code to adopt

See § 14. Permissions Policy Integration for how this spec integrates with Permissions Policy. This spec's default allowlist is "self", which limits camera and microphone permission to same-origin iframes by default.

I don't know Replit, but https://jsfiddle.net/jib1/r60bzmrs/ runs my JS in a different domain (likely for security reasons), which means it has to explicitly delegate permission using the allow attribute (abbreviated):

<iframe allow="microphone; camera;" src="//fiddle.jshell.net/jib1/r60bzmrs/show/?editor_console=false">

This delegates permission to JS code loaded from fiddle.jshell.net (only) inside that iframe.
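To make the default concrete, here is a rough sketch of the rule described above. It is a simplified model, not the real Permissions Policy algorithm: it ignores HTTP headers and nested frames, and it assumes the embedder itself already has access. The function name `frameMayRequest` is made up for illustration.

```javascript
// Simplified model of the default "self" allowlist plus the iframe `allow` attribute.
// NOT the real Permissions Policy algorithm: no HTTP headers, no nested frames,
// and it assumes the embedding document itself is allowed to use the feature.
function frameMayRequest(feature, embedderUrl, frameSrc, allowAttr = "") {
  const embedder = new URL(embedderUrl);
  // Resolve protocol-relative sources such as "//fiddle.jshell.net/..." against the embedder.
  const frame = new URL(frameSrc, embedder);

  // Default allowlist "self": same-origin frames can use the feature without delegation.
  if (frame.origin === embedder.origin) return true;

  // Cross-origin frames need the feature named in allow="...".
  return allowAttr.split(";").some((entry) => {
    const [name, ...allowlist] = entry.trim().split(/\s+/);
    if (name !== feature) return false;
    // A bare feature name defaults to the iframe's src origin.
    if (allowlist.length === 0) return true;
    return (
      allowlist.includes("*") ||
      allowlist.includes("'src'") ||
      allowlist.includes(frame.origin)
    );
  });
}
```

With the jsfiddle URLs from this comment, the cross-origin frame is refused when `allow` is empty and accepted once `allow="microphone; camera;"` is present.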

  • Implications for browsers to message & manage split permissions for a given domain, allow all, abuse like unique apps asking over & over on a given page, etc.

"Split permissions" is not a thing, but the other items are indeed the job of the User Agent to manage for sure.

How could these permissions be more security and privacy minded?

Permission models are an area of differentiation between browsers. Happy to discuss changes to the spec, but you said "first granted" earlier, are you by chance using a browser that auto-stores permission? This problem seems worse then.

For instance,

We might want to clear up whether your problem is with an implementation before we address the model.

jan-ivar commented 5 months ago

  1. The mismatch between user expectation ("this app") and implementation (top-level domain) was created by this specification. Not to imply that it was wrong, at all, but the side-effects of hoisting permissions up to the address bar might best be considered as one design.

I hope I've shown above that this problem was not created by this specification, and that it faithfully follows the web model when it comes to permission delegation. E.g. this seems to apply equally to geolocation and other permissions.

For that reason it might be appropriate to consider opening an issue on w3c/permissions instead.

We can keep this open to try to add some text to highlight the problem, and suggest solutions for web applications, like using different sub-domains per user. E.g. these have separate permissions (and cookies since github.io is an eTLD):

  • Unlike now, if the origin Allows access the iframe’s UX would be knocked back to Ask. The browser UX and stored permission would proceed in a familiar manner, but specific to the extra constraint.

Note the diversity in browsers I mentioned earlier. Specs generally aren't prescriptive to this level to allow user agents to experiment. User agents are encouraged to solve these situations if they can detect them.

E.g. Firefox will always ask if someone uses the (rather unsafe) allow="camera *; microphone *;" wildcards (click the Navigate to landing button in https://jan-ivar.github.io/dummy/iframe_iframe_gum_starcross.html in Firefox).
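A site could also check its own markup for such wildcards. The sketch below is one possible lint helper, assuming the `allow` value has already been read from the iframe as a string. The name `findWildcardFeatures` is hypothetical and not part of any spec or browser API.

```javascript
// Returns the features that an `allow` attribute value delegates to any origin via "*".
function findWildcardFeatures(allowAttr) {
  return allowAttr
    .split(";")
    .map((entry) => entry.trim().split(/\s+/))
    // Keep entries that have a feature name and list "*" in their allowlist.
    .filter(([name, ...allowlist]) => name && allowlist.includes("*"))
    .map(([name]) => name);
}
```

For the value quoted above, `"camera *; microphone *;"`, this flags both `camera` and `microphone`. A value such as `"microphone; camera;"` yields an empty list.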

rockinghelvetica commented 5 months ago

@jan-ivar I should slow us down: the heading "questions raised" and the matter of "split permissions" pertain to the half-baked direction proposed, not the current standard. It may have been wiser for me to stop at describing the problem, as I think clarity on the risk is more important than any ideas about what to do next.

Yes, a Figma plugin is third-party code, served under a model similar to JS playgrounds.

Example of the problem in a Figma context

A simple example in Figma (~3MM monthly subscribers) is that I might use a Figma plugin by Vimeo, a well-known third party that I trust and approve to interact with my Figma file in order to record myself discussing a design in context.

  1. I can elect to install and use such a tool from within Figma's webapp, or, I might open a shared design file that has an existing dependency (distribution).
  2. The permission is likely to get stored. That's the default in Chrome; otherwise, one is likely to tick "☐ Remember this decision" in Firefox for a tool used this often. AFAIK Safari can also save permissions starting around iOS 13.
  3. The immediate side-effect is now that other third party code in Figma can launch my webcam. Each plugin within Figma has its own auth flow to interact with my files, but there's no platform review of the relevant utility & an offending plugin could e.g. have useful utility without obvious expectation of camera use.

AFAIK, the most popular browsers in use all set up this risk.

jan-ivar commented 5 months ago

Ah sorry I misunderstood. I agree it's good to narrow down the problem.

I didn't mean to diminish the problem by pointing to it going beyond camera and microphone. On the contrary, this likely needs addressing (call-out or changes) at a higher level, which would be the permissions or permissions-policy spec.

  1. The immediate side-effect is now that other third party code in Figma can launch my webcam. Each plugin within Figma has its own auth flow to interact with my files, but there's no platform review of the relevant utility & an offending plugin could e.g. have useful utility without obvious expectation of camera use.

It is Figma that breaks the trust chain here.

  1. End-users grant OS permissions to web browsers (level 1 prompt)
  2. In the web model, end-users grant permission to the website (level 2 prompt), which in turn is responsible for which third parties it delegates that permission to, and how
  3. It's Figma's job to figure out a model that scales past that to support its complexity (it's unobvious a 3rd level prompt is the answer)

Reviewing the tools available today, websites can manage permission delegation to their iframes by (sub)domains using allowlists:

<iframe allow="camera https://vimeo.figma.net https://sub2.figma.net" src="iframe.html">

Central to such a scheme, "plugins" would run in different sub-domains and not have automatic access to navigator.mediaDevices.getUserMedia by default, which seems to be the problem here. Configuring which plugins get camera access could be part of the "auth flow" when you "install" the plugin (no need for a 3rd level prompt).
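As a minimal sketch of that scheme: the host could keep a record of what the user approved at install time and derive each plugin iframe's `allow` value from it. The record shape and the name `allowAttributeFor` are assumptions for illustration only, and nothing here reflects how Figma actually works.

```javascript
// Features this hypothetical host is willing to delegate at all.
const DELEGABLE_FEATURES = ["camera", "microphone"];

// Builds the `allow` attribute value for one plugin iframe.
// `plugin` is a hypothetical install record: { origin, features }.
function allowAttributeFor(plugin) {
  return DELEGABLE_FEATURES
    .map((feature) =>
      plugin.features.includes(feature)
        ? `${feature} ${plugin.origin}` // delegate to this plugin's origin only
        : `${feature} 'none'`           // explicitly withhold the feature
    )
    .join("; ");
}

// Example install records; the subdomains are the ones used in the snippet above.
const examplePlugins = [
  { origin: "https://vimeo.figma.net", features: ["camera", "microphone"] },
  { origin: "https://sub2.figma.net", features: [] },
];
```

Here the first record produces `camera https://vimeo.figma.net; microphone https://vimeo.figma.net`, while the second produces `camera 'none'; microphone 'none'`, so a plugin that was never approved for the camera cannot ride on the host's stored permission.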

rockinghelvetica commented 5 months ago

Thank you for your articulation of the model underlying the spec's design decision.

I think:

  1. leaving the trust model to the domain is undermined by the ambiguity of this responsibility
  2. the high cost/complexity of securing this trust the "right" way suggests it should be browser-side
  3. if the prompt is initiated by third party code, can the host see it?

The responsibility for the trust chain is not clear

If Figma were an outlier, I might direct my feedback there. But this seems to be the common approach — and Figma might be the cleanest case of where the described "web model" works (because the UX is clearly and consistently 1:1 with the domain). They should get this right, and they don't.

The problem is not obvious enough, and at minimum the spec needs to be a lot louder about this.

From a web user perspective, I challenge that a "website" is understood as 1:1 with a domain. MANY very popular things exist today where there are millions of independent web apps running on a single site. I brought up Replit because it was especially interesting: I was developing my own app, which I trust. Granting camera permission in the context of my own project creates a wide-surface risk to any future Replit link.

I tend to think it should be possible for a developer to be MORE conservative than the context they are working in with permissions, i.e. if Replit is dropping this ball, I should be able to request navigator.mediaDevices.getUserMedia with more circumspect scope.

The mental model for trusting a "website" is way more aligned with the UX in front of me, i.e. the game I'm playing, not all the games on a given domain. I personally doubt that most developers invoking this code & messaging users about how the permission will be used understand at all what's going on upstream.

Implementing the granular scheme is complex and expensive

You've pointed out that it's possible for Figma to manage third-party camera use with sub-domains and their own plugin auth flow. They should! But they have incentives to invest in all this feature development, data storage, and testing: Figma document interactions, paid subscriptions, enterprise audits, etc.

For everyone without these incentives, it feels like the browser could best step up. A minimum alternative might be for this group to provide a reference implementation of the scheme and best-practices.

Implementation question

Where a site lets third party code invoke getUserMedia(), is there an appropriate hook/event for the site in that promise workflow, for the purposes of implementing a granular scheme?

Else, if the appropriate design is that the iframe does not have this permission by default, does an error reach the host document so that it can react as it sees fit?

jan-ivar commented 5 months ago

I brought up Replit because it was especially interesting: I was developing my own app, which I trust.

You're also trusting Replit.com. The web model has to consider malicious sites, and if you're instead using EvilReplit.com your trust is misplaced whether the request came from an embedded "app", "game", "plugin", iframe, or not. Your recording might be streamed to a server without your knowledge any number of ways.

Iframes are not an inspectable or secure unit of analysis for the average (non-developer) web user.

Where a site lets third party code invoke getUserMedia(), is there an appropriate hook/event for the site in that promise workflow, for the purposes of implementing a granular scheme?

No, there's nothing specific to getUserMedia() here. What I outlined above doesn't require it.

Else, if the appropriate design is that the iframe does not have this permission by default, does an error reach the host document so that it can react as it sees fit?

No. The iframed document gets a NotAllowedError, but the top-level document doesn't learn about it. Don't figma plugins have some existing way to communicate?
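If the embedded code and the host do share a channel, the iframe can report the rejection itself. The following is a sketch under stated assumptions: the message type `camera-denied` and the function name are made up, and the dependencies are passed in as parameters (in a page they would be `navigator.mediaDevices` and `window.parent`) so the logic can be exercised outside a browser.

```javascript
// Runs inside the embedded (plugin) document.
// mediaDevices: e.g. navigator.mediaDevices; host: e.g. window.parent;
// hostOrigin: the origin the message is addressed to.
async function requestCameraOrTellHost(mediaDevices, host, hostOrigin) {
  try {
    return await mediaDevices.getUserMedia({ video: true });
  } catch (err) {
    if (err.name === "NotAllowedError") {
      // The top-level document is not told about this rejection, so report it explicitly.
      // "camera-denied" is a made-up message type for this sketch.
      host.postMessage({ type: "camera-denied", reason: err.name }, hostOrigin);
      return null;
    }
    throw err; // other failures (e.g. no camera present) are left to the caller
  }
}
```

The host would still need a `message` listener that checks `event.origin` before acting on such a report.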

rockinghelvetica commented 5 months ago

You're also trusting Replit.com

In this case, I do trust Replit (they have my credit card), and I also trust myself. In granting this permission, there is nothing but good faith. As a user, the unexpected gap in the trust model is that I have to trust everyone else creating content on Replit (not intuitive nor practical), in the event that Replit isn't taking (demonstrably uncommon) steps to isolate the permission.

I think we're approaching clarity:

  1. We agree there is a risk here? That a user is prompted to allow the use of the camera and/or microphone, and that permission has unexpectedly broad scope?
  2. Sites that allow third-party use of the API do not commonly implement granular controls. I have personally not encountered any examples in the wild.
  3. The technology exists to implement said protections, today, across browsers. It's a little clumsy in that each site host has to provide a bespoke interface for third-party code to request camera access (such as an authenticated flow, UX outside the iframe, or postMessage et al), but it is entirely possible.
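To give a sense of what the bespoke interface in point 3 might involve on the host side, here is a small sketch of the decision step for a postMessage-based request. Everything in it is hypothetical: the `camera-request` message type, the action names, and the idea of an `approvedOrigins` set recorded when a plugin is installed.

```javascript
// Runs in the top-level (host) document, e.g. from a "message" event listener.
// event: a MessageEvent-like object with `data` and `origin`.
// approvedOrigins: a Set of plugin origins the user has approved for camera use.
function decideCameraRequest(event, approvedOrigins) {
  const data = event.data;
  // Ignore anything that is not a camera request from embedded code.
  if (!data || data.type !== "camera-request") return { action: "ignore" };

  // Trust the browser-supplied origin, never an origin claimed inside the payload.
  if (approvedOrigins.has(event.origin)) {
    return { action: "delegate", origin: event.origin };
  }
  // Not yet approved: the host shows its own UI outside the iframe.
  return { action: "ask-user", origin: event.origin };
}
```

On `delegate`, the host would load the plugin iframe with an `allow` attribute naming that origin. On `ask-user`, it would run its own approval flow first.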

If those points are firm, I argue that they feel like a spec-shaped problem. In the sense that something is wrong, and no one is doing a currently-viable thing about it?

edit: An additional reason I think that it's a spec concern is that the opinionated design decision prevents third-party developers (e.g. writing a plugin) from taking steps to protect an end user in the event that the host does not handle this well. It's not possible to be defensive, and there are lots of reasons why people would then make poor decisions to create the risk even in the event that they are aware (distribution, legal partnerships/interoperability, and more).

jan-ivar commented 5 months ago

Thank you for calling attention to this! I agree websites have not responded adequately to this risk. Things I think might help: clearer spec guidance https://github.com/w3c/webappsec-permissions-policy/issues/547; calling them out on it; competition.

Where I disagree:

In this case, I do trust Replit (they have my credit card), ... As a user, the unexpected gap in the trust model is that I have to trust everyone else creating content on Replit (not intuitive nor practical), ...

Maybe don't trust websites that create such gaps, and complain?

the high cost/complexity of securing this trust the "right" way suggests it should be browser-side

That would be a regression. We tried this before https://github.com/w3c/webappsec-permissions-policy/issues/9. The idea of trusting iframes within a page was more confusing to most users, not less.

Think of what the prompt would say. Figma defines what a "figma plugin" is. Steam defines what a "game" is. You don't want browsers defining these things.

rockinghelvetica commented 5 months ago

Maybe don't trust websites that create such gaps, and complain?

I am looking for an example of a site that handles this delegation responsibly. My experience so far is that ALL sites with these kinds of embedded code assets have this carelessness. The market is broken, so to speak. Have you spotted anything?

I will try to reach Replit or Figma again this week.

The idea of trusting iframes within a page was more confusing to most users, not less.

Was this studied with users? I read that issue link, and I was not exactly clear on what "we tried this before" refers to or what broke down.