privacycg / js-membranes

JS Isolation via Origin Labels and Membranes

qq on how this would block untrusted requests to trusted domains #3

Open thezedwards opened 4 years ago

thezedwards commented 4 years ago

howdy - i read through the other issue thread so I'll keep this pretty high-level. I'm largely just looking to better understand this from a deployment perspective and how the benefits would be communicated.

The scenario I'm looking to scope out:

A publisher installs 10 trusted pixels - one of those is for a Tag Manager (Google Tag Manager for instance) - and GTM then ideally fires an additional 3 trusted pixels. One of the other trusted pixels is for ShouldntHaveBeenTrusted.com DSP, who want to fire their own GTM snippet and piggyback data to their own Google Analytics account or other accounts on trusted domains. // How does the membrane stop a trusted script from firing data into a trusted domain, but untrusted account?

This scenario is common with DSP/SSP resellers where suddenly a publisher will have 10 pixels for the same SSP but for 10 different SSP accounts, not all controlled by the same entities or the publisher themselves. This happens due to CSP policies allowing XYZ.com scripts to fire, and then partners taking advantage of that to fire their own scripts on trusted domains.

Can you just sorta explain briefly how a publisher could allow one of these SSPs into their membrane (aka the REGEX allow formulas create a nuance that makes it possible to include account IDs in the membrane allow-list formulas...).. and then how they could combine this with existing CSP protections to only allow certain domains to fire (The deny list protections missing from this membrane proposal.. i think?) and then how NONCES could be used to further ensure integrity of the scripts being served?

From my understanding, a NONCE check basically helps to prevent MitM and partner shenanigans by the nonce first checking for authenticity and then preventing the script from firing. // But very few orgs have nonce checks on their javascript because it's confusing and takes a lot of custom dev work and audits to ensure it was done properly. // And the difference between a Nonce check and the Membrane proposal is that the nonce check is basically accounting for an attack scenario that rarely happens and is also kinda inflexible in deployment (no regex allow rules built into it), so this membrane proposal sorta jumps past nonce deployments and towards a similar "not perfect solution" but one that can be combined with a few other protections to make it easier for publishers to control exactly what pixels fire, and write simple reusable recipes to ensure that their allow lists are comprehensive while not creating new vulnerabilities, and maybe in the future these Membrane allow lists could be dynamically synced to ads.txt files or known.privacy files to provide additional optimizations?

It's certainly a technically challenging concept to approach but thanks for y'all working on this and the dialog on how this gets deployed/communicated to stakeholders.

Cheers, Zach

pes10k commented 4 years ago

Thanks for the question

How does the membrane stop a trusted script from firing data into a trusted domain, but untrusted account?

If it's a "trusted" script, then the membrane wouldn't do anything at all; the membrane would only intermediate on scripts / origins that were designated, signifying something less than "trusted" (where trusted here means I trust the script and don't need to check / modify / limit its functionality).

But in your example, if you trusted the GTM script, but didn't fully trust the script from "ShouldntHaveBeenTrusted.com", you would label the script from "ShouldntHaveBeenTrusted.com" for wrapping, then look for it calling global structures you don't think it should (in this case, the JS functions provided by GTM), and prevent those calls.

So, more concretely, you'd:

  1. Label ShouldntHaveBeenTrusted.com for wrapping
  2. Look for cases where ShouldntHaveBeenTrusted.com is trying to access dataLayer.push (since dataLayer is on the global).
  3. When the above happens, you could do any number of things: throw an exception, just drop the values and do nothing, look at the arguments to dataLayer.push and possibly allow it (indirectly, by having the membrane do the call), etc.
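
The three steps above could be sketched roughly as follows. This is only an illustration built on a plain `Proxy`, not the API the proposal defines: `wrapGlobalFor`, `dataLayerPolicy`, `labeledOrigins`, and the `account` field on events are all hypothetical names invented here, and `fakeGlobal` stands in for `window` so the snippet runs outside a browser.

```javascript
// Hypothetical sketch: none of these names come from the proposal.
// `fakeGlobal` stands in for `window` so this runs outside a browser.
const fakeGlobal = { dataLayer: [] };

// Step 1: the origins we have labeled for wrapping.
const labeledOrigins = new Set(["https://shouldnthavebeentrusted.com"]);

// Step 3: the policy decides what happens to a mediated dataLayer.push call.
// Here it keeps events whose (invented) `account` field is on an allow-list
// and silently drops the rest.
const allowedAccounts = new Set(["GTM-PUBLISHER"]);
function dataLayerPolicy(args) {
  return args.filter((event) => allowedAccounts.has(event.account));
}

// Step 2: build the view of the global that a script from `origin` receives.
function wrapGlobalFor(origin, realGlobal) {
  if (!labeledOrigins.has(origin)) {
    return realGlobal; // trusted script: the membrane does nothing
  }
  const wrappedDataLayer = new Proxy(realGlobal.dataLayer, {
    get(target, prop, receiver) {
      if (prop === "push") {
        // The membrane performs the real call on the script's behalf.
        return (...args) => target.push(...dataLayerPolicy(args));
      }
      return Reflect.get(target, prop, receiver);
    },
  });
  return new Proxy(realGlobal, {
    get(target, prop, receiver) {
      return prop === "dataLayer"
        ? wrappedDataLayer
        : Reflect.get(target, prop, receiver);
    },
  });
}

const trustedView = wrapGlobalFor("https://www.googletagmanager.com", fakeGlobal);
const labeledView = wrapGlobalFor("https://shouldnthavebeentrusted.com", fakeGlobal);

trustedView.dataLayer.push({ account: "GTM-PUBLISHER", event: "pageview" });
labeledView.dataLayer.push({ account: "GTM-SOMEONE-ELSE", event: "pageview" }); // dropped
labeledView.dataLayer.push({ account: "GTM-PUBLISHER", event: "click" }); // allowed
```

The policy here chooses the "drop the values and do nothing" option for disallowed events; throwing an exception instead would only require changing `dataLayerPolicy`.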

re CSP / nonces / etc

CSP and its ways of indicating trusted script (nonces, domain patterns, hashes, etc) are mostly trying to solve a different problem. CSP allows you to describe which code units should and shouldn't be allowed to execute, but doesn't provide any way of constraining the behavior of a script that's allowed to execute; it's an all-or-nothing choice.

This proposal aims to give browsers, extensions and websites the ability to say "I'd like a script to execute, but I'd like to limit or modify (or even just observe) what it's able to do."

e.g. I want to include some form validation library, but I don't want to give it network access, or I want to include google analytics, but I don't want it to be able to read the credit card field or access storage, etc

I hope that is helpful. If I didn't understand, or fully answer your question, please let me know and I can give it another go :)

thezedwards commented 4 years ago

Thanks very much, that makes a lot of sense!

I believe I understand how to communicate this -- would another "protection" example be that a publisher could limit certain JS functionality typically used for mouse tracking / heat map software? So you could ensure your partners could make a request but maybe not get access to that additional user data in the page scope?

Another usecase question: would a membrane be able to have a site-wide rule applied to the Referral URL (the user's eTLD+1) so that it would strip any personal data like an email address, similar to how a "#" can do this in a URL? // Basically, would a membrane give publishers the ability to better control 100% of the request headers sent to partners, including pre-built recipes for URL filtering?

In terms of being able to "Observe" what a script is doing; how does that feedback loop work for the developer? Is it a browser dev panel feedback loop or something different?

The other usecase I'm trying to understand if this covers: Imagine a publisher is trying to prevent the exfiltration of their user data, and they have a TOS for their ad tech partners that prevents creating a userID for a user and sharing that userID with any of the ad tech 4th party partners. Would a membrane have the ability to parse URL strings/parameters looking for a uniqueID and then blocking subsequent requests to separate domains using the same apparent uniqueID? And then basically membrane recipes would be written to help publishers block userID mirroring / URL redirects for partner syncs while not blocking safe strings?

Thanks for helping me understand how this differentiates from CSP policies and some of the other protections available -- it will be interesting to see how this develops and what final form it takes.

Cheers, Zach

pes10k commented 4 years ago

would another "protection" example be…

Yep, I think this is accurate. The only relevant limitation I'd highlight is that the proposal only targets global structures. I'm imagining that all the use cases / tracking vectors you mentioned here would in some way touch global structures (the DOM, browser or JS-engine provided prototypes, etc), but if a script was doing something you didn't like, and it only touched data structures it created, this approach would not give you an opportunity to intervene.

(I don't think that's the case in the examples you give, but just want to avoid possible misunderstanding)

would a membrane be able to have a site-wide rule applied to the Referral URL…

If I'm understanding your question correctly, yes, it would give the policy script the chance to interpose on / prevent / modify any labeled script's reads of the URL (since document.location.* are all global structures). You could also likely use it to interpose on things like window.fetch or other ways of initiating network requests, and you could modify the headers for APIs that currently allow for header modification (fetch, XMLHttpRequest.setRequestHeader, etc).

However, it wouldn't give you a way of modifying header information in cases where header information isn't currently modifiable (e.g. <img src=X>).

Put differently, you could use this approach to modify the headers of requests initiated using APIs that allow for header modification (AJAX, fetch), and you could use it to arbitrarily prevent requests, but you couldn't use it to modify headers for requests that don't currently have APIs for modifying headers.
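
As a rough illustration of interposing on a header-capable API, the sketch below wraps a fetch-like function and strips anything that looks like an email address from the query string and headers. The names `mediateFetch`, `stripEmails`, and `recordingFetch` are hypothetical, the email pattern is deliberately crude, headers are assumed to be passed as a plain object, and `recordingFetch` stands in for `window.fetch` so nothing touches the network.

```javascript
// Hypothetical sketch; none of these names come from the proposal.
// `recordingFetch` stands in for window.fetch and only records what it is given.
const sent = [];
const recordingFetch = (url, init = {}) => {
  sent.push({ url: String(url), headers: { ...init.headers } });
  return Promise.resolve({ ok: true });
};

// Crude "looks like an email address" test, for illustration only.
const EMAIL_LIKE = /[^\s@&=]+@[^\s@&=]+\.[^\s@&=]+/;

// Policy: drop query parameters and headers whose value looks like an email.
// Assumes `init.headers` is a plain object rather than a Headers instance.
function stripEmails(url, init = {}) {
  const cleanUrl = new URL(url);
  for (const [key, value] of [...cleanUrl.searchParams]) {
    if (EMAIL_LIKE.test(value)) cleanUrl.searchParams.delete(key);
  }
  const headers = {};
  for (const [name, value] of Object.entries(init.headers ?? {})) {
    if (!EMAIL_LIKE.test(value)) headers[name] = value;
  }
  return [cleanUrl.href, { ...init, headers }];
}

// The membrane would hand labeled scripts this function instead of the real fetch.
function mediateFetch(realFetch, policy) {
  return (url, init) => realFetch(...policy(url, init));
}

const labeledFetch = mediateFetch(recordingFetch, stripEmails);
labeledFetch("https://partner.example/collect?page=home&u=jane%40example.org", {
  headers: { "X-Page": "home", "X-User": "jane@example.org" },
});
```

Nothing comparable is possible for a request triggered by `<img src=X>`, because there is no call site at which the policy could see or change the headers; the most a policy could do there is prevent the request.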

In terms of being able to "Observe" what a script is doing; how does that feedback loop work for the developer? Is it a browser dev panel feedback loop or something different?

You could build something like that with this API (again, only for places where the script is touching global structures), but that'd be beyond what this spec would describe. So, you'd have the primitives for something like that, but you'd need to build it on top of this.
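
A minimal sketch of what such an observation layer might look like, assuming a `Proxy`-style wrapper: every read or call a labeled script makes through the wrapped global is appended to a log that a dev panel (or anything else) could display. `observe`, `accessLog`, and `fakeWindow` are made-up names, and `fakeWindow` is a tiny stand-in for the real global.

```javascript
// Hypothetical sketch; `observe` and `accessLog` are made-up names.
const accessLog = [];

// Wrap `target` so every property read and method call made through the
// wrapper is recorded under `label`. `path` is only used for log messages.
function observe(target, label, path) {
  return new Proxy(target, {
    get(obj, prop) {
      const value = Reflect.get(obj, prop);
      const where = `${path}.${String(prop)}`;
      if (typeof value === "function") {
        // Log the call and its arguments, then forward to the real function.
        return (...args) => {
          accessLog.push({ label, kind: "call", where, args });
          return value.apply(obj, args);
        };
      }
      accessLog.push({ label, kind: "read", where });
      // Recurse so that nested structures are observed too.
      return value !== null && typeof value === "object"
        ? observe(value, label, where)
        : value;
    },
  });
}

// Tiny stand-in for the browser global so the sketch runs anywhere.
const fakeWindow = {
  document: { referrer: "https://news.example/article", cookie: "sid=abc" },
  localStorage: { getItem: (key) => (key === "uid" ? "42" : null) },
};

const view = observe(fakeWindow, "heatmap.example", "window");
view.document.referrer;
view.localStorage.getItem("uid");

for (const entry of accessLog) {
  console.log(`${entry.label} ${entry.kind} ${entry.where}`);
}
```

Accesses to structures the script created itself never pass through the wrapper, so they would not appear in the log.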

Imagine a publisher is trying to prevent the exfiltration of their user data

You could build something like that with these primitives. You could have a policy where anytime a targeted script calls fetch etc, you check (and possibly modify) the URL being called. Though, I don't want to overpromise: I expect that a sufficiently determined script could perform some transformations (anything from ROT13 to proper encryption) on the sensitive data (userID) that would make it difficult for the policy script to reason about.

So, you could use this functionality to gain those decision points, even if making the right decision (in this particular case) would be difficult.
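
To make both the decision point and its limits concrete, here is a small sketch of a policy that remembers which host first received an ID-looking parameter value and refuses to let the same value go to a different host. `makeIdSyncPolicy` is a hypothetical name, treating any long alphanumeric value as a user ID is a crude assumption, and comparing hostnames rather than eTLD+1 is a simplification.

```javascript
// Hypothetical sketch; `makeIdSyncPolicy` is a made-up name, and treating any
// long alphanumeric parameter value as a "user ID" is a crude heuristic.
const LOOKS_LIKE_ID = /^[A-Za-z0-9_-]{16,}$/;

function makeIdSyncPolicy() {
  const firstHostForId = new Map(); // ID value -> first host it was sent to
  // Returns true if the request may proceed, false if it should be blocked.
  return function allowRequest(rawUrl) {
    const url = new URL(rawUrl);
    for (const value of url.searchParams.values()) {
      if (!LOOKS_LIKE_ID.test(value)) continue;
      const firstHost = firstHostForId.get(value);
      if (firstHost === undefined) {
        firstHostForId.set(value, url.hostname);
      } else if (firstHost !== url.hostname) {
        return false; // same apparent ID going to a second host
      }
    }
    return true;
  };
}

const allowRequest = makeIdSyncPolicy();
const uid = "a1b2c3d4e5f6a7b8c9d0";
const disguised = [...uid].reverse().join(""); // trivial transformation

const results = [
  allowRequest(`https://dsp.example/px?uid=${uid}`), // first sighting
  allowRequest(`https://dsp.example/px?uid=${uid}&ev=view`), // same host again
  allowRequest(`https://ssp.example/sync?partner_uid=${uid}`), // second host
  allowRequest(`https://ssp.example/sync?partner_uid=${disguised}`), // evades
];
console.log(results); // [ true, true, false, true ]
```

The last request shows the caveat: merely reversing the ID is enough to get it past this particular policy, and ROT13 or encryption would be harder still to detect.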

thezedwards commented 4 years ago

Thanks again for the dialog and your time on these responses.

So when you say "...only targets global structures..." but then mentioned that the membranes would not be able to block/intervene when a script "only touched data structures it created" ...

// would another way of putting that be something like, "The majority of tracking exploits in browsers use global structures like 'document.location' (is there a list of these?), but a membrane can only stop known-structures, which creates small opportunities for scripts to attempt exploits, but at the same time dramatically limiting these vulnerabilities for a website/browser due to the inclusion of the largest vectors by-default in most membrane-blocking-recipes..."

// I think it would be helpful to have a list of the browser APIs that can be edited and a list of the ones that can't // personally I only know a few off the top of my head and I think that list would help developers think about some of the options. // Would it be correct to assume that a field like "referrer" or "location" or certain cookie response headers can't be touched? -- I definitely think articulating this list would help, and maybe doing this from the perspective of the COWL proposal (https://www.w3.org/TR/COWL/) for how this could be used to provide new context to data in transit.

That COWL proposal is something I'm scoping to see if it's possible to use that to transmit a new privacy schema (I have an ongoing draft I'm working on of this schema here @ https://docs.google.com/spreadsheets/d/1jrmUpLq88M_lq6iM2-0Tsm1-XSqU-9q-ChcNxSwJ31Y/edit?usp=sharing) // I would really love any thoughts about how the combination of a JS Membrane + COWL + Privacy Schema could help organizations better label data between each other based on dynamic characteristics of users and the consent they provide...

//

That's also SUPER helpful the way you described your final paragraphs about how it's POSSIBLE to do very creative filtering in an attempt to prevent userIDs from being sent to multiple entities BUT you'd also likely be neck-deep in network errors that would be tough to debug. // I think the collaborative nature of recipes will ensure that they evolve so at SOME point in the future there would likely be a semi-stable version of a userID-filtering membrane that could stop certain userID syncing from the largest ad tech companies.

Thanks for the dialog on this -- I'm a huge fan of y'all's core thesis on JS Membranes that they can't require rewriting code, and I think approaching all these filters/additions to current data flow from that perspective, and making that the core deal breaker for new proposals, is a pretty smart idea.

Cheers, Zach

pes10k commented 4 years ago

Hi @thezedwards, apologies for the delay in getting back to you on this.

would another way of putting that be something like

I think you might be describing the policy, but from a different motivation. The reason to membrane global structures is that the only way a script can cause privacy harm is by modifying / accessing / using something it didn't create. If a script can only touch structures it created, then, by definition, the script can't touch private data (since that requires at least some input of data the script authors didn't have access to).

I think it would be helpful to have a list of the browser APIs that can be edited and a

This is an easy list :) The membrane needs to wrap all global structures (this is a slight modification from the existing text, which I'll update today). So, that's JavaScript language structures (e.g. Array.prototype.*) and browser-provided ones (e.g. document.referrer).

That COWL proposal...

I think the COWL proposal is really neat, and Deian, the primary author of it, was on the call the other week and has been very helpful in this membrane work. But I don't think it solves this use case, since it would require labeling and rewriting a large amount of existing code. I also don't think you need both membranes and a formal capability system like COWL; COWL would be more useful for new code, where you wanted to hide new capabilities and values from data that's being passed around, but it would be a hard(er) fit for the "mediate all global access" use cases this proposal targets.

Hope that all helps!