privacycg / js-membranes

JS Isolation via Origin Labels and Membranes

Origin/URL matching #2

Open jackfrankland opened 4 years ago

jackfrankland commented 4 years ago

In the proposal, there is an extra "scriptOrigin" argument passed into the traps. I see in the example there is also what must be an alternative suggestion of "scriptInfo" that contains an instance of URL among other things. I imagine the purpose of either of these is to allow different scripted behaviour based on which untrusted script was being run.

I have a few questions around this please:

1. Do you expect consistency between the interposedOrigins passed into the proxy membrane and this extra argument? To be more specific, if it were scriptOrigin, would that be the full origin (https://cdn.example) of the script href, or would it match one of the interposedOrigins (although it may match more than one)?
2. Would the domain matching pattern allow for negation, i.e. all but first party?
3. If you would expect heavy use of scriptOrigin/scriptInfo in the traps, and in the event of there being a decent JS method to match URLs to a pattern, do you think the interposedOrigins array would be necessary?

pes10k commented 4 years ago

1. Yep! As defined right now, those would be equal (that parameter is there to allow the same handler to be reused across scripts). But if / when engines include script-lineage information, that would no longer be the case.
2. Hmm, that could be useful, and maybe something to look into once / if implementation time starts. It's just defined as is currently to keep the focus on the "novel" part of the proposal, which is the wrapping stuff. But absolutely, you could imagine arbitrary or even regex-like syntax there.
3. Yeah, again, this is really only there to allow handlers to be reused.
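
To make the handler-reuse point concrete, here is a minimal sketch of one handler shared across scripts. The thread does not show the proposal's actual API, so `sharedHandler`, `simulateGet`, and the position of the extra `scriptOrigin` argument are all assumptions, and the engine's dispatch is faked by calling the trap directly.

```javascript
// Hypothetical: one handler object shared by every interposed script.
// The fourth argument stands in for the proposal's extra scriptOrigin parameter.
const sharedHandler = {
  get(target, prop, receiver, scriptOrigin) {
    if (scriptOrigin === "https://cdn.example" && prop === "cookie") {
      return ""; // hide cookies from this one origin only
    }
    return Reflect.get(target, prop, receiver);
  },
};

// Hypothetical stand-in for the engine invoking the trap on behalf of a script.
function simulateGet(target, prop, scriptOrigin) {
  return sharedHandler.get(target, prop, target, scriptOrigin);
}

const fakeDocument = { cookie: "session=abc", title: "Home" };
console.log(simulateGet(fakeDocument, "cookie", "https://cdn.example"));
console.log(simulateGet(fakeDocument, "cookie", "https://other.example"));
```

The same handler serves both origins and only branches where the policy differs, which is the reuse the extra argument is meant to enable.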

Hope that helps! If I didn't cover something, or misunderstood, please let me know :)

jackfrankland commented 4 years ago

Thanks a lot. Just want to clarify something in case I've misunderstood, but appreciate there are more fundamental things to be decided on rather than this.

If the interposedOrigins is ["*"], it sounds like the scriptOrigin in the trap will be "*", is this right? If so, how would you create restrictive logic for a potential origin not yet known to you - without the ability for negation matching? I'm looking at the scenario in the issue https://github.com/privacycg/js-membranes/issues/3, where ShouldntHaveBeenTrusted.com needs to be labelled, but may not be known due to GTM being allowed to embed scripts.

pes10k commented 4 years ago

Oh, sorry, I understand your question better now. No, scriptOrigin wouldn't just reflect back the interposedOrigins argument. If interposedOrigins is *, and one of the scripts on your page is from www.evil.com or whatever, scriptOrigin in this case would be www.evil.com.

The place where this gets tricky, and where the current proposal is intentionally undefined, is what to do if ShouldntHaveBeenTrusted.com creates a new <script> element (either inline or pointing at a different third party). Probably the best case would be to expose the lineage of each executing script unit to the membrane, so that the membrane could consider it when making decisions. However, no JS engine currently captures this information in all corner cases (though some do a better job than others).

Another option is to say that if the membrane / policy script is worried about such things, it should at runtime observe when the targeted script is injecting new script units, and prepare accordingly (this seems difficult, though maybe not impossible. Might be best to just have a blanket policy to handle unexpected script units).
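
A blanket policy of that kind could look roughly like the sketch below. The policy shape (`allowStorage`), the origins, and the names `knownPolicies`, `blanketPolicy`, and `policyFor` are all made up for illustration and are not part of the proposal.

```javascript
// Hypothetical policy table: origins the policy author has reviewed.
const knownPolicies = new Map([
  ["https://first-party.example", { allowStorage: true }],
  ["https://analytics.example", { allowStorage: false }],
]);

// Blanket fallback applied to any script origin not listed above,
// e.g. a script injected at runtime by another third party.
const blanketPolicy = Object.freeze({ allowStorage: false });

function policyFor(scriptOrigin) {
  return knownPolicies.get(scriptOrigin) ?? blanketPolicy;
}

console.log(policyFor("https://first-party.example").allowStorage);
console.log(policyFor("https://www.evil.example").allowStorage);
```

Unknown origins fall through to the most restrictive settings, so the policy does not need to know every injected script in advance.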

jackfrankland commented 4 years ago

Thanks, I did think that would be the case.

> Might be best to just have a blanket policy to handle unexpected script units

With the scriptOrigin available, and the need to capture unknown origins, I can see policies quickly devolving to using *, and separating out logic in the handler traps per origin. Although this would be down to the author of the policy, my concerns with this would be:

- Performance losses with origin matching via string comparison/regex, for all origins, each time a trap is run.
- The first party (and other "trusted" scripts) will then be labelled as untrusted, meaning it will share the global with all other scripts.

If/when it comes down to thinking further about the implementation, I think having a comprehensive declarative domain/url matching pattern would be preferable, as it would be easier to define/audit handlers for a particular set of domain rules. It could also be improved upon more easily in the future to cope with script lineage, rather than modifying existing logic in handlers to run through a trace/list. Just as an example, I'm thinking something like ".evil.com > first-party.com".
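
As a rough sketch of what declarative matching with negation might look like, the code below compiles patterns once and caches the verdict per origin, so a trap would pay only a Map lookup. The pattern syntax (`*.` for subdomains, `!` for negation) and the names `compilePattern` and `makeMatcher` are my own invention, not something from the proposal, and lineage patterns using `>` are not attempted here.

```javascript
// Hypothetical pattern syntax (not part of the proposal):
//   "host.example"    exact hostname
//   "*.host.example"  any subdomain of host.example
//   "!pattern"        negation: everything the pattern does not match
function compilePattern(pattern) {
  if (pattern.startsWith("!")) {
    const inner = compilePattern(pattern.slice(1));
    return (hostname) => !inner(hostname);
  }
  if (pattern.startsWith("*.")) {
    const suffix = pattern.slice(1); // ".host.example"
    return (hostname) => hostname.endsWith(suffix);
  }
  return (hostname) => hostname === pattern;
}

// Compile once, then cache the verdict per origin so repeated checks
// do a Map lookup instead of re-running string comparisons.
function makeMatcher(patterns) {
  const predicates = patterns.map(compilePattern);
  const cache = new Map();
  return (origin) => {
    if (!cache.has(origin)) {
      const hostname = new URL(origin).hostname;
      cache.set(origin, predicates.some((test) => test(hostname)));
    }
    return cache.get(origin);
  };
}

// "All but first party", i.e. the negation case asked about earlier.
const isUntrusted = makeMatcher(["!first-party.example"]);
console.log(isUntrusted("https://first-party.example")); // false
console.log(isUntrusted("https://cdn.evil.example"));    // true
```

Because the rules live in one list of strings rather than inside handler logic, they are easier to audit, and the matcher could later be extended without touching the traps.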

Other concerns that might be worth mentioning with including scriptOrigin are the potential to break forward compatibility with generic proxy traps, and the loss of parity with the corresponding Reflect methods.
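
To illustrate the parity point with today's standard `Proxy` and `Reflect`: each trap currently takes the same arguments as its `Reflect` counterpart. One way to keep that signature, shown here only as a sketch and not as anything the proposal specifies, is to capture the origin in a closure. `makeHandlerFor` and the access log are hypothetical.

```javascript
// Hypothetical factory: the origin is captured in a closure, so the trap keeps
// the standard (target, prop, receiver) signature that Reflect.get mirrors.
function makeHandlerFor(scriptOrigin, log) {
  return {
    get(target, prop, receiver) {
      log.push(`${scriptOrigin} read ${String(prop)}`);
      return Reflect.get(target, prop, receiver);
    },
  };
}

const accessLog = [];
const pageState = { title: "Home" };
const wrapped = new Proxy(pageState, makeHandlerFor("https://cdn.example", accessLog));

console.log(wrapped.title); // "Home"
console.log(accessLog[0]);  // "https://cdn.example read title"
```

The trade-off is one handler object per origin instead of a single shared handler, which is the reuse the extra argument was meant to provide.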

pes10k commented 4 years ago

> I think having a comprehensive declarative domain/url matching pattern would be preferable

Sure, this all seems fine. It would be fine to use a regex or whatever else in the interposedOrigins position.

> The first party (and other "trusted" scripts) will then be labelled as untrusted, meaning it will share the global with all other scripts

Untrusted scripts don't share a global under any proposal, if I'm understanding you right. Untrusted scripts get membraned, and the policy script can choose to map that back to one global, or a per-script global, or arbitrarily anything else.
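
A minimal sketch of that mapping choice, using plain objects as stand-ins for globals. `makeGlobalMapper` and its `"shared"` / `"per-script"` modes are hypothetical names, and a real membrane would hand out wrapped globals rather than empty objects.

```javascript
// Hypothetical: the policy script decides which global each membraned script sees.
function makeGlobalMapper(mode) {
  const shared = {};           // one global for all untrusted scripts
  const perOrigin = new Map(); // or a separate global per script origin
  return (scriptOrigin) => {
    if (mode === "shared") return shared;
    if (!perOrigin.has(scriptOrigin)) perOrigin.set(scriptOrigin, {});
    return perOrigin.get(scriptOrigin);
  };
}

const sharedGlobalFor = makeGlobalMapper("shared");
const isolatedGlobalFor = makeGlobalMapper("per-script");

console.log(sharedGlobalFor("https://a.example") === sharedGlobalFor("https://b.example"));     // true
console.log(isolatedGlobalFor("https://a.example") === isolatedGlobalFor("https://b.example")); // false
```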