tc39 / proposal-shadowrealm

ECMAScript Proposal, specs, and reference implementation for Realms
https://tc39.es/proposal-shadowrealm/

Considering the impact of ShadowRealm on ad blocking, privacy and similar browser extensions #406

Open: kzar opened this issue 4 months ago

kzar commented 4 months ago

The ShadowRealm API is new to me, sorry if any of these points are obvious or wrong. But I thought I'd write down my thoughts from the perspective of an extension developer in case it helps. I've been thinking about both how I might use the API from an ad blocker or privacy protecting extension, but also how websites might use the ShadowRealm API to circumvent such extensions.

Using the API

Sometimes these extensions need to run a content script in the page, before the page scripts run, in order to wrap troublesome APIs in an attempt to stop the website doing something. As an example, Adblock Plus needed to wrap the WebSocket API like this when it was new, before the chrome.webRequest API supported blocking such requests directly. This is tricky to get right, since as soon as the page's scripts have run, you can no longer trust much. You end up having to keep references to any API you might use in the future, in case they are messed with later. You even have to consider methods that might be implicitly called by your code. As an example, check out the old WebRTC wrapping code we wrote for Adblock Plus, to prevent websites using the API to load ads.
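A minimal TypeScript sketch of the "keep references up front" pattern described above. `pageWindow`, `FakeSocket` and `isBlocked` are hypothetical stand-ins: a real content script would wrap the page's own `window.WebSocket` and consult its own filter rules. The point is that the wrapper only relies on built-ins captured before any page script could tamper with them.

```typescript
// Capture pristine built-ins before any page script runs (hypothetical first-run content script).
const reflectApply = Reflect.apply;
const startsWith = String.prototype.startsWith;

// Hypothetical stand-in for the page's WebSocket constructor and global object.
class FakeSocket {
  constructor(public url: string) {}
}
const pageWindow: { WebSocket: new (url: string) => FakeSocket } = { WebSocket: FakeSocket };

// Hypothetical filter: block anything pointing at an example ad host.
function isBlocked(url: string): boolean {
  // Call the captured method, so a later override of String.prototype.startsWith has no effect.
  return reflectApply(startsWith, url, ["wss://ads.example"]);
}

// Replace the page's constructor with a subclass that refuses blocked URLs.
const OriginalSocket = pageWindow.WebSocket;
class WrappedSocket extends OriginalSocket {
  constructor(url: string) {
    if (isBlocked(url)) {
      throw new Error(`Blocked WebSocket connection to ${url}`);
    }
    super(url);
  }
}
pageWindow.WebSocket = WrappedSocket;
```

Even if a page script later replaces `String.prototype.startsWith`, `isBlocked` keeps working because it never looks the method up again. Methods your code calls implicitly (iterators, `toString`, getters) need the same treatment, which is what makes this approach so tedious in practice.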

Perhaps ShadowRealm could help with this kind of situation? If we could create a ShadowRealm at the start of our content script, perhaps most of the logic could go in there and only the messaging and code exposed to the page would need to be hardened? For this to be of much use, I think we'd often need a way to synchronously communicate between the ShadowRealm we created and the page.

Websites abusing the API

For privacy protecting and ad blocking extensions to be effective, we need our content script to run for all frames. Otherwise the page can make use of unwrapped APIs (e.g. for fingerprinting the user) by creating an iframe and then using the API from there. Sometimes websites will also pass a prototype's method back out from an iframe if they suspect we wrapped it, so that they can use it from the parent to try and get access to something in the parent. This is an ongoing issue, especially since websites can sometimes use tricks to create an iframe that some browsers won't run the content script for.

To prevent websites using these tricks from ShadowRealms I would hope that:

  1. Methods can't be passed out of the ShadowRealm to the parent for use by the parent.
  2. ShadowRealms can't directly do things like open WebSocket connections or read/write cookies.

OR

  1. Browser extension content scripts can be run for ShadowRealms, like they are run for pages and iframes.

Otherwise, such extensions will be stuck trying to wrap the ShadowRealm API as well, which is probably bad news for everyone involved 😅.

Hope that helps and shout if I can clarify anything! Dave

mhofman commented 4 months ago

This is tricky to get right, since as soon as the page's scripts have run, you can no longer trust much. You end up having to keep references to any API you might use in the future, in case they are messed with later

This is most likely a problem better addressed by https://github.com/tc39/proposal-get-intrinsic. With ShadowRealm you might be able to capture the original ShadowRealm constructor if your script runs first, but any interaction across realms still requires code executing on each side, and the ShadowRealm's global has only very limited support for most Web APIs.

This is an ongoing issue, especially since websites can sometimes use tricks to create an iframe that some browsers won't run the content script for.

You might be interested in https://github.com/WICG/proposals/issues/144

I am also interested in similar "init scripts" mechanisms but at the TC39 level.

  1. Browser extension content scripts can be run for ShadowRealms

I would very much hope that extensions do not gain that capability (just as they don't for Workers, AFAIK).

  1. Methods can't be passed out of the ShadowRealm to the parent for use by the parent.

You can pass functions, which get wrapped through the callable boundary; only primitives and other functions are allowed as arguments and return values. There is no way to "apply" these functions to the current realm, but they can be called as normal, effectively allowing the use of any capability exposed by the ShadowRealm.
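To make the callable-boundary rules concrete, here is a small userland simulation in TypeScript. It models only the behavior described in the comment above: primitives pass through, functions are replaced by fresh wrappers, and any other object is rejected with a TypeError. `wrapAcrossBoundary` and `doubleInRealm` are hypothetical names; a real engine enforces this at the realm boundary, not in page code.

```typescript
// Simulation of the ShadowRealm callable boundary (NOT the engine implementation):
// primitives pass through, callables get fresh wrappers, other objects are rejected.
function wrapAcrossBoundary(value: unknown): unknown {
  if (typeof value === "function") {
    const target = value as (...args: unknown[]) => unknown;
    // An arrow function: not constructible, and it never forwards the caller's `this`,
    // so there is no way to "apply" the target to a receiver from the other side.
    return (...args: unknown[]): unknown =>
      // Arguments are wrapped on the way in and the result is wrapped on the way out.
      wrapAcrossBoundary(target(...args.map(wrapAcrossBoundary)));
  }
  if (typeof value === "object" && value !== null) {
    throw new TypeError("Only primitives and callables can cross the callable boundary");
  }
  return value;
}

// Hypothetical function living "inside" the realm, exposed to the outside through the boundary.
const doubleInRealm = (n: unknown): unknown => (typeof n === "number" ? n * 2 : NaN);
const exposed = wrapAcrossBoundary(doubleInRealm) as (n: unknown) => unknown;
```

Calling `exposed(21)` runs the inner function and hands back a primitive, whereas a wrapped function that tries to return an object throws instead of leaking it. That is why the realm's capabilities can still be used from outside, but its objects and prototypes cannot.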

2. ShadowRealms can't directly do things like open WebSocket connections or read/write cookies.

These Web APIs are currently excluded from being exposed in ShadowRealms.

such extensions will be stuck trying to wrap the ShadowRealm API as well,

That honestly may be your safest bet here. The biggest problem you'll encounter is how to synchronously evaluate some code inside the realm to "repair" the ShadowRealm global. Unfortunately, the current evaluate() API too easily trips up CSP rules.
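A minimal sketch of wrapping the constructor so that every new realm gets a synchronous "repair" pass before anyone else can use it. It assumes only the proposal's `evaluate(sourceText)` method; `RealmLike`, `RealmCtor` and `withRepairs` are hypothetical names, and the repair source itself is left as a placeholder. Because `evaluate` compiles source text, a page CSP without 'unsafe-eval' would likely block exactly this step.

```typescript
// Minimal shape assumed from the proposal: evaluate() runs source text inside the realm.
interface RealmLike {
  evaluate(sourceText: string): unknown;
}
// Mixin-style constructor type, so the wrapper works for any RealmLike class.
type RealmCtor = new (...args: any[]) => RealmLike;

// Returns a subclass whose instances evaluate `repairSource` before any other code can use them.
function withRepairs<TBase extends RealmCtor>(Base: TBase, repairSource: string): TBase {
  // Capture evaluate now, so a page that later overrides Base.prototype.evaluate cannot skip the repair.
  const evaluate = Base.prototype.evaluate as (this: RealmLike, sourceText: string) => unknown;
  return class RepairedRealm extends Base {
    constructor(...args: any[]) {
      super(...args);
      // Synchronous, eval-like step: this is where CSP can get in the way.
      Reflect.apply(evaluate, this, [repairSource]);
    }
  };
}
```

In a page this would be applied as something like `globalThis.ShadowRealm = withRepairs(ShadowRealm, repairSource)` from a first-run script. This only covers realms created through the wrapped constructor; if a realm's own global also exposes the constructor, the repair source would have to apply the same wrapping inside it, which is the "transitive" part that makes this tedious.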

kzar commented 3 months ago

Thanks, that's very helpful. It sounds like nothing in ShadowRealm needs to be adjusted for the use cases I was thinking about here.

Thanks for the links as well, Reflect.getIntrinsic in particular sounds interesting. Getting off topic here slightly, but do you know if the Reflect.getIntrinsic API itself could be wrapped by a page/extension script? FWIW I'd prefer that it could be, so that the extension can stash a reference but then wrap it as necessary to avoid circumvention.

mhofman commented 3 months ago

Any first-run script that wants to virtualize / repair an environment would need to wrap getIntrinsic. One thing we'll need to look into is the difficulty that ShadowRealm creates in applying repairs like that in any new realm. Transitively wrapping the ShadowRealm constructor is tedious and, as I mentioned, raises problems with CSP.

tophf commented 3 months ago

@kzar, note that running a content script inside a same-origin iframe won't help you, because the embedder has direct synchronous access to the iframe's contentWindow right after the element is added to the DOM and before the content scripts run inside it at document_start. This eternal architectural bug affects all browsers, AFAIK, and was recently re-reported for Chrome in https://crbug.com/40202434. Resisting it is hard, especially now that synchronous mutation events are disabled in Chrome. You'd have to patch the HTMLIFrameElement.prototype.contentWindow/contentDocument getters to handle the case of an iframe inside a closed shadow DOM, which doesn't expose the child window as window[0]. These getters won't help with the light DOM, though, as the page can just read window[0] (which doesn't go through a getter), so I guess you'd also have to patch all the DOM prototype methods that can add an iframe, like appendChild, append, prepend, before, after, insertAdjacentElement/insertAdjacentHTML and the innerHTML/outerHTML setters.

kzar commented 3 months ago

Yeah, the contentWindow/contentDocument issue is a pain for extensions as well, because websites sometimes abuse it to access unwrapped APIs. In the Adblock Plus injected wrapping code we also took care to wrap contentWindow/contentDocument, so that our injected code could recursively inject itself into frames as contentWindow/contentDocument were accessed. There were likely more ways around it that we missed, though!
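As a rough illustration of the getter-wrapping approach discussed here, this TypeScript sketch replaces an accessor on a prototype so that a hook sees every value it returns, and injects at most once per window. In a page the target would be something like `HTMLIFrameElement.prototype` and `"contentWindow"`; `hookGetter`, `injectOnce` and the counter are hypothetical names. As tophf points out above, a getter hook does nothing for `window[0]`-style access.

```typescript
// Wraps an accessor's getter on `proto` so `onAccess` sees each non-null value before the caller does.
function hookGetter<T extends object>(
  proto: object,
  key: PropertyKey,
  onAccess: (value: T) => void,
): void {
  const desc = Object.getOwnPropertyDescriptor(proto, key);
  if (desc === undefined || typeof desc.get !== "function") {
    throw new TypeError(`${String(key)} is not an accessor property`);
  }
  const originalGet = desc.get;
  Object.defineProperty(proto, key, {
    ...desc, // keep the original setter, enumerability and configurability
    get(this: unknown): T | null {
      const value = Reflect.apply(originalGet, this, []) as T | null;
      if (value !== null) onAccess(value);
      return value;
    },
  });
}

// Hypothetical injector: a real extension would install its API wrappers into `win` here.
const injectedWindows = new WeakSet<object>();
let injectionCount = 0;
function injectOnce(win: object): void {
  if (injectedWindows.has(win)) return;
  injectedWindows.add(win);
  injectionCount++;
}
```

Tracking windows in a `WeakSet` means repeated reads of `contentWindow` don't re-inject, and the set doesn't keep removed frames alive.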

tophf commented 3 months ago

BTW, in addition to all the DOM methods I listed above, you'll also need to spoof window.open for same-origin documents, because the creator can get the original objects directly from the new window object and save them inside that window to be used later, inside and/or outside it.

weizman commented 3 months ago

Really interesting that you all mention these, because that's exactly what we've been attempting to address with the Snow project: tap into all the JS APIs that can grant you access to new same-origin realms (iframes, tabs, etc.) and tame them to your liking.

The easiest way to get a sense of what I'm trying to say is to visit the Snow demo app; give it a go.

I'm telling you this because after working on Snow for over 2 years, we arrived at the conclusion that implementing a solution to this virtually is somewhere between super hard and impossible (check out some of the open issues against the Snow project), and instead we are now trying to convince the industry to make this a builtin solution in the browser.

I'd hate to take this discussion in the wrong direction, so let me just mention that if you care about this problem too, feel free to participate in https://github.com/WICG/proposals/issues/144; the more hands we get, the more likely the community will adopt this effort!

Having read all the comments starting with https://github.com/tc39/proposal-shadowrealm/issues/406#issuecomment-2155407656, I'm pretty sure https://github.com/WICG/proposals/issues/144 will address your concerns.

kzar commented 3 months ago

Thanks, you're right, the RIC proposal does look interesting. I've left a quick comment about this use case there.