w3c / webextensions

Charter and administrivia for the WebExtensions Community Group (WECG)

Inconsistency: match_about_blank #575

Open lapcat opened 3 months ago

lapcat commented 3 months ago

In the wild (world wide web), I'm encountering a number of different ways that websites specify an iframe element without a source URL. Typically there would be no src attribute, or src="about:blank". However, some sites are using src="about:srcdoc" or even src="javascript:false". (See forbes.com for an example of the last.) According to a Stack Overflow post,

Standard approach when creating an "empty" iframe (as an iframe shim, for example), is to set the src as javascript:false;. This is the method used by most of the JavaScript libraries that create iframe shims for you (e.g. YUI's Overlay).

There is a manifest.json key match_about_blank, but web browsers are inconsistent in their handling of this key. Chrome loads the content script into about:blank and about:srcdoc but not javascript:false, while Firefox loads the content script into about:blank and javascript:false but not about:srcdoc. (Firefox inserts "Hmm. That address doesn’t look right." into about:srcdoc frames and for some strange reason a literal "false" string into javascript:false frames.) Safari web extensions do not support match_about_blank, while macOS Safari app extensions behave like Chrome extensions. (I realize that Safari app extensions are not under the purview of this group.)
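
The inconsistency described above can be written down as a small lookup table. This is only a sketch that restates the observations from this comment, not an authoritative compatibility table. The names `OBSERVED`, `browsersDisagreeOn` and `inconsistentSources` are made up for illustration. Safari is left out because Safari web extensions are reported not to support match_about_blank at all.

```javascript
// Observed match_about_blank behavior, copied from the comment above.
// true = the content script was injected into a frame with that src.
const OBSERVED = {
  chrome:  { "about:blank": true, "about:srcdoc": true,  "javascript:false": false },
  firefox: { "about:blank": true, "about:srcdoc": false, "javascript:false": true  },
};

// True when at least two browsers in the table treat this src differently.
function browsersDisagreeOn(src) {
  const results = Object.values(OBSERVED).map((row) => row[src]);
  return new Set(results).size > 1;
}

// All src values on which the listed browsers disagree.
function inconsistentSources() {
  return Object.keys(OBSERVED.chrome).filter(browsersDisagreeOn);
}

console.log(inconsistentSources()); // [ 'about:srcdoc', 'javascript:false' ]
```

Only about:blank behaves the same in both browsers, which is the core of this report.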

Empty iframes are not simply inert. Web pages add and run script elements inside them, and some of those scripts even add elements to the main frame. Thus, it's important for extensions to be able to inject their scripts into those iframes too.
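
As a rough illustration of the pattern described above, a page script might do something like the following. This is a hypothetical sketch, not code taken from any particular site. The helper name `runScriptInEmptyFrame` is invented here, and it assumes the new frame's about:blank document is available synchronously and is same-origin with the parent.

```javascript
// Create an iframe with no src (it loads about:blank) and run code inside it.
// `doc` is the parent document; `code` is JavaScript source text.
function runScriptInEmptyFrame(doc, code) {
  const frame = doc.createElement("iframe");
  doc.body.appendChild(frame);

  // An inline script appended to the frame's document runs in the frame.
  const script = frame.contentDocument.createElement("script");
  script.textContent = code;
  frame.contentDocument.head.appendChild(script);
  return frame;
}

// Only run the demo where a DOM exists (for example, pasted into a page).
if (typeof document !== "undefined") {
  runScriptInEmptyFrame(
    document,
    // The frame's script reaches back into the main frame via `parent`.
    'const p = parent.document.createElement("p");' +
    'p.textContent = "added from " + location.href;' +
    'parent.document.body.appendChild(p);'
  );
}
```

An extension whose content script is not injected into that frame never sees the script that modified the main page.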

Below is an example to demonstrate. manifest.json:

{
  "manifest_version": 3,
  "name": "iframe bug",
  "version": "1.0",
  "description": "Take back your web browser.",
  "author": "Jeff Johnson",
  "content_scripts":
  [{
    "all_frames": true,
    "js": ["content.js"],
    "matches": ["<all_urls>"],
    "match_about_blank": true
  }],
  "permissions": []
}

content.js:

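(function() {
'use strict';

// Log the frame's URL so every injected frame shows up in the console.
const href = location.href;
console.log("content.js: " + href);

// Also render the URL inside the frame, so injection is visible on the page.
const p = document.createElement("p");
p.textContent = href;
document.body.appendChild(p);

})();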

index.html:

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<title>iframes</title>
</head>
<body>
<h1>iframes</h1>
<p><code>no src</code></p>
<iframe title="no src" width="300" height="100"></iframe>
<p><code>about:blank</code></p>
<iframe src="about:blank" title="about:blank" width="300" height="100"></iframe>
<p><code>about:srcdoc</code></p>
<iframe src="about:srcdoc" title="about:srcdoc" width="300" height="100"></iframe>
<p><code>javascript:false</code></p>
<iframe src="javascript:false" title="javascript:false" width="300" height="100"></iframe>
</body>
</html>

tophf commented 3 months ago

The bug/feature in Chrome that skips injection in anonymous javascript: iframes is [ab]used by some extensions in the wild, e.g. Violentmonkey/Tampermonkey create such an iframe to ensure that other extensions won't run content scripts there. If this weirdness is fixed, it might make sense to provide an API for content scripts to create iframes that other extensions can't inject into.

P.S. Violentmonkey/Tampermonkey won't need this trick if all browsers fix their injection timing for content scripts at document_start in same-origin iframes and same-origin window.open documents, which is reported in https://crbug.com/40202434 for Chromium/Chrome, but applies to Firefox too, maybe other browsers as well.

lapcat commented 3 months ago

The bug/feature in Chrome that skips injection in anonymous javascript: iframes is [ab]used by some extensions in the wild, e.g. Violentmonkey/Tampermonkey create such an iframe to ensure that other extensions won't run content scripts there.

This can be bypassed, though. I've already got a way to bypass the javascript: limitation in general, which means that the limitation is basically pointless.

P.S. Violentmonkey/Tampermonkey won't need this trick if all browsers fix their injection timing for content scripts at document_start in same-origin iframes and same-origin window.open documents, which is reported in https://crbug.com/40202434 for Chromium/Chrome, but applies to Firefox too, maybe other browsers as well.

FWIW this bug also applies to Safari app extensions.

tophf commented 3 months ago

This can be bypassed, though. I've already got a way to bypass the javascript: limitation in general, which means that the limitation is basically pointless.

In the case of Violentmonkey/Tampermonkey, they remove the iframe immediately (synchronously), so the only way for other extensions to see it is to use the deprecated synchronous mutation events + chrome.dom.openOrClosedShadowRoot, as the iframe is inside a closed shadow DOM. These events already generate warnings in Chrome and will be removed in v127, so other extensions will have no way to see the iframe in this case, which is why it might make sense to think of a way to allow extensions to create a non-injectable anonymous iframe.

P.S. Violentmonkey/Tampermonkey won't be affected by this PR, AFAICT, because they remove the iframe before injection occurs in the currently bugged browsers.

xeenon commented 3 months ago

We agreed in the meeting today that the newer match_origin_as_fallback should apply here and match all the cases where the frame origin is not an HTTP-family URL, including javascript:false.
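
To make "not an HTTP-family URL" concrete, here is a minimal sketch that classifies iframe src values by scheme. It is an assumption-laden illustration, not any browser's matching algorithm. It treats "HTTP-family" as http: and https: only, and it looks at the URL scheme rather than the document's actual origin, which can differ. The names `HTTP_FAMILY` and `isNonHttpFrameUrl` and the example.com base URL are hypothetical.

```javascript
// Assumption: "HTTP-family" means these two schemes.
const HTTP_FAMILY = new Set(["http:", "https:"]);

// Returns true when the frame's src resolves to a non-HTTP(S) URL, i.e. the
// kind of frame match_origin_as_fallback is meant to cover.
// A missing or empty src is treated as about:blank.
function isNonHttpFrameUrl(src, base = "https://example.com/") {
  const url = new URL(src || "about:blank", base);
  return !HTTP_FAMILY.has(url.protocol);
}

for (const src of ["", "about:blank", "about:srcdoc", "javascript:false",
                   "data:text/html,hi", "https://example.com/page.html"]) {
  console.log(JSON.stringify(src), isNonHttpFrameUrl(src));
}
```

Under these assumptions, every src from the original test page falls into the non-HTTP group, so all four frames would be candidates for the fallback matching.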

lapcat commented 3 months ago

We agreed in the meeting today that the newer match_origin_as_fallback should apply here and match all the cases where the frame origin is not an HTTP-family URL, including javascript:false.

According to MDN, only Chromium supports this key: https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/manifest.json/content_scripts

dotproto commented 3 months ago

@lapcat, I believe MDN is correct. The match_origin_as_fallback property is a relatively recent addition to the extension platform and was created to address the limitations of match_about_blank + all_frames, like not being able to inject scripts in data: and blob: URLs. We are currently aligned on the view that match_about_blank should not be expanded to include pages like these because it would be a breaking change for existing extensions.

EDIT: That said, I agree that we should collect data on how each browser handles the src strings you've identified and try to align on our handling of them as much as reasonably possible.

lapcat commented 3 months ago

We are currently aligned on the view that match_about_blank should not be expanded to include pages like these because it would be a breaking change for existing extensions.

It should be noted, however, that match_about_blank already includes javascript:false in Firefox, as I mentioned in my original comment:

Firefox loads the content script into about:blank and javascript:false but not about:srcdoc. (Firefox inserts "Hmm. That address doesn’t look right." into about:srcdoc frames and for some strange reason a literal "false" string into javascript:false frames.)

Apparently Firefox doesn't handle about:srcdoc at all, which is tangential to the question of extensions. Safari doesn't currently support match_about_blank, so in a glass half full sense, there's no backward compatibility issue in Safari, and in a glass half empty sense, backward compatibility is already broken in Safari.

In any case, I see that the meeting notes say, "Firefox will implement match_origin_as_fallback soon." That's great, as long as extensions can add both match_about_blank and match_origin_as_fallback to the manifest in a backward compatible way. Currently, about:debugging shows a warning, "An unexpected property was found in the WebExtension manifest", but the warning doesn't seem to affect the functionality of the extension.
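
For reference, a manifest carrying both keys might look like the sketch below, based on the example manifest from the original comment. Whether a given browser honors, ignores, or warns about match_origin_as_fallback is exactly what is under discussion here, so treat this as an untested illustration rather than a recommended configuration.

```json
{
  "manifest_version": 3,
  "name": "iframe bug",
  "version": "1.0",
  "content_scripts": [{
    "all_frames": true,
    "js": ["content.js"],
    "matches": ["<all_urls>"],
    "match_about_blank": true,
    "match_origin_as_fallback": true
  }]
}
```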

oliverdunk commented 2 months ago

Adding follow-up label to check we are aligned with this change and if so track it on our side.

Rob--W commented 2 months ago

FYI I recently posted more context about match_about_blank and match_origin_as_fallback in comments on PR #542

Copy-pasting here for future reference:

match_about_blank

match_about_blank was designed for about:blank and about:srcdoc.

If you're looking for clarity, see https://stackoverflow.com/questions/41408936/can-anyone-explain-that-what-is-the-use-of-match-about-blank-in-chrome-extensi, where I previously posted an answer that describes why match_about_blank exists and what it does.

Other documentation:

match_origin_as_fallback

The semantics have been discussed extensively on Chromium's issue tracker, where Devlin and I discussed the API design. If you're interested, the start of the discussion is at https://issues.chromium.org/issues/40443085#comment48. The design that is close to what we have now was sketched in https://issues.chromium.org/issues/40443085#comment61, with the final name (match_origin_as_fallback) at https://issues.chromium.org/issues/40443085#comment67. Devlin summarized the discussion at https://issues.chromium.org/issues/40443085#comment71.

Upon reviewing the proposed texts here, I think that there is some confusion on terminology. The current text mentions blob URLs as an opaque origin, but that is not the case.

Relevant to content script matching is the URL of the document (which can have an origin component) and the origin of the document (as a security principal). There may not always be an obvious relation between the two:

lapcat commented 1 month ago

In my testing, Chrome's match_origin_as_fallback doesn't load content scripts into javascript:false frames, as in my example above. Is there a Chromium bug to fix this?

oliverdunk commented 1 hour ago

I spoke to @rdcronin about this. We agree that we should not make further updates to match_about_blank, and are supportive of matching javascript: schemes with match_origin_as_fallback set to true. I've opened an associated issue here: https://issues.chromium.org/350350577.

We do want to be careful of the potential impact to userscript extensions, and if there is a risk of significant breakage that might be a reason not to immediately fix this.

@tophf, to confirm, it sounds like you don't think Tampermonkey / Violentmonkey will be impacted? Are there any other extensions you are aware of that rely on the same behavior?

tophf commented 1 hour ago

to confirm, it sounds like you don't think Tampermonkey / Violentmonkey will be impacted?

Yes, I expect the browser won't inject content scripts into this iframe because it's already removed from DOM at document_start. In the future this iframe trick won't be necessary after https://crbug.com/40202434 is fixed.