w3c / webextensions

Charter and administrivia for the WebExtensions Community Group (WECG)

Proposal: Declarative Cosmetic Rules #362

Open 105th opened 1 year ago

105th commented 1 year ago

DeclarativeCosmeticRules API proposal

Background

Cosmetic rules in content blockers

Cosmetic rules can be divided into three groups: element hiding rules, CSS rules, and scriptlets.

Main issues

How are cosmetic rules applied in MV2 and MV3?

MV2

The extension needs to inject scripts and styles as early as possible for a smoother user experience (e.g. to avoid blinking DOM elements). It also needs to patch scripts before websites can copy DOM API methods. This forced the extension to use a rather sophisticated way of injecting scripts and styles based on events fired by the webRequest and webNavigation APIs. In short, at webRequest.onHeadersReceived, when the first information about the request is received, the extension asks the engine for the rules related to the current request and prepares the styles and scripts to inject. As the engine is already running, this information can be obtained very quickly. At webRequest.onResponseStarted, the extension tries to inject the scripts prepared in the previous step using tabs.executeScript. This event is not reliable, so at webNavigation.onCommitted the extension injects the scripts again if they weren't injected before. Along with the scripts, the extension also injects CSS styles using tabs.insertCSS.
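The staged flow described above can be sketched as a small per-tab state machine. This is only an illustration under assumptions: `Injector`, `Mv2InjectionFlow`, and the `lookup` callback are hypothetical names, not part of any extension's code. A real extension would call these methods from its webRequest.onHeadersReceived, webRequest.onResponseStarted, and webNavigation.onCommitted listeners, and the injector would wrap tabs.executeScript and tabs.insertCSS.

```typescript
// Hypothetical stand-in for tabs.executeScript / tabs.insertCSS.
interface Injector {
    executeScript(tabId: number, code: string): void;
    insertCSS(tabId: number, css: string): void;
}

type Payload = { script: string; css: string };
type Pending = { payload: Payload; injected: boolean };

class Mv2InjectionFlow {
    private injector: Injector;
    // Hypothetical engine lookup: returns scripts and styles for a URL.
    private lookup: (url: string) => Payload;
    private pending: { [tabId: number]: Pending } = {};

    constructor(injector: Injector, lookup: (url: string) => Payload) {
        this.injector = injector;
        this.lookup = lookup;
    }

    // Stage 1 (onHeadersReceived): ask the engine and cache the result.
    onHeadersReceived(tabId: number, url: string): void {
        this.pending[tabId] = { payload: this.lookup(url), injected: false };
    }

    // Stage 2 (onResponseStarted): first injection attempt; may never fire.
    onResponseStarted(tabId: number): void {
        this.injectOnce(tabId);
    }

    // Stage 3 (onCommitted): inject only if stage 2 did not, then clean up.
    onCommitted(tabId: number): void {
        this.injectOnce(tabId);
        delete this.pending[tabId];
    }

    private injectOnce(tabId: number): void {
        const entry = this.pending[tabId];
        if (!entry || entry.injected) return;
        this.injector.executeScript(tabId, entry.payload.script);
        this.injector.insertCSS(tabId, entry.payload.css);
        entry.injected = true;
    }
}
```

The `injected` flag is what keeps the onCommitted fallback from injecting a second time when onResponseStarted did fire.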

So to inject cosmetic rules we have to ask for the following permissions:

MV3

Extensions built on top of MV3 inject scripts using the scripting API and use a content script for styles. To inject scripts, the extension subscribes to the webNavigation.onCommitted event and injects scripts when this event fires. To inject styles, the extension uses a content script. The content script is injected into every page and requests the styles from the background page via messaging.
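A minimal sketch of the style path, assuming a simple request/response protocol. The message shapes, `SelectorIndex`, `handleGetStyles`, and `applyStyles` are all hypothetical names made up for this illustration. In an extension the request would travel over runtime.sendMessage and the handler would run inside a runtime.onMessage listener.

```typescript
// Hypothetical message shapes; a real extension defines its own protocol.
type GetStylesRequest = { type: 'get-styles'; hostname: string };
type GetStylesResponse = { css: string };

// Hypothetical engine output: selectors to hide per hostname ('*' = every site).
type SelectorIndex = { [hostname: string]: string[] };

// Background side: builds the stylesheet text for the requesting page.
function handleGetStyles(request: GetStylesRequest, index: SelectorIndex): GetStylesResponse {
    const selectors = (index['*'] || []).concat(index[request.hostname] || []);
    const css = selectors
        .map((selector) => selector + ' { display: none !important; }')
        .join('\n');
    return { css: css };
}

// Content-script side: `append` would add a <style> element to the page.
function applyStyles(response: GetStylesResponse, append: (css: string) => void): void {
    if (response.css !== '') append(response.css);
}
```

Because the page has to wait for the round trip before any style is applied, this path is slower than a declarative rule that the browser could apply on its own.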

So to inject cosmetic rules we have to ask for the following permissions:

Why not use a content script to inject the cosmetic rules?

In order to insert styles and scripts selectively, we need to launch the engine to search for the rules suitable for this website only. Launching the engine takes some time, and if the engine is used in the content script, it would be launched for each website separately. This would lead to significant performance degradation, because a large script containing a lot of rules would have to be compiled every time. Alternatively, the engine could be launched in the background page or service worker, but this would still require time for messaging between the background page and the content script.

How many cosmetic rules are there?

Element hiding rules are one of the most popular rule types - for example, AdGuard's Base filter contains 98500 rules, 24800 of which are element hiding rules.

CSS rules and scriptlets are less common. However, they are still very popular among filter developers, especially in some difficult cases. Scriptlet rules make up 3000 rules and cosmetic CSS rules make up 1500 rules in the AdGuard Base filter.

Goal

MV3

One of the goals of MV3 is to make extensions have fewer permissions by default, and to make maximum permissions optional.

Proposal goal

The goal of this proposal is to make cosmetic rules declarative. This will allow us to remove the tabs and webRequest permissions from the extension manifest. This will also allow us to remove the <all_urls> permission from the extension manifest. Finally, it would allow us not to inject a content script into every page.

To avoid reinventing the wheel, we took the Declarative Net Request API as a model and tried to build the logic in its likeness, so that we can take advantage of pre-built declarative CSS rules.

As with the DNR API, we need the ability to change these rules dynamically (https://github.com/w3c/webextensions/issues/162); for CSS rules this is doubly important.

API

This section needs to be improved and expanded, but first we want to get feedback on the general idea.

API schema

/**
 * "hide" - hides the element with the selector;
 * "css" - applies CSS properties to the selector;
 * "scriptlet" - execute specified scriptlet.
 */
type RuleActionType = 'hide' | 'css' | 'scriptlet'; // | ... to extend

type Rule = {
    /**
     * What type of action should be applied.
     */
    action: RuleAction,

    /**
     * The condition under which the rule is applied.
     */
    condition?: RuleCondition,

    /**
     * A list of CSS rules to apply to the element.
     *
     * {@link https://developer.mozilla.org/en-US/docs/Web/API/CSSStyleDeclaration}
     */
    css?: CSSStyleDeclaration,

    /**
     * Information about the scriptlet to execute the JS rule
     */
    scriptlet?: ScriptletInfo,
};

type RuleAction = {
    type: RuleActionType,
    selector?: string,
};

type ScriptletInfo = {
    name: string,
    args: string[],
};

type RuleCondition = {
    /**
     * List of domains where the action should be applied.
     * If this field is omitted, the rule will be applied to all domains.
     */
    domains?: string[],

    /**
     * List of domains where the action should not be applied.
     */
    excludedDomains?: string[],
};
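How a `RuleCondition` might be evaluated against a page's hostname can be sketched as follows. The proposal does not define matching semantics, so two assumptions are made here and marked in the comments: a listed domain also covers its subdomains, and `excludedDomains` takes precedence over `domains`. The function names are hypothetical.

```typescript
type RuleCondition = {
    domains?: string[];
    excludedDomains?: string[];
};

// Assumption: "foo.com" also covers subdomains such as "www.foo.com".
function domainMatches(hostname: string, domain: string): boolean {
    if (hostname === domain) return true;
    const suffix = '.' + domain;
    return hostname.length > suffix.length
        && hostname.slice(hostname.length - suffix.length) === suffix;
}

function conditionMatches(hostname: string, condition?: RuleCondition): boolean {
    // No condition: the rule applies to all domains.
    if (!condition) return true;
    // Assumption: exclusions win over inclusions.
    const excluded = condition.excludedDomains || [];
    if (excluded.some((d) => domainMatches(hostname, d))) return false;
    // Omitted `domains`: the rule applies to all remaining domains.
    if (!condition.domains) return true;
    return condition.domains.some((d) => domainMatches(hostname, d));
}
```

For example, `conditionMatches('www.foo.com', { excludedDomains: ['foo.com'] })` is false under these assumptions, while `notfoo.com` still matches because the comparison is done on whole domain labels.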

Declarative element hiding rules

Here and below you will find some examples of its use.

See - https://adguard.com/kb/general/ad-filtering/create-own-filters/#cosmetic-elemhide-rules

/**
 * Generic hiding rule e.g. - "##selector"
 */
const genericHidingRule: Rule = {
    action: {
        type: 'hide',
        selector: 'selector',
    },
    // No condition means the rule is applied to all domains.
};

/**
 * Generic hiding rule with exclusion e.g. - "~foo.com##selector"
 */
const genericHidingRuleWithException: Rule = {
    action: {
        type: 'hide',
        selector: 'selector',
    },
    condition: {
        // No domains means rule applies to all domains except those listed in excludedDomains.
        excludedDomains: ['foo.com'],
    },
};
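To show how existing filter lists could be converted into this shape, here is a rough sketch of a parser for the basic element hiding syntax (`domains##selector`). It is not AdGuard's parser: `HideRule` and `parseElementHidingRule` are hypothetical names, and exception rules (`#@#`), extended CSS (`#?#`), and other modifiers are deliberately ignored.

```typescript
type HideRule = {
    action: { type: 'hide'; selector: string };
    condition?: { domains?: string[]; excludedDomains?: string[] };
};

function parseElementHidingRule(line: string): HideRule | null {
    const separator = line.indexOf('##');
    if (separator === -1) return null;
    const selector = line.slice(separator + 2).trim();
    // "##+js(...)" is uBlock Origin scriptlet syntax, not a selector.
    if (selector === '' || selector.indexOf('+js(') === 0) return null;

    const rule: HideRule = { action: { type: 'hide', selector: selector } };
    const domainPart = line.slice(0, separator).trim();
    if (domainPart === '') return rule; // generic rule: no condition

    const domains: string[] = [];
    const excludedDomains: string[] = [];
    for (const raw of domainPart.split(',')) {
        const entry = raw.trim();
        if (entry === '') continue;
        // A leading "~" marks a domain where the rule must not apply.
        if (entry.charAt(0) === '~') excludedDomains.push(entry.slice(1));
        else domains.push(entry);
    }
    const condition: { domains?: string[]; excludedDomains?: string[] } = {};
    if (domains.length > 0) condition.domains = domains;
    if (excludedDomains.length > 0) condition.excludedDomains = excludedDomains;
    rule.condition = condition;
    return rule;
}
```

With this sketch, `parseElementHidingRule('~foo.com##selector')` produces the same object as `genericHidingRuleWithException` above.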

Declarative css rules

See - https://adguard.com/kb/general/ad-filtering/create-own-filters/#cosmetic-css-rules

/**
 * #$#.textad { visibility: hidden; } - hides '.textad' on all sites via CSS,
 * without removing it from the DOM.
 */

const hideElementRule: Rule = {
    action: {
        type: 'css',
        selector: '.textad',
    },
    css: {
        visibility: 'hidden',
    }
};
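One way a browser (or a polyfill) might turn such a rule into stylesheet text is sketched below. It assumes the `css` field uses camelCase property names as in CSSStyleDeclaration; `CssDeclarations`, `toKebabCase`, and `toStyleSheetText` are hypothetical helpers, and a plain string map stands in for the DOM type so the snippet has no DOM dependency.

```typescript
// Simplified stand-in for the `css` field of a rule.
type CssDeclarations = { [property: string]: string };

// Convert a camelCase property name (backgroundColor) to kebab-case.
function toKebabCase(property: string): string {
    return property.replace(/[A-Z]/g, (ch) => '-' + ch.toLowerCase());
}

// Build "selector { prop: value; ... }" from a selector and its declarations.
function toStyleSheetText(selector: string, css: CssDeclarations): string {
    const body = Object.keys(css)
        .map((property) => toKebabCase(property) + ': ' + css[property] + ';')
        .join(' ');
    return selector + ' { ' + body + ' }';
}
```

Applied to the rule above, `toStyleSheetText('.textad', { visibility: 'hidden' })` yields `.textad { visibility: hidden; }`, which is the body of the original `#$#` filter rule.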

Declarative scriptlets rules

See - https://adguard.com/kb/general/ad-filtering/create-own-filters/#scriptlets

/**
 * example.org#%#//scriptlet("abort-on-property-read", "alert") - do not allow usage of window.alert on the example.org site.
 */

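const abortOnPropertyReadRule: Rule = {
    action: {
        type: 'scriptlet',
    },
    condition: {
        // The filter rule above is scoped to example.org.
        domains: ['example.org'],
    },
    scriptlet: {
        name: 'abort-on-property-read',
        args: ['alert'],
    },
};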

API to manage rules dynamically

// TODO

oliverdunk commented 1 year ago

Thanks for writing this up! It definitely seems like a use case that we haven't fully solved yet, and I'm looking forward to continuing to discuss it.

I came across this page which helped me understand the motivation for CSS rules beyond element hiding: https://adguard.com/kb/general/ad-filtering/create-own-filters/#cosmetic-css-rules

erosman commented 1 year ago

Per-site CSS rules were once implemented but later deprecated.

@document

The @document CSS at-rule restricts the style rules contained within it based on the URL of the document. It is designed primarily for user-defined style sheets, though it can be used on author-defined style sheets, too.

Rules could be applied with url(), url-prefix(), domain(), media-document(), and regexp().

Firefox initially supported the above under @-moz-document.

See also: Per-site user style sheet rules

ameshkov commented 1 year ago

Let me please address a few comments from the meeting minutes.

I am much more concerned about scriptlets than about CSS rules, and the reason is simple: using scriptlets is the only way to get rid of ads on many websites, the most prominent one being YouTube.

Scriptlets

[rob] A CSS selector can easily match everything; how would that reduce the required permissions? Effectively the proposal with JS would execute JS everywhere.

@Rob--W Regarding JS, please see the explanation below, we do not suggest allowing arbitrary JS.

[simeon] In the proposal as written, there are placeholders for scriptlets, but nothing in the API to register scriptlets. But as Tomislav mentioned, it's probably best to defer the scriptlets to the future.

@dotproto good catch, the proposal indeed does not mention one of the main points. We do not propose to allow developers to register scriptlets. On the contrary, scriptlets should only be provided by the browsers themselves; this is the only way to make them safe to use.

Kind of like what Mozilla does with shims used by tracking protection: https://searchfox.org/mozilla-central/source/browser/extensions/webcompat/shims

We once opened a similar feature request for WebKit, it explains why they're required and I still hope WebKit devs will get back to this and consider it: https://bugs.webkit.org/show_bug.cgi?id=225861

Note that a scriptlet can come with a set of limitations. For instance, set-constant does not allow setting arbitrary values, only numbers/booleans.

CSS

[rob] A CSS selector can easily match everything; how would that reduce the required permissions? Effectively the proposal with JS would execute JS everywhere.

[timothy] It is part of our Content Blocking API, a display:none CSS rule can be applied if the domain, etc. matches. We restrict it to display:none for privacy reasons, anything more, even color changes is not possible. The implementation is optimized using the same mechanism that we also use to block network requests (and backs Safari's declarativeNetRequest API).

[timothy] I wouldn't want to support arbitrary CSS without additional permissions. If visibility:hidden is common we can consider that, but anything more than that or display:none.

@Rob--W @xeenon @dotproto

Those are all valid points, arbitrary CSS can indeed be dangerous.

Our own use case is rather limited and does not require CSS to be arbitrary, a subset of allowed CSS properties would suffice.

Here're some examples:

[simeon] It's a bit trickier than that. This was a consideration for CSP in content scripts in Chrome. One of the concerns with remote CSS is data exfiltration through selectors matching input fields for example. Just worth noting that arbitrary CSS can be more dangerous than it seems.

@dotproto Could it be that you're talking about using the content property in addition to these selectors, or maybe background, etc.? The point is that there is only a limited number of properties that can be used to exfiltrate arbitrary data, and they can be restricted in the API spec.
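Restricting properties in the spec could look roughly like the check below. This is only a sketch: the allowlist contents are a made-up example rather than a proposed set, and `findRejectedProperties` is a hypothetical helper. It rejects anything outside the allowlist, plus any value that references a URL, since resource loads are the usual exfiltration channel.

```typescript
type CssDeclarations = { [property: string]: string };

// Hypothetical allowlist, for illustration only; the real set would be
// decided in the API spec.
const ALLOWED_PROPERTIES: string[] = ['display', 'visibility', 'opacity', 'height', 'width'];

// Returns the names of declarations that would be refused.
function findRejectedProperties(css: CssDeclarations): string[] {
    return Object.keys(css).filter((property) => {
        const allowed = ALLOWED_PROPERTIES.indexOf(property) !== -1;
        // Values that load a resource could leak data to a remote server.
        const loadsResource = css[property].toLowerCase().indexOf('url(') !== -1;
        return !allowed || loadsResource;
    });
}
```

Under this sketch `{ visibility: 'hidden' }` passes, while `content` and `background` declarations are refused.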

Permissions

[simeon] Curious about browser vendors' perspective. Some DNR actions (block, upgradeScheme) do not require host permissions, but others (modifyHeaders) require host permissions. Should this pattern be followed?

@dotproto the problem with this point is that when an extension has host permissions, it can achieve the same result with a content script. With DNR the situation is different: we don't have any alternative way to implement the required functionality in an MV3 Chrome extension.

Other stuff

[rob] Chrome's DNR API automatically hides some elements (e.g. images) when a request is blocked. How does that work in Safari, and how would that play with this API? [timothy] Not aware of that. Safari does not do that. [rob] Firefox's DNR implementation does not do that either.

@Rob--W @xeenon This behavior was one of the first things requested from the Chrome team when DNR was introduced. Please consider doing that.

ameshkov commented 1 year ago

We discussed this during the previous meeting and I was asked to provide some scriptlet examples.

First of all, regarding scriptlets, we propose for browsers to provide a small library of declarative "shims" that will be injected into the page. The pages where they will be injected should be defined in a declarative way with an API similar to DNR or maybe the DNR itself.

In AdGuard a scriptlet rule looks like this:

domain1.com,domain2.com#%#//scriptlet("scriptlet name", "argument1", "argument2")

uBlock Origin uses a similar concept but with a slightly different syntax:

domain1.com,domain2.com##+js(scriptletName, argument1, argument2)
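Both syntaxes carry the same information (domains, scriptlet name, arguments), so they can be mapped onto one structure. The sketch below is a simplified illustration, not the parser used by either blocker: `parseScriptletRule` and `ParsedScriptletRule` are hypothetical names, and arguments that themselves contain commas are not supported.

```typescript
type ScriptletInfo = { name: string; args: string[] };
type ParsedScriptletRule = { domains: string[]; scriptlet: ScriptletInfo };

const ADGUARD_MARKER = '#%#//scriptlet(';
const UBO_MARKER = '##+js(';

// Remove one pair of matching surrounding quotes, if present.
function stripQuotes(value: string): string {
    const first = value.charAt(0);
    const last = value.charAt(value.length - 1);
    const quoted = value.length >= 2 && first === last && (first === '"' || first === "'");
    return quoted ? value.slice(1, -1) : value;
}

function parseScriptletRule(line: string): ParsedScriptletRule | null {
    const trimmed = line.trim();
    let marker = ADGUARD_MARKER;
    let start = trimmed.indexOf(marker);
    if (start === -1) {
        marker = UBO_MARKER;
        start = trimmed.indexOf(marker);
    }
    if (start === -1 || trimmed.charAt(trimmed.length - 1) !== ')') return null;

    // Everything before the marker is a comma-separated domain list.
    const domains = trimmed.slice(0, start).split(',')
        .map((d) => d.trim())
        .filter((d) => d !== '');
    // Simplification: split arguments on commas without handling escapes.
    const parts = trimmed.slice(start + marker.length, -1).split(',')
        .map((p) => stripQuotes(p.trim()));
    if (parts[0] === '') return null;
    return { domains: domains, scriptlet: { name: parts[0], args: parts.slice(1) } };
}
```

The `scriptlet` part of the result has the same shape as `ScriptletInfo` in the API schema, and `domains` maps directly onto `RuleCondition.domains`.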

Here's a list of scriptlets which cover ~80% of existing rules (each linked to its description):

  1. set-constant
  2. json-prune
  3. abort-current-inline-script
  4. abort-on-property-read
  5. abort-on-property-write

Real life examples

There are thousands of scriptlet rules in AdGuard and uBlock Origin filters; here are just a few examples. Please let me know if you need more.

json-prune

youtube.com,youtube-nocookie.com##+js(json-prune, [].playerResponse.adPlacements [].playerResponse.playerAds playerResponse.adPlacements playerResponse.playerAds adPlacements playerAds)

YouTube loads video metadata JSON alongside ad metadata in a single request. This rule removes the parts of the JSON that contain ad metadata. The json-prune scriptlet overrides two functions in order to intercept those JSON objects:
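The pruning step itself can be illustrated with a short sketch. This is not the actual json-prune source: `jsonPrune` and `prunePath` are hypothetical names, the interception of the page's JSON is not shown, and the only path syntax handled is dot-separated property names with `[]` meaning "every element of an array".

```typescript
// Remove the property addressed by `segments` from `node`, if it exists.
function prunePath(node: any, segments: string[]): void {
    if (node === null || typeof node !== 'object' || segments.length === 0) return;
    const head = segments[0];
    const rest = segments.slice(1);
    if (head === '[]') {
        // "[]" descends into every element of an array.
        if (Array.isArray(node)) {
            for (const item of node) prunePath(item, rest);
        }
        return;
    }
    if (!Object.prototype.hasOwnProperty.call(node, head)) return;
    if (rest.length === 0) {
        delete node[head];
    } else {
        prunePath(node[head], rest);
    }
}

// `paths` is a space-separated list, as in the filter rule above.
function jsonPrune(data: any, paths: string): any {
    for (const path of paths.split(' ')) {
        if (path !== '') prunePath(data, path.split('.'));
    }
    return data;
}
```

Paths that do not exist in a given response are skipped, which is why one rule can list several alternative locations for the same ad data.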

set-constant

youtube.com,youtube-nocookie.com##+js(set-constant, ytInitialPlayerResponse.adPlacements, undefined)

When you load a YouTube page with a video for the first time, there's a JSON object ytInitialPlayerResponse initialized inside an inline script. This object contains ads metadata which this rule removes.
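A much simplified sketch of the idea behind set-constant is shown below. The real scriptlet is more involved and restricts which values may be set; here `setConstant` is a hypothetical function that traps assignment of the root object and pins the nested property to a fixed value. It assumes the properties involved are configurable.

```typescript
// Pin the property addressed by `chain` (e.g. "a.b") under `root` to `constant`.
function setConstant(root: any, chain: string, constant: any): void {
    const segments = chain.split('.');
    const head = segments[0];
    const rest = segments.slice(1);

    if (rest.length === 0) {
        // Last segment: reads always return the constant, writes are ignored.
        Object.defineProperty(root, head, {
            configurable: true,
            get: () => constant,
            set: () => { /* ignore page writes */ },
        });
        return;
    }

    // Intermediate segment: re-apply the trap whenever the page assigns
    // a new object, e.g. from an inline script that runs later.
    let current = root[head];
    const applyToChild = (value: any): void => {
        if (value !== null && typeof value === 'object') {
            setConstant(value, rest.join('.'), constant);
        }
    };
    applyToChild(current);
    Object.defineProperty(root, head, {
        configurable: true,
        get: () => current,
        set: (value: any) => {
            current = value;
            applyToChild(value);
        },
    });
}
```

In the page this would be called with `window` as the root before the inline script runs, so that `ytInitialPlayerResponse.adPlacements` already reads as `undefined` when the player looks at it.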

abort-on-property-write

[many domains...]#%#//scriptlet("abort-on-property-write", "_pop")

Aborts a popular script for popup domains. They use random domains, and this scriptlet takes care of it for good even when the domain is not blocked yet.

Example: gledajcrtace.xyz

abort-on-property-read

[many domains...]#%#//scriptlet("abort-on-property-read", "BetterJsPop")

Aborts another very popular script that shows popup ads. It is usually used as an inline script.

Example: https://upvideo.to/v/jfiqnfdkwqpd
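Both abort scriptlets rely on the same mechanism: an accessor property that throws, so the script touching it stops executing. The sketch below is an approximation under assumptions, not the scriptlets' real source. `abortOnProperty` is a hypothetical function that covers both modes, and it assumes the property is configurable on the target object.

```typescript
// Make reads or writes of `root[property]` throw, aborting the calling script.
function abortOnProperty(root: any, property: string, mode: 'read' | 'write'): void {
    let stored = root[property];
    const abort = (): never => {
        throw new ReferenceError('Aborted ' + mode + ' of ' + property);
    };
    Object.defineProperty(root, property, {
        configurable: true,
        // In "read" mode every read throws; otherwise return the stored value.
        get: () => (mode === 'read' ? abort() : stored),
        // In "write" mode every write throws; otherwise remember the value.
        set: (value: any) => {
            if (mode === 'write') abort();
            stored = value;
        },
    });
}
```

For the rules above, this would correspond to `abortOnProperty(window, '_pop', 'write')` and `abortOnProperty(window, 'BetterJsPop', 'read')`, run before the page's own scripts.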

Questions

  1. What are your general thoughts about adding cosmetic rules?
  2. In what way should the proposal be changed? This question basically boils down to one: should it be a part of DNR or should it be a separate API?

zombie commented 10 months ago

Mozilla is generally in favor of pursuing this, while understanding that there are a lot of details here that need to be worked out. At least the simpler/safer CSS part, and splitting the script part into a separate issue.

oliverdunk commented 10 months ago

We're definitely interested in this from the Chrome side as well - although it may not be something we work on short term. At the moment it feels like this would make more sense as a separate API vs. an addition to DNR, since this does not operate at the network level and likely has some different requirements. That's something we can figure out though as we build up some use cases and desired functionality.

ameshkov commented 3 months ago

The issue was discussed during the WECG in-person meeting.

Apple folks would like to write a formal proposal.

ameshkov commented 3 months ago

Forgot to add one more thing that was also discussed.

Chrome's stance on this issue is basically: "we like it, but we don't have resources to implement it short term".

Once the formal proposal is there, we (AdGuard) want to write a cross-browser polyfill of this new API so that developers can already familiarize themselves with it. Of course, unlike the proposed API, the polyfill would require extensive permissions.

Yuki2718 commented 1 month ago

Whether this is implemented or not, we need a way to quickly and dynamically update cosmetic and scriptlet filters when problems happen or ads slip through on very popular sites like YouTube, Twitter, Facebook, etc. For example, Twitter has changed its domain to x.com, and while that's no problem at all for users of MV2 blockers, users of uBOL are seeing ads. https://github.com/uBlockOrigin/uAssets/issues/23732#issuecomment-2117977871

oliverdunk commented 1 month ago

@Yuki2718, thanks for flagging that. I definitely think we would want to support dynamic cosmetic rules in line with the dynamic ruleset support we have in the Declarative Net Request API.

Short term, do you know what options uBOL has tried in Manifest V3? For example, an option mentioned at the start of this issue is using messaging from the content script to the service worker to get additional rules. This can't be done synchronously, which is why I still think a new API would be helpful long term - however the injection in MV2 also wasn't fully synchronous and I suspect that at least for additional rules added dynamically (and in particular for modals like the x.com one that aren't present on page load) it may be sufficient.

Yuki2718 commented 1 month ago

Sorry IDK, @gorhill will know better. Sure, an MV2 blocker also requires manual refreshing to apply updated filters, but anyway users don't need to wait for an update of the extension itself, which is my main point.