Declarative Net Request proposal: disable individual static rules

felipeerias commented 2 years ago

This issue is a proposal to extend the declarativeNetRequest API to add a way to disable an individual malfunctioning rule that is part of one of the static rulesets bundled with an extension.

This would allow extension developers to react in a timely manner to static rules with undesirable behavior, without hampering the functionality of the extension or risking unwanted side effects.

Motivation

Manifest v3 extensions may include static rulesets that will only be updated when the extension itself is updated. From time to time, these static lists might include rules that are found to have undesirable behaviour (either because of malice or human error).

While session and dynamic rules can be added and removed one by one, static rules can only be enabled and disabled along with every other rule in their ruleset. Therefore, the only solutions provided by the current API are to either:

disable the entire ruleset, breaking the functionality of the extension, or
wait until the next version of the extension is released, which could take weeks.

A possible workaround might be to add a dynamic rule which overrides the malfunctioning static rule, but doing that risks causing unwanted side effects as the new rule could conflict with other (correct) static rules.

Proposal

Since extensions are already able to review the contents of their static rulesets, it would be enough to add a way in the declarativeNetRequest API to disable a static rule (or a collection of them).

My proposal is that a collection of disabled rules is added as an optional field to UpdateRulesetOptions:

UpdateRulesetOptions.disableRules of type DisableRulesInfo[]

where

DisableRulesInfo represents the rules that must be disabled for a given ruleset, with the following properties:
- rulesetId : string representing the id of an enabled static ruleset;
- disableRulesIds : long[] containing the IDs of the rules in that ruleset to be disabled.

A main reason for choosing this approach over other alternatives (e.g. a separate method) is that it allows the result of calling updateEnabledRulesets to be a predictable atomic change of the state of the API, without depending on its previous state.

While not being strictly necessary to fix this issue, it would be a good idea to also add a method getDisabledRules() that returns the collection of rules that have been disabled in the currently active rulesets. Together with getEnabledRulesets() and other methods this would provide a way to know the detailed state of the API at any given time.

References

This issue was reported as Chromium bug 1225229
Chrome Extension API proposal (Google docs)

hfiguiere commented 2 years ago

Have two questions:

What effect would disabling a rule have on the rule count? Does disabling a rule free up a slot in the global static rule count?
How persistent would the change be? Are the disabled rules persisted across sessions, until the next extension update (like the set of enabled static ruleset) or does this have a different lifetime?

felipeerias commented 2 years ago

@hfiguiere

From my point of view, the disabled rule should not count towards the global static rule count. As part of working on the implementation, I am checking whether and how this can be done.

A rule should stay disabled until the next extension update, just like the static rulesets. This would prevent a rule from accidentally becoming enabled when it shouldn't be.

The fact that disabled static rules would persist across sessions is a good reason for having a method getDisabledRules() that would return the static rules that are currently disabled, so the developer has a convenient way to check the current status of the extension.

felipeerias commented 2 years ago

I have updated the Extension API Proposal document to use the appropriate template, and have also expanded and clarified its content:

https://docs.google.com/document/d/1NTOTr6iwm0dJbewWjnABmPo6h1QD1mKTpX60s6Klj-8/edit?usp=sharing

felipeerias commented 2 years ago

An initial implementation of this proposal on Chromium is available here: https://chromium-review.googlesource.com/c/chromium/src/+/3494984

This code works, although there is still work to do in terms of testing and benchmarking. Since the new API is still being discussed, at the moment my intention is to use this implementation as a way to check the feasibility of the proposal and explore its possible implications.

felipeerias commented 2 years ago

This is a collection of real-world examples where a problem discovered in a static list of rules would not have been easily solved with the functionality provided by Declarative Net Request at the moment.

These examples are mainly instances of general rules, those that are not tied to a particular domain. Because these rules are applied to lots of different websites, it is not possible to predict with confidence the effects of overriding them with a dynamic rule.

The typical pattern is that when a general rule is faulty because it blocks valid content, its hypothetical solution with the current Declarative Net Request API (implemented as a dynamic rule) would have the opposite effect and potentially allow unwanted or harmful resources.

Example 1

This rule was intended as a comment but the leading ! was missing (commit):

media

(Block content with media).

This led to having a filter in EasyList that blocked all legit (non-ad and non-tracking related) requests containing “media”, e.g. all images from wikipedia which come from upload.wikimedia.org.

It was quickly discovered, fixed in EasyList and deployed to users within hours via automated filter list updates in the extensions.

As static filters currently can’t be disabled, a dynamic rule would be required in order to allow everything with the word "media". But this would not only unblock requests affected by the accidental “media” filter, but would allow ALL - including ad and tracking related - requests, which would have a major impact on ad blocking users.

Example 2

This line was accidentally added to EasyList (commit):

(Block content with u in its URL).

The title of the related issue is self-explanatory: "EasyPrivacy breaking nearly every website".

The bug was fixed by removing the faulty line (commit).

Since this filter blocks far too many desired resources, trying to override with a dynamic rule would have the opposite consequence of allowing too many unwanted resources.

Example 3

This filter broke different sites (added, adjusted):

/ad/css/*

(Block content with /ad/css/).

The solution was to remove it (commit).

While this filter didn’t target .css files specifically, it turned out to be able to block legitimate stylesheets on several websites.

A dynamic rule that simply tried to override it would be too broad, as it would probably allow unwanted content.

At the same time, adjusting that dynamic rule to affect only specific domains would not be a sustainable strategy, as it would mean updating the rule every time that a new website is found to have been affected by the original filter (which was shown to be able to break more than one site).

Example 4

This filter removed legitimate content on this website (commit):

||b-cdn.net/media/cb/

(Block content at b-cdn.net with a path that starts with /media/cb/).

The solution was to remove the filter (commit).

This domain happens to be a CDN. so it is not possible to guess what else is or will be hosted on it. Therefore, a dynamic rule allowing everything with that URL pattern would not be advisable.

Example 5

A developer on HubSpot complained about this filter because it was blocking internal backend calls to app.hubspot.com/ads-list-seg (discussion, commit)

.com/ads-

(Block content with .com/ads-).

The solution was to remove the faulty filter (commit).

However, a dynamic rule that tried to override the faulty filter by allowing everything with .com/ads- would be too broad.

Example 6

This rule broke comments in YouTube (discussion):

/adscript.

(Block content with /adscript.).

The solution was to remove it (commit).

A dynamic rule that overrided it by allowing everything with /adscript. would be overly broad.

Example 7

This rule was too generic and broke valid locations:

/affiliation/*$domain=~esi.evetech.net

(Remove content with /affiliation/ in its URL, except in one domain).

The solution was to remove it (discussion, commit).

A dynamic rule that simply allowed any URLs with the form /affiliation/* would be likely to allow unwanted content and cause other side effects (for example, if only a portion of the rules intended for a given site are actually working as intended).

Example 8

These two rules broke images in anilist.co

/banner/468
/banner/700

(Remove content with /banner/468 or /banner/700 in its URL).

The solution was to remove them (issue, commit).

However, a dynamic rule that allowed all content with those two patterns would be to allow things that would be otherwise have been blocked by other rules.

Example 9

Trying to fix Rotten Tomatoes, this rule broke the internal processes of NBC, a content provider (discussion).

&adunits=$xmlhttprequest

(Block XmlHttpRequest calls containing &adunits=).

The solution was to remove this rule and add a more specific one to easyprivacy/easyprivacy_specific.txt (commit).

A dynamic rule to achieve a similar result would need to start by allowing requests containing &adunits=, which seems overly broad.

Example 10

This rule broke valid content in aliexpress.com:

-adv.jpg

(Block content that contains -adv.jpg).

The solution was to remove it and replace it with more specific rules (commit).

A dynamic rule to achieve the same would need to allow all content with -adv.jpg.

felipeerias commented 2 years ago

Frequency of updates

I have been talking with some ad-blocker developers in order to get a better understanding of their use case.

Currently, they update their main filter list every hour. Other filter lists are updated every 24 hours.

If we decided to set a rate limit for this API, a time interval between 1 and 24 hours might be a reasonable starting point.

A rate of once per hour would allow extensions to preserve their current user protection while not imposing an excessive load on the system.

(Of course, this does not mean that the API would be called every hour, but rather that the extension would not need to wait for long before disabling static rules when needed).

felipeerias commented 2 years ago

Benchmarking

This entry contains some benchmarks to understand the potential impact on performance. See also the benchmarks in bug #1209218.

Materials

An initial implementation in Chromium of this API is available at https://chromium-review.googlesource.com/c/chromium/src/+/3494984

The extension used for benchmarking includes ten filter lists (based on EasyList), each one with 33,000 rules. When all ten are active, the number of enabled static rules is 330,000 which is the maximum allowed in the Declarative Net Request API.

I have used my development machine for the tests (2 x Xeon 4210R, 64GB memory). It would be a good idea to carry out additional benchmarks in more typical hardware/conditions.

Metrics

Two different time metrics are used:

Javascript time: this measures the time spent in the whole API call until the point where the completion callback gets called. It is calculated like this:

let t0 = Date.now();
chrome.declarativeNetRequest.updateEnabledRulesets({...}, () => {
  let t1 = Date.now();
  // measured JS time : t1 - t0
});

Internal time per ruleset: this measures the time spent on indexing and storing one individual ruleset. It is obtained from an internal timer in file_sequence_helper.cc.

void OnIndexCompleted(RulesetInfo* ruleset,
                    base::OnceClosure done_closure,
                    IndexAndPersistJSONRulesetResult result) {

// the measured internal time is in result.index_and_persist_time

Benchmark 1: enable all rules

Enable 10 rulesets with 33,000 rules each.

Total: 330,000 enabled static rules (maximum allowed by the API).

Javascript time: 3,300 msecs.

Internal time per ruleset: 140 msecs on average.

Benchmark 2: disable 1% of the rules

Enable 10 rulesets with 33,000 rules each, but disabling 1% of the rules in each ruleset.

Total: 326,700 enabled static rules.

Javascript time: 3,341 msecs.

Internal time per ruleset: 142 msecs on average.

Benchmark 3: disable 10% of the rules

Enable 10 rulesets with 33,000 rules each, but disabling 10% of the rules in each ruleset.

Total: 297,000 enabled static rules.

Javascript time: 3,354 msecs.

Internal time per ruleset: 139 msecs on average.

Conclusions

The number of disabled rules does not seem to have a noticeable impact in performance.

Note that compiled static rulesets are stored on disk (regardless of whether some of their rules have been disabled) and will not need to be recompiled unless their disabled rules change.

Therefore, the main factor affecting performance will probably be the number of times that a ruleset needs to be recompiled to disable some of its rules, rather than the number of those disabled rules.

felipeerias commented 2 years ago

Adding the Chrome: follow-up label to request feedback on the Chrome Extension API proposal (Google docs) and the initial Chromium implementation.

felipeerias commented 2 years ago

The test extension used to create the benchmarks above is available at felipeerias/DNR-disable-rules-extension.

dotproto commented 2 years ago

Thanks for adding the "Chrome: follow-up" label. I'm planning to sync with folks this week. Apologies for the delay on providing you with feedback.

felipeerias commented 2 years ago

updateEnabledRulesets already allows the developer to enable or disable a single ruleset without affecting the others:

chrome.declarativeNetRequest.updateEnabledRulesets({
    enableRulesetIds: [ "myRules" ]
}, … );

Likewise, the proposed API can be used to set the disabled rules in a ruleset without affecting the others. For example, this call would enable "myRules" (if it wasn't already) and disable some of its rules:

chrome.declarativeNetRequest.updateEnabledRulesets({
    enableRulesetIds: [ "myRules" ],
    disableRules: [
        { rulesetId: "myRules", disableRulesIds: [ 2, 3, 4 ] }
    ]
}, … );

This is already included in the ongoing Chrome implementation. To prevent errors, a ruleset must be explicitly listed in enableRulesetIds for any changes to its rules to take effect.

Question: is this method discoverable enough?

Question: would it be necessary to add methods that work on only one ruleset at the time? For example, we could have a simpler way to disable rules in a single set with:

setDisabledRules( "myRules", [ 5, 6, 7 ]) ;

dotproto commented 2 years ago

Noting for visibility that Devlin from Chrome's extensions team has provided some feedback on the design doc, so I'm removing the Chrome feedback label for the time being.

felipeerias commented 2 years ago

Following the feedback over the past weeks, we have updated the API proposal document with new content. This is a quick summary:

The proposed API consists of two new functions:
- updateStaticRules() disables and enables individual rules in a given static ruleset
- getDisabledRules() returns the list of disabled rules in a given static ruleset.
Two more use cases have been added:
- Extensions might want to match the latest upstream versions of their lists: rules that have been removed upstream would be disabled in the extension, while new ones would be added as dynamic rules.
- Some extensions currently provide the user with fine-grained control over individual rules and they might want to keep that functionality.
Three different implementation approaches are discussed:
- Recompiling the entire ruleset to exclude disabled rules.
- Using a mutable flatbuffer so each rule can have a flag indicating whether it is enabled or not; disabling a rule would require just changing that flag instead of the whole file.
- Storing the list of disabled rules separately from the flatbuffer file, and passing the responsibility of ignoring disabled rules to the matching algorithm.

felipeerias commented 2 years ago

We have continued working on our API proposal and initial implementation.

API proposal

In a nutshell, the developer would be able to modify the disabled rules for a given ruleset with a call like

chrome.declarativeNetRequest.updateStaticRules(
    {
        rulesetId: "myrules",
        disableRuleIds: [ 1, 2, 3 ],
        enableRulesIds [ 5, 6 ]
    },
    () => { /* optional, called once the update is complete */ }
);

And they would also be able to get the list of currently disabled rules in a given ruleset with:

chrome.declarativeNetRequest.getDisabledRules( "myrules",
        ( disabledRules ) => { /* process the returned list */ }
);

Implementation of disabled rules

We are storing the collection of disabled rules separately from the complied rulesets (which therefore remain immutable).

The responsibility of ignoring disabled rules is passed to the matching algorithm. The stored disabled rules are loaded in a std::unordered_set<uint32_t> for constant lookup time. Early benchmarks suggest that this solution maintains good performance even with a large number of disabled rules.

Storage

At the moment, these rules are stored in the Preferences along with data from many other components of the Chromium browser.

We are currently looking at an alternative approach: store the disabled rules in their own separate files, one for each static ruleset.

This would give us more flexibility in terms of the total number of disabled rules without affecting the performance of the rest of the browser. These files could be in the flatbuffer format, for better performance at load time. Disabling additional rules in a ruleset would only require updating the file corresponding to that particular ruleset, without affecting the others.

Links

API proposal document
Ongoing implementation work: https://crrev.com/c/3868630

(Adding the follow-up: chrome label)

felipeerias commented 2 years ago

My colleague @byung-woo has been working on a CL that implements our current API:

https://chromium-review.googlesource.com/c/chromium/src/+/3893877

Implementation details

While this is still a work in progress, it is worth describing our current approach in more detail:

the compiled rulesets in flatbuffer format are immutable;
the list of disabled rules for each ruleset is stored in a separate JSON file, which is updated via chrome.declarativeNetRequest.updateStaticRules();
that list of disabled rules is retrieved when the corresponding ruleset is loaded via chrome.declarativeNetRequest.updateEnabledRulesets()
these lists are held by the rule matching algorithm; whenever a match is found, the algorithm will check if the matching rule has been disabled before returning a result.

Benchmarks

The following measurements were taken on my dev machine and correspond to 1 ruleset with 20,000 rules.

The time to complete chrome.declarativeNetRequest.updateStaticRules() grows slowly with the number of disabled rules: disabling 2,000 rules (10% of the ruleset) is roughly twice as slow as disabling just 200 rules (1% of the ruleset).

dNR-check-update-static-rules-performance_2000

Loading a ruleset with disabled rules shows good performance in absolute terms. A ruleset with 2,000 disabled rules will take 40% more time to load via chrome.declarativeNetRequest.updateEnabledRulesets(), but (in my machine) this difference is actually just 3 milliseconds.

dNR-check-update-enabled-rulesets-performance-after-static-rules-updated_2000

Adding a check for disabled rules to the matching algorithm does not seem to have a significant impact in performance, although we are still looking for the right benchmarks to corroborate this. We store the list of disabled rules in an std::unordered_set<int> for constant lookup time and only check if a rule is disabled after a successful match (since the large majority of rules will fail to match a particular element on a website).
Benchmarking extensions: dNR-check-update-static-rules-performance, dNR-check-update-enabled-rulesets-performance-after-static-rules-updated

felipeerias commented 2 years ago

Implementation update

My colleague @byung-woo has been working on a set of CLs that implement our API proposal:

Rule matching performance

The following measurements were taken on my dev machine and correspond to 1 ruleset with 20,000 rules. In each case, up to 2,000 rules are disabled (10% of the set).

Match performance has been measured using chrome.declarativeNetRequest.testMatchOutcome() with different numbers of disabled static rules. This method checks whether a given request is matched by any of the active rules.
Average time to complete testMatchOutcome() for a request that is matched by the first rule in the set:

benchmark - always match first rule 2000

Average time to complete testMatchOutcome() for a request that is matched by the last rule in the set:

benchmark - always match last rule 2000

Average time to complete testMatchOutcome() for a request that is not matched by any rule (note the similarity with the previous):

benchmark - never match any rule 2000

Conclusion

The time to examine a single request with testMatchOutcome() does not change significantly as the number of disabled rules becomes very large.
In light of these findings, we don't see a need for limiting the maximum number of rules that may be disabled by this API.
Of interest for future benchmarks: the main factor influencing the performance of testMatchOutcome() is whether the request is matched by a rule or not and, in the case where it is matched, whether that rule is located towards the beginning or the end of the set. The reason is that these factors determine the number of rules that will need to be tested.
Benchmarking extension: dNR-check-match-performance-after-static-rules-updated

byung-woo commented 2 years ago

FYI, I made a new CL by following the recent feedback from Devlin, and waiting for the review:

https://crrev.com/c/3954955

The new CL has the same matching logic in the previous CL(https://crrev.com/c/3900106), so it gives same performance result.

There seem to be three points in the Devlin's feedback (feedback 1, feedback 2):

Store the disabled rule ids with a limit of 5k rules.
Make the API argument an object so that it have a flexibility to a future argument change.
Change the API argument so that we can update multiple rulesets and get all disable rule ids.

I don't have any doubt on the 1st and 2nd points, but I'm not sure that the API should handle multiple rulesets within a single API call.

The 3rd point seems to guarantee the atomicity across multiple rulesets. (means, all updates in the argument will be discarded when any error occurs while updating multiple rulesets)

If that is the case, the implementation need to consider how to make the updateStaticRules() atomic across multiple rulesets. For example, we need to store the disabled rule ids of all static rulesets in a single file and re-write the disabled rule ids of all static rulesets whenever the updateStaticRules() called regardless of the number of rulesets the method call affects.

Anyway, it's not a big problem right now with Prefs because the Prefs already stores the values into a single file. So on top of the new CL, I made additional CL (https://crrev.com/c/3960116) that applies the 3rd point for the case we should support updating multiple rulesets.

However, IMHO, if the static rules won't have any cross-ruleset dependency in the wild in most cases, it would be better to make the API simple (updating a single ruleset by a single API call).

byung-woo commented 2 years ago

The CL 3954955 was divided into 4 sub CLs to help review: (CL 3954955, CL 3999643, CL 3989814 and CL 4005612)

These are the performance test results with the CLs.

Performance of updateStaticRules() (test extension)
Performance of updateEnabledRulesets() (test extension)
Performance of testMatchOutcome() (test extension)

Same to the previous test results (comment 1248217901, comment 1259707676),

number of disabled rules doesn't affect the performance of enabling/disabling static rulesets or the performance of matching rules.
The time to disable static rules grows slowly with the number of disabled rules.

105th commented 2 years ago

We at AdGuard think the limit should be increased and 5,000 is far from what’s required.

In our understanding one of the main disableRules purposes is implementing differential filter lists updates. We tried to analyze how different filter lists change and from that analysis we conclude that the limit should be increased to 20,000.

This is not enough to consider changes of EasyList alone as popular ad blockers nowadays use more lists by default. Most of AdGuard users have about 5-6 lists enabled and uBlock Origin has 10 lists enabled by default. So we analyzed changes to different filter lists that AdGuard allows users to choose from.

Here are some graphs that show how many rules are getting deleted from these lists. The raw results can be found here.

Popular AdGuard Global Lists (a considerable share of AdGuard users have them enabled).

Filters included

- [AdGuard Base + Easylist](https://github.com/AdguardTeam/FiltersRegistry/tree/master/filters/filter_2_Base) - [AdGuard Annoyances](https://github.com/AdguardTeam/FiltersRegistry/tree/master/filters/filter_14_Annoyances) - [AdGuard Tracking Protection](https://github.com/AdguardTeam/FiltersRegistry/tree/master/filters/filter_3_Spyware) - [AdGuard URL Tracking Params](https://github.com/AdguardTeam/FiltersRegistry/tree/master/filters/filter_17_TrackParam) - [AdGuard Social Widgets](https://github.com/AdguardTeam/FiltersRegistry/blob/master/filters/filter_4_Social)

Number of deleted network rules from AdGuard popular global lists

The graph shows that on average about 2,500+ rules are removed from the popular lists every week. Note, that these graphs only show lists that are popular worldwide. Usually, users also have 1-3 regional lists enabled on top of that.

Now let’s take Chrome’s release cycle which is 3 weeks now as an example. If we are going to have the same release cycle (which is not easy as well), we need to have the ability to apply differential updates within this period. Additionally, let’s consider the CWS review times which from our experience can take up to 3 weeks. With all that in mind we suppose that the reasonable value for disableRules limit is 20,000.

felipeerias commented 2 years ago

@105th Thank you very much for your suggestion and data. Thank you also for opening https://github.com/w3c/webextensions/issues/318 and https://github.com/w3c/webextensions/issues/319.

The current limit of 5,000 disabled rules comes from the implementation approach that we chose in collaboration with the Chrome engineers.

Our implementation keeps the compiled rulesets immutable and stores the lists of disabled rules separately. This removes the need for recompiling flatbuffer files, which is quite expensive. Furthermore, our prototypes showed that this was reasonably simple to implement and maintain. Finally, since the compiled rulesets are predictable, it would be possible to carry out additional checks on them if needed in the future.

As seen in the comments above, the performance impact of this approach is reasonable for a relatively small number of disabled rules and degrades (slowly) as the number of disabled rules increases. This was the first reason that suggested the need for a limit on the number of disabled rules.

Our implementation keeps the lists of disabled rules in Chrome's Preferences file (see: ExtensionPrefs). This particular storage method is meant to support relatively lightweight data of a limited size. Preferences from different components are stored and loaded together, so allowing a huge number of disabled rules could cause performance issues for the browser as a whole. This is the second reason that suggested the need for a limit to the maximum number of static rules that may be disabled.

Please see the section "Additional implementation considerations" in the API proposal document for more details.

In the future, we would like to be able to remove the need for this limitation and already have some ideas about how it might be done.

ameshkov commented 2 years ago

Hi @felipeerias, I'd argue that the 5000 limit was chosen rather arbitrarily and based on incomplete statistics. With such a low limit we'll have to completely give up on the idea of differential updates which was IMO one of the main purposes of this change.

Of course relaxing it to 20k would be an ideal outcome for us, but if this is not technically feasible can we at least consider relaxing it to 10k without changing the implementation? This way we'll have to switch to a 3-weeks release cycle in order to make it all work.

@rdcronin what do you think?

dotproto commented 1 year ago

Quick check in on the progress in Chromium (crbug 1225229).

Comment 15 on Dec 29 is a merge commit that introduces declarativeNetRequest.updateStaticRules() to disable individual static rules.

Comment 16 on Jan 11 is a merge commit that introduces declarativeNetRequest.getDisabledRuleIds() to retrieve a list of disabled rules.

oliverdunk commented 1 year ago

Given there have been changes landed in to Chrome at this point, adding the supportive label.

oliverdunk commented 1 year ago

We can probably move to the implemented label, but I'll wait for an update (I notice we haven't documented it yet so I want to make sure the overall design is settled) cc/ @byung-woo

ameshkov commented 1 year ago

@oliverdunk should we file a different issue for this?

oliverdunk commented 1 year ago

@oliverdunk should we file a different issue for this?

I'm happy to try to get an update on that feedback. I think continuing discussion in this issue makes sense (regardless of what labels it has) but open to thoughts.

ameshkov commented 1 year ago

@oliverdunk hi Oliver, any update on this?

oliverdunk commented 1 year ago

@oliverdunk hi Oliver, any update on this?

Hey! No update yet I'm afraid, thanks for the bump. I do still want to get some thoughts on your question and generally figure out if we are ready to document the new APIs 😄

ameshkov commented 1 year ago

@oliverdunk got it, thank you! If you need any more input from us, please let me know.

oliverdunk commented 1 year ago

Small update (unfortunately nothing specific on increasing the limits yet) - I've confirmed that we're shipping this API in stable and are ready to document it, so adding the https://github.com/w3c/webextensions/labels/implemented%3A%20chrome label.

I did confirm that the current limits were chosen based on everything @felipeerias mentioned here. In particular, there is a limit to how large we want the data stored in preferences files to be. I understand that doesn't help with solving the differential updates use case though so definitely still something to figure out there.

xeenon commented 1 year ago

Safari/WebKit bug: https://bugs.webkit.org/show_bug.cgi?id=261039

JWorthe commented 8 months ago

Hi @oliverdunk. We unfortunately ran into the 5k disabled rule limit during the beta testing of moving Adblock and Adblock Plus to use MV3.

Given the recent change to allow up to 30k safe dynamic rules, I think it might be worth reconsidering increasing the number of disabled rules, depending on what the performance looks like.

The reason is that our main use case for this feature is to update our rulesets to match the latest filters on our servers. If a rule has been changed since we built the extension, we disable the existing static rule, and then add a new dynamic rule with the change. So if we can add 30k dynamic rules but only disable 5k static rules, then the effective number of filters we can change before deploying a new build of our extensions is actually only 5k rules.

It could also help us if this limit were scoped to specific rulesets. So maybe we can disable 5k rules per ruleset, rather than the current limit of 5k rules per extension.

I'd also like to request two quality of life improvements that don't affect the actual limit:

Could this 5k number be exposed as a constant we can refer to, similar to MAX_NUMBER_OF_DYNAMIC_RULES. This would help us write code that adapts in future if the limit is ever increased, or if other browser vendors have different limits.
Please mention this limit in the documentation for updateStaticRules. This unfortunately caught us by surprise until I found this thread. It would have been much better if I could have discovered this by reading the docs.

ameshkov commented 8 months ago

@JWorthe thanks for confirming this issue, as we were telling in the beginning, 5k limit is not enough to implement differential updates.

ameshkov commented 8 months ago

This issue was discussed during the WECG in-person meeting.

Chrome's stance on this is the following: let's see how it goes with CWS fast-track, whether you would still need increasing the constant given that you're going to publish daily updates.

A couple of important points that I missed to mention during the meeting:

Even if we publish updates daily to CWS, it still takes about 14 days to get all of the users updated according to our extension's CWS statistics. This is more than enough to hit the 5k limit if we were to implement differential updates.
The quality-of-life improvements in the comment by @JWorthe.

I think that given how quickly updates are actually delivered to the users it is important to increase the limit regardless of how well fast track works. The current limits make the feature useless for differential updates (which was the original purpose of it).

Initially we requested an increase from 5k to 20k but as far as I understand this would require a considerable change (moving disabled rules from the prefs file to a separate file, parsing, etc.).

Can you consider at least doubling the limit (5k -> 10k)? If fast track works correctly this should be enough to keep every user up-to-date for the 2 weeks period that the full rollout takes.

oliverdunk commented 7 months ago

Hi @JWorthe / @ameshkov - appreciate the feedback. On increasing the limit for disabled static rules, I spoke to the team and we are already towards the upper limit of what we think is reasonable with the current implementation. In fact, storing 5,000 disabled rule IDs in a preferences file feels like a lot and we only went with this knowing it is likely to be used by a small number of extensions.

I spoke to @ameshkov more individually to understand the update rollout times, and it seems most users update quickly, with a long tail that would benefit from a higher limit. Notably however, this long tail represents only a small percentage of users and the update times for these users are very long (multiple months perhaps).

With that in mind, we'd like to see how things go at first, noting that chrome.runtime.requestUpdateCheck is available if you'd like to try and update users more proactively. Once we're able to see how things are going in practice definitely happy to revisit this.

On exposing a constant and updating the documentation - this all seems worthwhile, I'll speak to the team.

ameshkov commented 7 months ago

Got it, thanks for the update @oliverdunk! Not that I am too happy with the decision though, but we'll explore the requestUpdateCheck API, it will be useful regardless, and then see what numbers we're getting.

Rob--W commented 6 months ago

At Firefox we are currently implementing the API to disable individual static rules, at https://bugzilla.mozilla.org/show_bug.cgi?id=1810762 As part of that, I took a closer look at the design here and the implementation in Chromium, and some parts are worth additional commentary, which I have done below.

Disabled individual static rules are still counted to enforce static rule limits

I see a notable difference between an initial design goal ("From my point of view, the disabled rule should not count towards the global static rule count") and the actual implementation (where global limits are independent of disabled individual rules source 1, source 2). These aspects were not mentioned in the external design doc nor during code review. I evaluated the two different designs below, and my conclusion is that disabled individual rules should continue to count towards the global static rule count, i.e. they should not free up quota when an individual rule is disabled. (which matches Chrome's current behavior).

As mentioned before, Chrome does currently exclude individual disabled rules from the quota of available static rules, and the quota of regexFilter rules. A clear "disadvantage" is that this prevents extensions from pushing the limits. However, I would not attribute much weight to this disadvantage, because extensions should not rely on disabling individual rules in order for rulesets to be enabled. Especially for the use case that motivated this API design, which is to disable a few rules that were unexpectedly broken.

The advantage of including disabled individual rules in the quota is that any quota-dependent logic remains stable. When the quotas are independent of the individual disabled rules, then changes to individual rules only affect that ruleset. If disabled individual rules were to affect the quotas, then it is possible that other rulesets become enable/disabled after changes to an unrelated ruleset. Here is an example:

rulesetA - 2 regexFilter rules.
rulesetB - 999 regexFilter rules.
rulesetC - 1 regexFilter rule.

At install:

rulesetA - enabled, total regexFilter = 2.
rulesetB - disabled, because enabling it would result in regexFilter count of 2 + 999 = 1001 (while 1000 is the limit). Total regexFilter is still at 2.
rulesetC - enabled, total regexFilter is now 2 + 1 = 3

Now imagine the scenario where we disable 1 regexFilter rule from rulesetA.

If the quotas were to account for this (i.e. free up quota when a rule is disabled), then toggling an individual rule can have a significant cascading effect:

rulesetA - still enabled, one individual regexFilter disabled, total counted regexFilter = 1
rulesetB - enabled, just enough quota. Total regexFilter 1+999 = 1000
rulesetC - disabled. Although there is only one regexFilter rule, there is not enough quota for any more.

In the above example I only mentioned regexFilter, but the whole ruleset is affected: an over-quota part of a ruleset causes the whole static ruleset to be disabled. This cascading effect is undesirable, hence it makes sense for individual static rules to not be subtracted from the quota.

Invalid rules

A static ruleset can have invalid rules (e.g. invalid actions/condition values). These are ignored, and are not counted towards the limit of the global static rules. Ignored invalid rules are not considered disabled rules either.

Invalid rules are worth a special mention (see also below), because what is considered an invalid rule in one browser version can become a valid rule in an updated version of the browser.

Error handling in `updateStaticRules`

Interestingly, Chrome's updateStaticRules method does not perform any validation, other than confirming that the rulesetId exists and enforcing a maximum number of disabled rules (source).

As mentioned before, disabled individual rules are not subtracted from any static rule quotas. If it were to be, then the method also has to throw an error when quotas are exceeded.

There is no validation on the validity/existence of a ruleId, other than them being integers. I would expect an error to be raised when an invalid rule ID were to be passed. This has some implementation and design complexity though. When the full set of potential rule IDs is not readily available, the static ruleset needs to be read from disk and be parsed. This may seem trivial, but the parser has to be somewhat special: instead of parsing every entry as a Rule type, the parser needs to treat the input as an array of objects, read the "id" field and accept it as a potential ruleId if it is an integer. The reason for this lax parsing is that a presently "invalid" rule can be valid in the future, and updateStaticRules should save the request to disable a rule, even if currently considered invalid, because in a future browser update that "invalid" rule could become eligible for being disabled.

Rob--W commented 6 months ago

@oliverdunk The 5k limit is currently not documented in Chrome's extension API docs, nor exposed through any API. Questions:

are you going to document the limit?
is there an API constant? For example MAX_NUMBER_OF_DISABLED_STATIC_RULES

@xeenon Will Safari limit the number of disabled individual rules? If so, what limit are you considering?

oliverdunk commented 6 months ago

@Rob--W Thanks for the investigation!

Yeah, I'll make a note that we should add this to the documentation. There isn't an API constant as far as I'm aware but I could see that it might make sense to add one.

On everything else - we're chatting about this internally, I'll follow up.

xeenon commented 6 months ago

@Rob--W We haven't looked at implementing this yet, so I don't know what limit Safari might impose.

oliverdunk commented 4 months ago

@Rob--W:

are you going to document the limit?

Yes, we should definitely document this. I'll take a look at that today.

is there an API constant? For example MAX_NUMBER_OF_DISABLED_STATIC_RULES

Not currently. Both me and Devlin are fairly neutral here. I would prefer not to do it since we already have many constants, but I can see an argument it is useful if building a feature that will disable many rules. We are happy to implement this if there is consensus between other browsers.

my conclusion is that disabled individual rules should continue to count towards the global static rule count, i.e. they should not free up quota when an individual rule is disabled

I agree with this and confirmed with Devlin that he feels the same.

A static ruleset can have invalid rules (e.g. invalid actions/condition values). These are ignored, and are not counted towards the limit of the global static rules.

These should probably count towards the quota, since it would be odd for an extension with static rules to behave differently with regards to quota across various Chrome versions. We wouldn't want to change it without checking for usage though, since this could be a breaking change (although I suspect it is unlikely to be in practice).

Interestingly, Chrome's updateStaticRules method does not perform any validation, other than confirming that the rulesetId exists and enforcing a maximum number of disabled rules (source).

Neither me or Devlin have strong feelings here. An argument not to do this is that if an extension is configured to disable several rules at once, and a single rule was invalid, we don't necessarily want to completely discard that call because of the single missing ID.

Rob--W commented 4 months ago

Firefox 128 supports disabling individual static rules through the declarativeNetRequest.updateStaticRules method, and retrieving their rule ids through declarativeNetRequest.getDisabledRuleIds.

We enforce a quota of 5000 disabled rules, counted per ruleset rather than across all rulesets. This limit is exposed in the API as the declarativeNetRequest.MAX_NUMBER_OF_DISABLED_STATIC_RULES constant.

w3c / webextensions