w3c / webextensions

Charter and administrivia for the WebExtensions Community Group (WECG)

Document DNR's regex validation rules #225

Open 4ntoine opened 2 years ago

4ntoine commented 2 years ago

Background

adblockpluscore and webext-sdk form a chain of SDKs that power some of the most advanced content-filtering web extensions in the world. They were originally designed for Manifest V2 (MV2) web extensions and contained a filtering engine and matcher that applied ABP filter rules, written in ABP syntax and delivered in subscriptions (e.g. here).

With the introduction of Manifest V3 (MV3), Chromium migrated to the Declarative Network Request (DNR) approach, which uses an internal content filtering engine and rules declared in a proprietary format and bundled with the web extension in JSON file(s).

To support this, adblockpluscore has to convert ABP rules into DNR rules before the web extension is assembled.

One of the matching approaches is to use regexp rules with the regexFilter condition. Chromium uses RE2 syntax and provides the isRegexSupported function to validate a regex in the browser environment.
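For illustration, here is a minimal sketch of what a regexFilter rule and an in-browser check might look like. The helper names `toBlockRule` and `isSupportedInBrowser`, the example URL pattern, and the local typings are assumptions made for this example; only `chrome.declarativeNetRequest.isRegexSupported` and the rule fields come from Chrome's DNR API, and the check only works inside an extension context.

```typescript
// Minimal local typings for this sketch; real projects would use the
// typings that ship for Chrome's extension APIs instead.
interface RegexOptions {
  regex: string;
  isCaseSensitive?: boolean;
  requireCapturing?: boolean;
}
interface IsRegexSupportedResult {
  isSupported: boolean;
  reason?: "syntaxError" | "memoryLimitExceeded";
}
declare const chrome: {
  declarativeNetRequest: {
    isRegexSupported(options: RegexOptions): Promise<IsRegexSupportedResult>;
  };
};

// Hypothetical helper: wraps a converted regex in a DNR "block" rule.
function toBlockRule(id: number, regexFilter: string) {
  return {
    id,
    priority: 1,
    action: { type: "block" },
    condition: { regexFilter, resourceTypes: ["script"] },
  };
}

// Hypothetical helper: asks the browser whether it accepts the regex.
// This cannot run in Node.js, which is the gap this issue is about.
async function isSupportedInBrowser(regex: string): Promise<boolean> {
  const result = await chrome.declarativeNetRequest.isRegexSupported({ regex });
  if (!result.isSupported) {
    console.warn(`Rejected regex (${result.reason}): ${regex}`);
  }
  return result.isSupported;
}

// Example rule built at conversion time (pattern is illustrative only).
const exampleRule = toBlockRule(1, "^https?://ads\\.example\\.com/");
```

The rule object can be produced at build time, but the only authoritative answer on whether Chromium accepts its regexFilter currently comes from the browser itself.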

Challenge

We need to be able to validate the converted regex rules in a non-browser environment (Node.js) to avoid deploy-time issues like this, without relying on platform-dependent binary code that is hard to distribute via npm or similar.

While RE2 regex validation itself seems doable (probably with the help of tools like re2-validator or similar), Chromium adds some specific requirements, such as a per-regex memory quota, that are hard to evaluate before deployment.

Given multiple subscriptions with a few thousand rules each, automated pre-validation sounds hard without a clear description of how it can be done.
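To make the gap concrete, the syntax half of the problem could be roughly approximated in plain Node.js with a pattern scan like the sketch below. The function name `precheckRe2` and its result type are hypothetical. It only flags a few constructs that RE2 is known not to support (backreferences and lookaround) and says nothing about Chromium's memory quota, so a pattern that passes here may still be rejected by the browser.

```typescript
type PrecheckResult = { ok: true } | { ok: false; reason: string };

// Heuristic scan for constructs RE2 does not support. This is NOT a
// full RE2 parser and does not model Chromium's per-regex memory limit.
function precheckRe2(pattern: string): PrecheckResult {
  let inClass = false; // true while inside a [...] character class
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern[i];
    if (ch === "\\") {
      const next = pattern[i + 1];
      if (next === undefined) {
        return { ok: false, reason: "trailing backslash" };
      }
      if (!inClass && next >= "1" && next <= "9") {
        return { ok: false, reason: "backreference" };
      }
      i++; // skip the escaped character
      continue;
    }
    if (inClass) {
      if (ch === "]") inClass = false;
      continue;
    }
    if (ch === "[") {
      inClass = true;
      if (pattern[i + 1] === "^") i++; // negated class
      if (pattern[i + 1] === "]") i++; // a leading "]" is a literal
      continue;
    }
    if (ch === "(" && pattern[i + 1] === "?") {
      const rest = pattern.slice(i + 2, i + 4);
      if (rest[0] === "=" || rest[0] === "!") {
        return { ok: false, reason: "lookahead" };
      }
      if (rest === "<=" || rest === "<!") {
        return { ok: false, reason: "lookbehind" };
      }
    }
  }
  return { ok: true };
}
```

A build script could run such a check over every converted rule and drop or report the failures, but without a documented description of Chromium's validation it remains a best-effort filter rather than a guarantee.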

TODO

Document how regexFilter is validated so that we can pre-validate it in a non-browser environment (like the Node.js script here), or provide a tool/API to pre-validate DNR regexes.

Rob--W commented 1 year ago

I just filed #344 to generalize the request for a specification of the urlFilter format. In its current form, Chrome's regexFilter format is difficult to document, let alone implement across other browsers.