w3c / webextensions

Charter and administrivia for the WebExtensions Community Group (WECG)

Blocking webRequest use case: DNR unable to properly redirect based on URL parameters #302

Open ghostwords opened 1 year ago

ghostwords commented 1 year ago

It does not appear possible to properly extract, decode, and redirect to URL-encoded components of URLs with Declarative Net Request (DNR).

For example, an extension may want to "clean" https://www.google.com/url?q= redirect URLs by extracting the value of the q parameter and issuing an internal redirect to that destination URL, in order to avoid unnecessary network requests and reduce data leakage to Google.

This is an important use case for URL cleaning extensions specifically, and privacy extensions in general.

Related to #110.

Demo extension (zip):

manifest.json

{
  "version": "1.0.0",
  "name": "DNR regexSubstitution URI param encoding demo",
  "description": "demo to show inability to handle URI component encoding with regexSubstitution in DNR",
  "permissions": [
    "declarativeNetRequest"
  ],
  "host_permissions": [
    "http://*/*",
    "https://*/*"
  ],
  "declarative_net_request": {
    "rule_resources": [
      {
        "id": "url_cleaning_rules",
        "enabled": true,
        "path": "url_cleaning_rules.json"
      }
    ]
  },
  "manifest_version": 3,
  "minimum_chrome_version": "102.0"
}

url_cleaning_rules.json

[
  {
    "id": 1,
    "action": {
      "type": "redirect",
      "redirect": {
        "regexSubstitution": "\\1"
      }
    },
    "condition": {
      "regexFilter": "^https://www\\.google\\.com/url\\?.*&?q=([^&]+).*$",
      "resourceTypes": [
        "main_frame"
      ]
    }
  }
]

Example inputs and outputs:
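The failure mode can be sketched in plain JavaScript (the specific destination URL below is an assumed example, not taken from the original report): the value captured by `\1` is still percent-encoded, so redirecting to it verbatim produces an invalid destination.

```javascript
// Assumed example input; any percent-encoded destination shows the problem.
const requestUrl =
  'https://www.google.com/url?q=https%3A%2F%2Fexample.com%2Fpath%3Fa%3Db';

// What the DNR rule's \1 capture group contains: still percent-encoded,
// so it is not a usable redirect target as-is.
const rawCapture = /[?&]q=([^&]+)/.exec(requestUrl)[1];

// What the extension actually needs to redirect to; DNR offers no way to
// apply this decoding step to a capture group.
const decoded = decodeURIComponent(rawCapture);
```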

ghostwords commented 1 year ago

This is not an issue with blocking webRequest because we can write imperative code that correctly decodes URL components. For example:

chrome.webRequest.onBeforeRequest.addListener(function (details) {
  let url = new URL(details.url).searchParams.get('q');
  return url ? { redirectUrl: url } : {};
}, { urls: ['https://www.google.com/url*'], types: ['main_frame'] }, ['blocking']);
ghostwords commented 1 year ago

As noted during today's call, while URL component encoding is the standard approach to preserving special characters in URL components, it's possible for URL redirectors (like https://www.google.com/url?q=SOMEURL) to use or to switch to a custom encoding (such as Base64, or something completely custom). A custom encoding will defeat all DNR-based URL cleaner/privacy extensions, until DNR is extended to support that particular encoding. This is an example of https://github.com/w3c/webextensions/issues/151#issuecomment-1018778881.
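To make that concrete, here is a hedged sketch (the Base64-encoding redirector is a hypothetical, and this is not code from any shipping extension) of how imperative blocking-webRequest logic can absorb an encoding change that a static regexSubstitution cannot express:

```javascript
// Hypothetical: a redirector switches from ?q=https%3A%2F%2F... to
// ?q=aHR0cHM6Ly8... (Base64). Imperative code just adds a decode branch.
function extractDestination(requestUrl) {
  const q = new URL(requestUrl).searchParams.get('q');
  if (q === null) return null;
  // Custom-encoding case (assumed): the value looks like Base64, decode it.
  if (/^[A-Za-z0-9+/]+=*$/.test(q)) {
    try {
      const decoded = atob(q);
      if (/^https?:\/\//.test(decoded)) return decoded;
    } catch { /* not valid Base64 after all */ }
  }
  // Standard case: searchParams.get() already percent-decoded the value.
  return /^https?:\/\//.test(q) ? q : null;
}
```

With DNR, the same change would require waiting for each browser to ship a new rule primitive.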

gorhill commented 1 year ago

it's possible for URL redirectors to use or to switch to a custom encoding

That defeatist argument is a recurring one, and it has often been used to rationalize not trying to defuse, one way or another, mechanisms found in the wild.

By that logic there is no point doing anything, since everything can be worked around by websites. Of course we do not give up on the basis of that argument, and it works -- an approach does not have to be guaranteed to work everywhere to be useful; it has to work for enough cases.

If anything, this further shows how webRequest is useful: while one party may not be motivated to extend capabilities solely because of this defeatist argument, another one will be, and it may very well turn out that the end result benefits end users. Now, with MV3, the parties that are motivated to address these issues lose the ability to be proactive about them.

ghostwords commented 1 year ago

@gorhill, I tried to point out that this is an example of DNR inherently disadvantaging privacy and security extensions. (With webRequest, any extension can update itself quickly to respond to a change in how a redirector works. With DNR, extensions will have to first convince each browser vendor to update the DNR API.)

I did not mean to suggest there is no point in addressing the most common scenario, namely extracting, URL-decoding and redirecting to some portion of a given URL.

gorhill commented 1 year ago

Sorry, I didn't have you in mind when I posted my comment. I actually didn't even see the discussion about this and made assumptions about how it went just from your comment; I should have waited to find out whether my comment applied. I will go read the discussion.


So yeah, after reading the discussion about it, there was no point to my comment, sorry. Next time I will be more careful to avoid pointless noise.

Cimbali commented 1 year ago

As the maintainer of CleanLinks, I actually am in this use case. Many other extensions (list dated 2018) rely on the same mechanisms.

I must say real-life use cases are often more complicated: they can involve several nested redirections, “customised” URL encoding or Base64, improperly encoded URLs, URLs embedded in the path instead of the query parameters, or in the hash of the URL, etc.
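For the nested-redirection case, a sketch of the kind of iteration involved (the parameter names and depth limit are assumptions for illustration, not CleanLinks' actual implementation):

```javascript
// Unwrap chained redirects: a redirect URL whose embedded destination is
// itself another redirector URL, repeated up to maxDepth times.
function unwrapRedirects(url, paramNames = ['q', 'url', 'u'], maxDepth = 5) {
  let current = url;
  for (let i = 0; i < maxDepth; i++) {
    const params = new URL(current).searchParams;
    let next = null;
    for (const name of paramNames) {
      const value = params.get(name); // already percent-decoded
      if (value !== null && /^https?:\/\//.test(value)) {
        next = value;
        break;
      }
    }
    if (next === null) break; // nothing left to unwrap
    current = next;
  }
  return current;
}
```

Each loop iteration depends on the result of the previous decode, which is exactly what a single static regex substitution cannot express.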


Some examples:


It seems unlikely to me that the functionality for this use case can be provided without executing a function returning the properly cleaned link.

If security is the main concern here, this function could be executed in a restricted context (for this use case at least). It probably needs some static inputs (the code to run, a set of rules), but beyond that it could be prevented from making requests or communicating with anything other than its URL input and output after setup.
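As a purely illustrative sketch of that shape (nothing here is an existing or proposed WebExtensions API), such a cleaning function only needs to map an input URL string to an output URL string:

```javascript
// Hypothetical pure cleaning function: string in, string out, no side
// effects. A restricted execution context would only need to invoke this
// and use its return value; it needs no network, DOM, or messaging access.
function cleanUrl(inputUrl) {
  const q = new URL(inputUrl).searchParams.get('q');
  // Redirect only to well-formed http(s) destinations; otherwise pass through.
  return q !== null && /^https?:\/\//.test(q) ? q : inputUrl;
}
```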

rdcronin commented 9 months ago

At a high level, I'm supportive of this use case. I think there will be potential challenges and subtlety in the API and implementation, but I think it's something worth looking into.

ghostwords commented 1 month ago

The Chromium bug for this issue: https://issues.chromium.org/issues/338071843