w3c / webextensions

Charter and administrivia for the WebExtensions Community Group (WECG)

Extension API to find the public suffix (eTLD) of a given URL/domain #231

Open Rob--W opened 2 years ago

Rob--W commented 2 years ago

The public suffix list is a database of effective top-level domains (eTLD), which are the public suffix of URLs. This database is included in browsers (at least by Firefox, Chrome and Safari - sources below) and may be updated remotely. There have been feature requests for an API that allows extensions to identify the public suffix (eTLD) of a given URL:

There are some known problems with the application of the public suffix list (https://github.com/sleevi/psl-problems), but that does not necessarily rule out an extension API with such access. Extensions that need the public suffix list currently have to rely on alternatives, such as bundling the database with the extension, at the risk of having incompatible interpretations of the "public suffix of a URL" between the browser and the extension. With proper documentation of the problems associated with the public suffix list, extension authors can make a conscious decision to use the API when they need to.

This issue is to track use cases and the desired shape of the API. For example, the following would be the minimum:

let suffix = await browser.publicsuffix.getPublicSuffix("www.example.co.uk");
// Result: co.uk
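Until such an API ships, an extension would presumably have to feature-detect it and fall back to a bundled copy of the list. A minimal sketch, assuming the `publicsuffix.getPublicSuffix` name from the example above (which is only a suggestion in this issue, not an existing API); `getPublicSuffixWithFallback` and `bundledLookup` are hypothetical names introduced for illustration:

```javascript
// `api` stands in for the `browser` namespace. `bundledLookup` is a
// hypothetical function backed by a copy of the PSL shipped with the extension.
async function getPublicSuffixWithFallback(hostname, api, bundledLookup) {
  // Assumed API shape, taken from the example in this issue.
  const native = api?.publicsuffix?.getPublicSuffix;
  if (typeof native === "function") {
    // Prefer the browser's own list so both sides agree on the result.
    return native.call(api.publicsuffix, hostname);
  }
  // Otherwise use the extension's bundled interpretation of the list.
  return bundledLookup(hostname);
}
```

In an extension this might be called as `getPublicSuffixWithFallback(host, globalThis.browser, myBundledLookup)`, so the bundled list is only consulted in browsers that lack the API.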

Here are other examples of APIs to query the public suffix:

oliverdunk commented 2 years ago

Similar proposal: https://github.com/w3c/webextensions/issues/58

xeenon commented 2 years ago

I would be in favor of this for Safari.

gijsk commented 2 years ago

In favour of this proposal. In addition to the consistency issue that was pointed out in the meeting (i.e. if the extension and browser have different versions of the PSL, there's a potential for security issues), some other arguments for having this exposed as an API:

rdcronin commented 7 months ago

I'd be in favor of this for Chrome.

dotproto commented 7 months ago

To proceed with this issue, we need a more concrete proposal. Some points that a more fleshed out proposal would address include:

zombie commented 7 months ago

I'll follow up to see if Mozilla's multi-account-containers maintainers want to put up a proposal for the api shape here.

Rob--W commented 7 months ago

@oliverdunk is going to reach out to PSL maintainers to inform them of our intent to offer this API.

Dreamsorcerer commented 6 months ago

some other arguments for having this exposed as an API:

Just to reiterate the points from my original request (#58) which covers many of the same arguments:

Issues with the current approach include:

This issue is to track use cases and the desired shape of the API.

As per the title of my original request, I think the most common scenario is to get the organisational domain, rather than the suffix alone. So, to save some manual string manipulation, it would be great for the API to include a function to get the organisational domain.
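The "manual string manipulation" mentioned here can be sketched as follows, assuming the only thing available is the suffix string that the minimal `getPublicSuffix` example would return. `getOrganisationalDomain` is a hypothetical helper, not part of any proposed API:

```javascript
// Hypothetical helper: derives the organisational domain, i.e. the public
// suffix plus one more label to its left, from a hostname and its suffix.
function getOrganisationalDomain(hostname, publicSuffix) {
  // Normalise case and drop a trailing dot (fully qualified form).
  const host = hostname.toLowerCase().replace(/\.$/, "");
  const suffix = publicSuffix.toLowerCase();

  // A hostname that is itself a public suffix has no organisational domain.
  if (host === suffix) return null;

  if (!host.endsWith("." + suffix)) {
    throw new Error(`"${suffix}" is not a suffix of "${host}"`);
  }

  // Everything to the left of the suffix, without the separating dot.
  const rest = host.slice(0, host.length - suffix.length - 1);
  // Keep only the label closest to the suffix.
  const label = rest.slice(rest.lastIndexOf(".") + 1);
  return `${label}.${suffix}`;
}
```

For `"www.example.co.uk"` with suffix `"co.uk"` this yields `"example.co.uk"`. A built-in function would remove the need for every extension to repeat this logic.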

oliverdunk commented 6 months ago

I have started an email thread with the maintainers of the PSL - will keep this thread updated.

simon-friedberger commented 6 months ago

Such a proposal should contain some guidance on what to do when the result changes, or at least a warning. This is a rare event so it is prone to getting overlooked.

oliverdunk commented 4 months ago

As mentioned in a recent meeting, I met with Simon Friedberger (Mozilla) and Simone Carletti, both PSL maintainers. They were generally very supportive and would like to see this API. We agreed that introducing an extension API is unlikely to generate a significant number of additions to the list, since developers are already using the PSL in other ways today. While volume is not a concern, any education to maintain submission quality would be appreciated.

We also discussed several practical thoughts on the API signature / functionality:

erosman commented 4 months ago

Just as an idea for the API ....

Last year, I wrote a pure JavaScript PSL (Public Suffix List) module.

I looked at similar available methods to base the properties on. The module outputs 4 values: subdomain, domain, sld, & tld.

Dreamsorcerer commented 4 months ago

Keeping the list up to date in browsers is important.

I would assume this part is already being done (maybe not in all browsers though..)? e.g. Firefox shows passwords on mail.google.com that were created at calendar.google.com or similar. I assume they must be using the PSL for such functionality.

Last year, I wrote a pure JavaScript PSL (Public Suffix List) module.

I don't think that code is correct (it doesn't appear to handle ! or * rules). There are several other examples online too. Here's one I've done based on an existing solution, which, in theory, should be a lot more optimised for performance: https://github.com/AiondaDotCom/trashmail-addon/blob/master/publicsuffixlist.js However, it does require preprocessing the list into a more optimal format for querying first: https://github.com/AiondaDotCom/trashmail-addon/blob/master/update_suffixes.py (Result: https://github.com/AiondaDotCom/trashmail-addon/blob/master/public_suffix.json)
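For readers unfamiliar with the `!` and `*` rules, here is a minimal sketch of the matching logic. It is not the code from either linked module: `RULES` is a tiny illustrative subset rather than the real list, and `findPublicSuffix` is a hypothetical name. Exception rules (`!`) take priority, otherwise the matching rule with the most labels wins, and an unmatched name falls back to its last label.

```javascript
// Illustrative subset only; a real implementation would load the full PSL.
const RULES = ["com", "uk", "co.uk", "*.ck", "!www.ck"];

function findPublicSuffix(hostname, rules = RULES) {
  const ruleSet = new Set(rules);
  const labels = hostname.toLowerCase().split(".");

  // 1. An exception rule beats every other match. The public suffix is the
  //    exception rule with its leftmost label removed.
  for (let i = 0; i < labels.length; i++) {
    if (ruleSet.has("!" + labels.slice(i).join("."))) {
      return labels.slice(i + 1).join(".");
    }
  }

  // 2. Otherwise try candidates from longest to shortest, so the first hit
  //    is the matching rule with the most labels.
  for (let i = 0; i < labels.length; i++) {
    const candidate = labels.slice(i).join(".");
    if (ruleSet.has(candidate)) return candidate;
    // A wildcard rule matches any single label in the leftmost position.
    const wildcard = ["*", ...labels.slice(i + 1)].join(".");
    if (ruleSet.has(wildcard)) return candidate;
  }

  // 3. No rule matched: the implicit "*" rule makes the last label the suffix.
  return labels[labels.length - 1];
}
```

With this rule subset, `foo.bar.ck` has the suffix `bar.ck` (wildcard), while `www.ck` has the suffix `ck` (exception). A linear scan over a `Set` is chosen for readability; the preprocessed format mentioned above would be the faster option.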

oliverdunk commented 4 months ago

I would assume this part is already being done (maybe not in all browsers though..)? e.g. Firefox shows passwords on mail.google.com that were created at calendar.google.com or similar. I assume they must be using the PSL for such functionality.

All browsers include the PSL (it is required for things like cookie handling), but updates aren't necessarily as frequent as would be ideal. I can only speak for Chrome where I understand it is currently a manual process we run every ~6 months.

gijsk commented 4 months ago

On the Firefox side, each build ships with a copy that is up-to-date at time of build, I believe. The update process for the source code is automated, cf. the commit log for the data file: https://hg.mozilla.org/mozilla-central/log/tip/netwerk/dns/effective_tld_names.dat . There was some attempt in the past to be able to update out-of-release-band (like safebrowsing and other similar services that update more frequently than the standard release cadence) but I think that stalled once we hit issues with how this changed origin parsing/serialization (and doing so while a multi-process browser is running while keeping all the processes aligned on that change is... not trivial). Cf. https://twitter.com/ValentinGosu/status/1510295473864728581

Dreamsorcerer commented 4 months ago

From my side maintaining a list in an extension, I'm satisfied if I remember to update once a year, so even a 6 monthly update seems good to me and a big improvement.

dannycolin commented 4 months ago

From my side maintaining a list in an extension, I'm satisfied if I remember to update once a year, so even a 6 monthly update seems good to me and a big improvement.

It also saves you from pushing it to all the add-on stores. Granted, most of the time it's a minor change that gets quickly reviewed and accepted. However, there's always a risk that things take more time or that something else goes wrong.

With a builtin API, we're just giving more peace of mind to the addon developers :).

Rob--W commented 2 months ago

FYI I asked the contributor who submitted a patch to Firefox before whether they're interested in creating a proposal according to our proposal process: https://bugzilla.mozilla.org/show_bug.cgi?id=1315558#c27