asankah opened 6 years ago
I just saw this comment on #1829:
Firefox, as far as I can tell, doesn't use a safelist for registerProtocolHandler; it uses a blacklist instead. So Firefox already supports all these schemes.
Is that true? That should make it much easier to switch over to a blacklist.
I'm quite in favour of doing this. I'd like to know what the security concerns are with opening up all possible schemes, other than a small set of "special" schemes.
I would imagine the blocklist would include just schemes that are treated specially by the browser (which we wouldn't want web applications to "steal" links to). This would roughly correspond to the special schemes defined in [URL], but not gopher, which I don't believe any modern browser deals with. So: "ftp", "file", "http", "https", "ws" and "wss", plus a few other important schemes like "javascript", "data", "blob", "filesystem" and "about".
User agents should be allowed to add new schemes to the blacklist, for example, Chrome would block "chrome" (otherwise it would wreak havoc if a site took over all the "chrome:" URLs, which are Chrome's alias for "about").
But for any scheme that isn't special to web browsers, I don't see a security reason not to let sites (with the user's consent) handle those links.
In addition to special schemes, we should also blocklist any scheme containing a ".", to prevent schemes that look like a domain; see https://lists.w3.org/Archives/Public/public-whatwg-archive/2011Aug/0238.html for examples of using schemes named mail.google.com or 192.168.1.1.
I'd also suggest blocklisting "localhost" and "localdomain", and any scheme consisting entirely of numbers.
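To make the shape of that concrete, here's a rough sketch of the kind of check a blocklist approach implies. The scheme set, lexical rules and function name below are illustrative only, not any shipping implementation:

```js
// Illustrative only: one way a UA could combine a blocklist of special/internal
// schemes with the lexical rules suggested above. Not any browser's actual code.
const BLOCKED_SCHEMES = new Set([
  // Special schemes from the URL Standard (minus gopher).
  "ftp", "file", "http", "https", "ws", "wss",
  // Other schemes the browser treats specially.
  "javascript", "data", "blob", "filesystem", "about",
  // A UA could append its own internal schemes here, e.g. "chrome".
]);

function isRegistrableScheme(scheme) {
  scheme = scheme.toLowerCase();
  if (BLOCKED_SCHEMES.has(scheme)) return false;
  if (scheme.includes(".")) return false;     // looks like a host, e.g. "mail.google.com"
  if (/^[0-9]+$/.test(scheme)) return false;  // consists entirely of numbers
  if (scheme === "localhost" || scheme === "localdomain") return false;
  return true; // anything else could be offered to the user for registration
}
```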
A blocklist with additional user-agent-specific entries seems rather bad. It also seems bad that it would seemingly prevent adding a new scheme like https in the future.
As far as I can tell Firefox uses a safelist. (I cannot find a C++ implementation of that function, and indeed I find some hints that they call the JS implementation from the code backing their IDL.)
@annevk has enumerated a few issues with a blocklist, although none are insurmountable. To summarize my thoughts:
It's possible that browsers may want to add native support for more schemes in the future. It seems unlikely we'll take on the pain of another http -> https transition any time soon, but browsers may want to support other protocols natively for other features (e.g. the various distributed schemes from #3080, if the distributed web folks succeed in making their project mainstream).
Indeed, some browsers (e.g. Beaker) do support distributed web schemes natively. I'm not sure what document.URL or location.href return in such cases: do they return some underlying HTTPS resource, as would be the case with registerProtocolHandler, or do they return the original URL? If the latter, then we already have an instance of such "future expansion" conflicts.
But maybe such conflicts are OK! In such instances, a browser will change the meaning of an existing scheme in the wild. That's true whether we allow it in registerProtocolHandler or not, due to mechanisms like how native apps can intercept URL schemes on many OSes (last I checked), or how URL schemes might have meaning outside the browser context. For example, even though svn: is not available to registerProtocolHandler, browsers probably wouldn't be excited about adding it in some way that conflicts with the existing definition.
Basically I'm saying that adding a new scheme already requires a careful survey of the ecosystem, and registerProtocolHandler doesn't change that too much.
This is not great, but it seems like an area where we may want to accept non-interop for the sake of flexibility. Again browsers would be effectively compat-constrained in how they extend the blocklist.
Alternately we could try taking a union of all existing schemes in internal use.
It's worth pointing out that although registerProtocolHandler affects navigation via <a> links, usually navigation via <a> links cannot reach user-agent-internal schemes anyway. So it'd at least be possible, albeit confusing (and possibly unsafe), to allow registerProtocolHandler("chrome", ...), which would only change the meaning of <a href="chrome:flags"> from a no-op to a registered protocol handler.
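For reference, the call being discussed would look roughly like the sketch below; the handler URL and title are made up, and under the current safelist a non-"web+" scheme like this is rejected rather than registered:

```js
// Hypothetical under a blocklist model; with today's safelist this particular
// scheme would be rejected with a "SecurityError" DOMException.
navigator.registerProtocolHandler(
  "chrome",                             // scheme being claimed
  "https://example.com/handle?url=%s",  // %s is replaced with the full target URL
  "Example handler"                     // human-readable title
);
```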
Some of the blocklist vs. safelist discussion happened in https://github.com/whatwg/html/pull/1829#issuecomment-418662548. Let's try to use this thread going forward.
I'm reasonably convinced by @mgiuca's points there, plus my above musings. @annevk, how do you feel? @jonathanKingston, you were working on this area in Firefox recently---thoughts?
Yup. I'm also in favor of a blocklist.
I think neither an allowlist nor a blocklist absolves us of the need to deal with revoking a scheme that was previously available for registration. In the case of experimental protocols, that's likely part of an explicit goal as stated elsewhere. Also, provisional scheme registrations can help with discoverability of schemes that are reserved by various UAs or that conflict with someone else's use case.
cc @mikewest @noncombatant FYI
Thanks @domenic for a detailed analysis.
It's worth pointing out that although registerProtocolHandler affects navigation via <a> links, usually navigation via <a> links cannot reach user-agent-internal schemes anyway. So it'd at least be possible, albeit confusing (and possibly unsafe), to allow registerProtocolHandler("chrome", ...), which would only change the meaning of <a href="chrome:flags"> from a no-op to a registered protocol handler.
This is true, but it would be pretty confusing if typing "chrome:version" into the address bar opened Chrome's internal version page, while clicking a "chrome:version" link opened a web page that had registered for the Chrome scheme. It's probably best to allow browsers to block this type of link, as Chrome currently does.
It's possible that browsers may want to add native support for more schemes in the future.
Yeah. We should assume that any scheme popular enough to warrant a website that handles all of its links could end up being something that browsers want to handle natively. This will be true regardless of whether we have a whitelist or blacklist. (Presumably the popular new protocol that the future browsers want to implement will be one of the ones whitelisted.)
So either way, we should plan for a scenario where some future scheme X is (whitelisted|not blacklisted) and a browser wants to support it natively.
But maybe such conflicts are OK!
I think so.
If a browser wants to natively handle a future scheme X that is (whitelisted|not blacklisted), it has three choices:
For "chrome" URLs, I think Chrome would choose Option 1. Same if a new protocol became so ubiquitous (like "https") that it's always considered the domain of the browser. But for most protocols, browsers would generally go with Option 3.
There isn't even any need to spec this. Because it relates to the browser UI (not in-page UI), it's automatically a user agent choice. Right now, I could make a browser that doesn't allow a site to register "mailto" because I have my own email client built into the browser. The spec says:
"User agents may, within the constraints described in this section, do whatever they like when the method is called. A UA could, for instance, prompt the user and offer the user the opportunity to add the site to a shortlist of handlers, or make the handlers their default, or cancel the request. UAs could provide such a UI through modal UI or through a non-modal transient notification interface. UAs could also simply silently collect the information, providing it only when relevant to the user."
So browsers are not required to do anything when a site registers a protocol handler.
Furthermore, browsers already have UI to deal with conflicts between website-registered protocol handlers, and so-called "external" protocol handlers registered at the OS level. The issue of a conflict between a future website-registered handler and an in-browser handler is no more complex than what we already deal with.
In light of this, I don't see any future-proofing issues from switching to a blocklist.
In such instances, a browser will change the meaning of an existing scheme in the wild. That's true whether we allow it in registerProtocolHandler or not, due to mechanisms like how native apps can intercept URL schemes on many OSes (last I checked), or how URL schemes might have meaning outside the browser context.
That's a slightly different argument, because you're talking about how a browser might override an OS-registered handler with its own internal implementation. Which is true, but it's nothing to do with web standards since neither side of that is being registered by this API.
I think it's more relevant to consider the clash between website-registered and OS-registered handlers as a precedent.
I don't have strong opinions about registering protocol handlers and I'm happy to defer to y'all on the underlying question of allow vs. block, except insofar as registration must be under user control, and it must exclude the kinds of schemes that the browser does interesting things with (because I'm sure we make all kinds of assumptions in various bits and pieces of our codebase that assume we control these kinds of URLs completely).
Skimming through Chrome's codebase, that list might include things like:
- chrome and chrome-prefixed schemes (e.g. chrome-devtools, chrome-error, chrome-extension, chrome-extension-resource, chrome-guest, chrome-resource, chrome-native, chrome-search, and probably others I'm missing)
- cros
- android-app, content and cid
- view-source
I doubt that's an exhaustive list.
- ftp
I'd be much less upset about allowing sites to take over ftp, actually, as I'd dearly love to remove support for it. :)
they must exclude the kinds of schemes that the browser does interesting things with
I don't think this needs to be in the spec. As I said above, user agents can ignore any registration they want, so I think it's up to each user agent to block the things they care about (though this can be mentioned as a security consideration in the spec).
+1 to allowing sites to act as FTP clients and removing FTP support (eventually). (We'll need some kind of sockets API first!)
As I said above, user agents can ignore any registration they want, so I think it's up to each user agent to block the things they care about
I don't think this is really true. This is true if you have the market share of Chrome, but it's much less true if you have the market share of, say, Servo. Servo could end up having to reuse Chrome's blocklisted schemes for its purposes and be forced to reverse engineer those.
I'd much rather enumerate those schemes so it's clearer what ends up being reliable across user agents.
What's the current status of this issue? Skimming the discussion, it sounds like most of you are in favor of a blocklist instead of a safelist. Is it just a matter of now determining what that blocklist should contain?
It would be good to have someone from Mozilla chime in on whether a blocklist is acceptable to them, as the other implementer of registerProtocolHandler.
I haven't seen mention of the IANA registry at https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml explicitly referenced by the URL spec at https://url.spec.whatwg.org/#url-scheme-string. Is there a reason not to stick to a safelist and defer to that registry? (And the "web+WHATEVER" escape-hatch seems like it would allow for experimentation prior to gaining a registration.)
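As a side note, the "web+" escape hatch the spec already has works without touching either list. A sketch, with a made-up scheme and handler URL:

```js
// The "web+" prefix is the spec's existing escape hatch for schemes that are
// not on the safelist. Scheme and handler URL below are illustrative only.
navigator.registerProtocolHandler(
  "web+coffee",
  "https://example.com/brew?uri=%s",
  "Example coffee handler"
);
// Pages can then link to it as usual:
// <a href="web+coffee:latte">Order a latte</a>
```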
One wrinkle from a Mozilla perspective right now is that the very experimental https://github.com/mozilla/libdweb project is providing a means for WebExtensions to register and show the custom scheme in the address bar (without involving a redirect). While I think there are various discussions that need to happen with the Security UX people before anything like that would ship, such an approach means spoofing concerns do become an issue and a safelist becomes favorable. Especially when the safelist comes with metadata that allows us to more easily provide human readable explanations of what permission the WebExtension is asking for.
If we intend for the list to be an actual security mechanism, then only a safelist can act as such. Anything less, such as a blocklist or lexical sniff-test (e.g. the idea above that we should disallow "." in scheme names), can only be a hygiene or basic sanity test.
Maybe that's OK; maybe we are putting all our security eggs in the "the user selects whether or not to register the protocol at runtime" basket.
I see at least 3 classes of potential security concern:
From a pure security perspective, the combination of a safelist + the user choice UX provides defense in depth and would be the strongest choice among the current options.
I do think we owe web developers, and new protocol developers, clarity. Any safelist, blocklist, or lexical sniff-test should be standardized, rather than being UA-defined.
Using IANA's list doesn't seem to cover many of the schemes that were requested in the original intent to implement.
In fact, I cross-referenced the two lists. The ONLY schemes common to both lists are cvs, git and svn. Literally every other scheme on the intent to implement is not available in IANA, including all the distributed web protocols, the revision control systems bzr, hg and darcs and all of the + variants of those schemes, the mapping ones, and doi.
Maybe these can be added to IANA, but there seems to be a disconnect between what schemes people want to use, and what have been added to IANA.
Furthermore, if we used IANA as a safelist, we'd have to have a blocklist as a subset of that, because we don't want sites taking https, etc.
I also don't see any security risk of having an open list, that is mitigated by using IANA's list. As I said above, all of the useful protocols are going to have to be safelisted. The security risks aren't found in protocols that nobody is using. They're found in the protocols that will have to be safelisted.
For example:
Giving example.com the power to handle ftp: URLs might be surprisingly powerful, in ways that are not immediately obvious (to any of us).
I think granting handling of the ftp scheme is an explicit goal of this effort (e.g., we could eventually remove FTP client logic from browsers and defer to web apps). Maybe not right away, but I don't see why ftp should be protected while mailto is not. (Is it that you might download a file from a legitimate FTP site, but you've installed a dodgy FTP client that MITMs your download? Yes, this can happen, but users should think of registering an FTP client as similar to setting their default browser; you trust the client.)
What if yolo: becomes super popular, and people often select https://yolo.example.com/%s to handle yolo: URLs, and other sites start loading resources with yolo:, and then there's a protocol security vulnerability in the yolo: protocol, or yolo.example.com gets compromised, or whatever.
The problem is that if yolo becomes super popular, it would presumably be added to IANA's list, and thus the safelist won't protect users.
Any safelist, blocklist, or lexical sniff-test should be standardized, rather than being UA-defined.
This can't really be independent of the UA. Different UAs will have different special schemes. For example, chrome is special in Google Chrome. (chrome is on the IANA list btw.) Google Chrome would presumably want to block the scheme chrome, but there'd be no reason to standardize the blocking of chrome in other browsers.
no reason to standardize the blocking of chrome in other browsers
As I said before, I don't think that's true. If Chrome lost a lot of market share and chrome somehow became a popular scheme used by sites targeted primarily at users of the dominant browser, Chrome might be forced to also allow chrome to be registered (and figure out something else for internal usage).
I think we would simply say "Sorry, you can't register chrome:// URLs in this browser, because it conflicts with our internal URL scheme." And this wouldn't break sites, it would just mean they can't register that particular URL.
The alternative is that we explicitly blocklist schemes used by popular web browsers, which raises issues like: what's popular enough to be on the list? Can a web browser decide to register git:// for internal pages, then have git added to the blocklist because it's used by that browser?
It would effectively break sites if the feature had enough critical mass.
Depends on the definition of "break". If Chrome suddenly decided that mailto was a reserved protocol and refused to register it (which would be kinda dumb, but let's say we did it), suddenly Gmail and other web mail clients that register mailto links would no longer be able to register for that protocol, or handle those links. But I wouldn't consider the site "broken". A user-agent-specific piece of UI has stopped working.
Similarly, Chrome could entirely remove the web protocol handlers feature or make it extremely hard for users to register (since the spec says: "UAs could also simply silently collect the information, providing it only when relevant to the user.") The fact that practically no Chrome user can make use of web protocol handlers any more would not cause the protocol handling sites to "break". Chrome would still be adhering to the spec.
@mgiuca - if the feature has critical mass, it means, for example, that many websites (not to mention native applications, though out of scope here) may have <a href="custom-popular-protocol:..."> or the like, and so they would indeed break (especially since you cannot tell whether the protocol is registered).
It is not the registering side that breaks, it is the using side that breaks.
My personal opinion is that a blocklist is attractive in the long term because it would make it much easier to quickly extend the set of allowed schemes. I'm not really qualified to say whether a safelist brings more security, but I acknowledge that we'd have to deal with the case where whitelisted schemes are later implemented natively anyway. However, I also share Anne's concern about UA-specific blocklist entries; I think these should really be in the spec if we switch to a blocklist (incidentally, note that blocklisting prefixes like "chrome", "android", "about" seems to cover many of the UA-specific schemes, and would allow future extensions without spec changes).
Checking current implementers' feedback, I see that essentially most people who commented in favor of a blocklist are from Google/Chromium. I checked with Mozilla/Firefox folks and their current preference is still a whitelist. There was opposition to registerProtocolHandler in webkit-dev's 2015 thread (notable comments from Apple are https://lists.webkit.org/pipermail/webkit-dev/2015-May/027457.html and https://lists.webkit.org/pipermail/webkit-dev/2015-July/027518.html), and checking recently on WebKit's Slack, it seems the preference would still be a whitelist if the feature is implemented.
Last but not least, one of the initial arguments in favor of a blacklist is that it would avoid delays to ship new whitelisted schemes in browsers. However, previous requests have been blocked for 1-4 years on this blacklist-vs-whitelist discussion! Here are the ones I found:
Igalia plans to try and whitelist the "decentralization" schemes (and it's not a bigger effort to add more, of course) and we will be happy to send code patches, WPT tests and handle the necessary intent-to processes for the Mozilla and Chromium projects. Note that Mozilla already whitelisted these schemes in WebExtensions' protocol_handlers (bug 1428446), so it's currently out-of-sync with Navigator.registerProtocolHandler().
My conclusion is that several user requests have been pending for a long time on this, while the status quo is that 2/3 web engines lean toward a whitelist. It does not seem that a consensus can be reached any time soon, so I'd prefer to try to extend the whitelist in the short term. I'll follow up on the blink-dev thread too.
What's the current state of this? What would it take to move this forward, if someone wants to help with that?
Mostly passing along some of the comments from this Blink Intent-to-implement.
The use of a safelist presents a challenge to someone introducing a new protocol who wishes to integrate it into the web platform via registerProtocolHandler. They'd need to file a request and, assuming browser vendors react immediately, face around a 3-month lead time until stable browsers start supporting the new scheme. A blocklist removes this delay and also makes new schemes backwards compatible with existing browsers that use a blocklist. On the other hand, the use of a safelist allows vetting of a known set of schemes rather than evaluating the domain of potential names and blocking harmful ones like existing well-known schemes or attempts at typojacking.
Let's revisit this and see which one works better for the web platform.
cc @annevk, @domenic, @mgiuca