mozilla / standards-positions

https://mozilla.github.io/standards-positions/
Mozilla Public License 2.0

Downgrade User Agent Client Hints to 'harmful' #552

Closed hsivonen closed 3 years ago

hsivonen commented 3 years ago

Request for Mozilla Position on an Emerging Web Specification

Current Status

Our current position is non-harmful.

Why change?

Upon inspection, the various features of User Agent Client Hints fall into three categories:

  1. information already exposed in the User-Agent header in a way that, realistically, isn't going to go away (example: the Mobile; token),
  2. information that's harmful to expose, and
  3. information that's harmful to expose at the time of the HTTP request, but whose legitimate purposes could still be achieved if the information became available to the site only after the fact.

Moving stuff around (from User-Agent to Sec-CH-UA-*) doesn't really solve much. That is, having to request this information before getting it doesn't help if sites routinely request all of it.

What Chrome Does

For reference, these are the Sec-CH-UA- headers that Browserleaks got out of Chrome 88 (non-Sec-CH-UA- Client Hints not included below but sent by Chrome: Viewport-Width, DPR, Device-Memory, RTT, Downlink, ECT). In Chrome, these appear to be enabled on Android and Chrome OS and behind a flag on Linux, Windows, and Mac.

Chrome 91 seems to have reduced the headers only to Sec-CH-UA and Sec-CH-UA-Mobile, however. Still, some of the comments below are based on what Chrome 88 exposed (if flag enabled).

On x86_64 Linux

Sec-CH-UA: "Chromium";v="88", "Google Chrome";v="88", ";Not A Brand";v="99"
Sec-CH-UA-Full-Version: "88.0.4324.150"
Sec-CH-UA-Platform: "Linux"
Sec-CH-UA-Platform-Version: ""
Sec-CH-UA-Arch: "x86"
Sec-CH-UA-Model: ""
Sec-CH-UA-Mobile: ?0

On x86_64 Windows

Sec-CH-UA: "Chromium";v="88", "Google Chrome";v="88", ";Not A Brand";v="99"
Sec-CH-UA-Full-Version: "88.0.4324.150"
Sec-CH-UA-Platform: "Windows"
Sec-CH-UA-Platform-Version: "10.0"
Sec-CH-UA-Arch: "x86"
Sec-CH-UA-Model: ""
Sec-CH-UA-Mobile: ?0

On aarch64 macOS

Sec-CH-UA: "Chromium";v="88", "Google Chrome";v="88", ";Not A Brand";v="99"
Sec-CH-UA-Full-Version: "88.0.4324.150"
Sec-CH-UA-Platform: "Mac OS X"
Sec-CH-UA-Platform-Version: "11_2_1"
Sec-CH-UA-Arch: "arm"
Sec-CH-UA-Model: ""
Sec-CH-UA-Mobile: ?0

On aarch64 Android

Sec-CH-UA: "Chromium";v="88", "Google Chrome";v="88", ";Not A Brand";v="99"
Sec-CH-UA-Full-Version: "88.0.4324.152"
Sec-CH-UA-Platform: "Android"
Sec-CH-UA-Platform-Version: "10"
Sec-CH-UA-Arch: ""
Sec-CH-UA-Model: "Nokia 9"
Sec-CH-UA-Mobile: ?1

On x86_64 Chrome OS

Sec-CH-UA: "Chromium";v="88", "Google Chrome";v="88", ";Not A Brand";v="99"
Sec-CH-UA-Full-Version: "88.0.4324.153"
Sec-CH-UA-Platform: "Chrome OS"
Sec-CH-UA-Platform-Version: "13597.84.0"
Sec-CH-UA-Arch: "x86"
Sec-CH-UA-Model: ""
Sec-CH-UA-Mobile: ?0

Observations

Use Cases

(Quotes from this README.)

Based on browser features

This use-case enables services like polyfill.io to serve custom-tailored polyfills to their users, without bloating up the experience of modern browser users. Similarly, when serving Javascript to users, one can avoid transpilation (which can result in bloat and inefficient code) for browsers that support the latest ES features that were used. Finally, when serving images, some browsers don't update their Accept request headers, while in other cases (cough WebP cough) the MIME type is not descriptive enough to distinguish between different variants of the same format. In those cases, knowing the browser and its version can be critical to serving the right image variant.

For that use case to work, the server needs to be aware of the browser and its meaningful version, and map that to a list of available features. That enables it to know which polyfill or code variant to serve.

Services that wish to do that using UA-CH will need to inspect the Sec-CH-UA header, that is sent by default on every request, and modify their response based on that.

We have a couple of decades of experience showing that this is an anti-pattern. It is better for browsers to make features detectable and for sites to detect those features than for sites to infer them from the browser version: if a site assumes that browser A has a feature, browser B has to pretend to be browser A in order to make the site use the feature in browser B as well.

If Web devs want to detect WebP in Safari, the requested change to Safari should be making WebP support itself detectable instead of exposing the OS version.
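As a rough illustration of detecting the capability rather than the browser, here is a minimal server-side sketch that picks an image format from the Accept request header alone. The function names and the JPEG fallback are assumptions made for this example, and, as the README quote above notes, not every browser keeps its Accept header up to date, so this only shows the shape of the approach.

```python
def accepted_types(accept_header: str) -> set[str]:
    """Media types from an Accept header that are not refused with q=0."""
    accepted = set()
    for part in accept_header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        quality = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat an unparsable q-value as a refusal
        if quality > 0:
            accepted.add(media_type)
    return accepted


def choose_image_type(accept_header: str) -> str:
    """Serve WebP only when the client advertises it (hypothetical JPEG fallback)."""
    if "image/webp" in accepted_types(accept_header):
        return "image/webp"
    return "image/jpeg"
```

The decision never looks at the browser name, browser version, or OS version, so a minority browser that supports WebP gets WebP without pretending to be anything else.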

Browser bug workaround

Some browser versions have well-known bugs which require content to workaround them. Triggering those bugs can result in browser crashes, content breakage and other issues, and those bugs are by definition not something that can be feature detected. Therefore, content needs to avoid them altogether for affected browser versions.

For that use case, servers need to be aware of the browser and its meaningful version, be aware of browser bugs that impact them, and apply workarounds if the current browser version is impacted.

Services that wish to do that using UA-CH will need to inspect the Sec-CH-UA header, sent by default on every request, and use it to modify their response.

This is indeed something for which browsers can’t offer a designed detection surface. Still, this generally needs only the major version of the engine, not the “brand”, the minor OS version, or the like.

An interesting question is how relevant this actually is, given current version uptake for browsers and the question of whether sites have the resources to pay attention to users who, for whatever reason, aren’t updating according to the rapid release schedule. At the time sites deploy a workaround, they can’t necessarily know which future browser version will no longer need it. Can we guarantee only retrospective use? Do Web developers care enough about retrospective workarounds for evergreen browsers?

Marketshare Analytics

A browser's market share can be extremely important. Having visibility into a browser's usage can encourage developers to test in that particular browser, ensuring fewer compatibility issues for its users. On top of that, a browser's market share can have a direct impact on the browser vendors' business goals, ensuring future development of the browser.

For marketshare analytics to work, the server needs to be aware of the browser and its meaningful version, in order to be able to register them and find their relative market shares.

Sites that wish to provide market share analytics using UA-CH will need to inspect the Sec-CH-UA header, that is sent by default on every request, and keep a record of it.

This doesn’t require the information to be available for discriminatory decision making at the time of the HTTP request. The use case would still be addressed if the site learned the information after the fact (e.g. by being able to decrypt it only later).

Safari’s ad click attribution feature is precedent for letting sites gather statistics in a deferred way.

Browser based adaptation

Some sites choose to serve slightly different content to different browsers. The reasons for that vary. Some reasons are legitimate (e.g. wanting to serve different experiences to different browsers due to their feature support). Other reasons are slightly less legitimate (e.g. warning users that the site's developers haven't tested in their browser). And then there are reasons which are outright wrong (e.g. willingness to block certain browsers' users from accessing the site).

As browsers, we want to enable the former, while discouraging the latter.

It is in the interest of minority browser engines and even “brands” using Chromium to treat different experiences based on browser identity rather than feature detected capability as an anti-feature.

Mobile specific site

Many site owners serve different content between mobile and desktop sites. While responsive web design has made it possible to serve multiple form factors using a single code base, there are still cases where serving a mobile-specific version can be better adapted.

For those cases, serving a mobile-specific site to users on mobile devices can be helpful. For that to work, the server needs to be aware, at HTML serving time, whether the user is on a mobile device or not.

Sites that wish to serve mobile-specific sites using UA-CH can do that using the Sec-CH-UA-Mobile headers that are sent by default on every request.

Sec-CH-UA-Mobile as one bit of information does not seem harmful enough to oppose especially when mobile browsers have a “request desktop site” piece of UI available. However, this bit of information is already easy to extract from the old User-Agent string, which is realistically not going away even if frozen.
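To make the redundancy concrete, here is a minimal sketch of a server reading the one-bit hint and falling back to the old User-Agent string when the hint is absent. It assumes request headers arrive as a dict with lower-cased names, and the "Mobi" substring check is a common heuristic chosen for this example, not something the proposal defines.

```python
def is_mobile(headers: dict[str, str]) -> bool:
    """Decide mobile vs. desktop from Sec-CH-UA-Mobile, else from User-Agent."""
    hint = headers.get("sec-ch-ua-mobile", "").strip()
    if hint == "?1":  # structured-header boolean: true
        return True
    if hint == "?0":  # structured-header boolean: false
        return False
    # No hint sent: fall back to the token that is already in the UA string.
    return "Mobi" in headers.get("user-agent", "")
```

Both branches yield the same single bit, which is why the new header adds little beyond making that bit explicit.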

Low-powered devices

Some sites serve different content to low powered devices that cannot deal with CPU intensive tasks, large video and images, etc. Such content adaptation typically uses the device model information that's integrated in the current User-Agent string for that purpose, relying on server-side databases to convert device models into memory, CPU power, and other categories on which they want to split their content.

If the dimension on which the split is made is memory, the Device-Memory Client Hint can be used to make that distinction. Otherwise, with UA-CH, sites can still retrieve the device model by opting in to the Sec-CH-UA-Model hint.

Both of these hints are not sent by default, so require some extra work. Top-level origins will need to send Accept-CH: Device-Memory, Model headers with their responses to opt-in to receiving those hints. In cases where they absolutely need to perform that adaptation on every navigation request, a redirect would be required when the hints are not present in a browser that supports them. There's ongoing work to eliminate that extra step.

Third-party origins that need to perform such adaptation would need delegation from the top-level origin. The top-level origin would need to opt-in using Accept-CH, as well as add Feature-Policy headers that delegate those hints to the third-party origin.
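The opt-in plus redirect flow described in the README can be sketched roughly as follows. This is only an illustration under stated assumptions: the hint token names follow the header names used elsewhere in this thread, request headers are assumed to be a dict, and `already_retried` is a hypothetical flag (for example derived from a query parameter) that keeps browsers without Client Hints support from being redirected forever.

```python
REQUIRED_HINTS = ("Device-Memory", "Sec-CH-UA-Model")  # assumed token names


def respond(request_headers: dict[str, str], url: str, already_retried: bool):
    """Return (status, response_headers) for a navigation request."""
    present = {name.lower() for name in request_headers}
    missing = [hint for hint in REQUIRED_HINTS if hint.lower() not in present]

    # Always advertise the hints this origin wants on later requests.
    response_headers = {"Accept-CH": ", ".join(REQUIRED_HINTS)}

    if missing and not already_retried:
        # Ask the browser to repeat the request, now that it has seen Accept-CH.
        response_headers["Location"] = url
        return 307, response_headers

    # Either the hints arrived, or the browser does not send them: serve content.
    return 200, response_headers
```

The extra round trip on the first navigation is the "extra step" the README says there is ongoing work to eliminate.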

Facebook Year Class is cited as an example of this usage. It’s unclear what exactly Facebook varies based on this information, how common this practice is beyond Facebook, or whether Facebook does this at present when viewed in Chrome as opposed to doing this in the Android app.

Sec-CH-UA-Model provides a lot of identifying bits on Android and leads to having to use an iPhone in order to be part of a larger anonymity set. This is a reason to oppose Sec-CH-UA-Model. Reporting e.g. device memory as coarse bands is less harmful and addresses the use case. CPU performance is harder to classify, but should be possible to classify coarsely.
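A minimal sketch of what "coarse bands" could look like for device memory follows. The band boundaries and the round-down rule are assumptions chosen for illustration; they are not taken from this thread or quoted from any specification.

```python
# Hypothetical bands in GiB; every device reports one of only six values.
MEMORY_BANDS_GIB = (0.25, 0.5, 1, 2, 4, 8)


def coarse_device_memory(actual_gib: float) -> float:
    """Round actual memory down to the nearest band, clamped to the band range."""
    band = MEMORY_BANDS_GIB[0]
    for candidate in MEMORY_BANDS_GIB:
        if actual_gib >= candidate:
            band = candidate
    return band
```

Under these assumed bands, devices with 8, 12, or 16 GiB all report the same value, so the site can still split content on "low memory vs. not" while each user stays in a much larger anonymity set than a device model string would allow.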

OS specific styles

Some sites may wish to tailor their interfaces to match the user's OS. While progressive enhancement is likely to be a better path here (e.g. through the application of different button styles using script), there may be cases where folks would wish to deliver tailored inline styles based on the platform and platform version.

Those cases are very similar to the case discussed above (in "Low-powered devices"), only with the Sec-CH-UA-Platform and Sec-CH-UA-Platform-Version hints.

The framework cited here varies between iOS and Android. There is probably enough fingerprinting surface to detect Windows vs. macOS vs. non-Mac desktop *nix vs. Android vs. iOS anyway, so it’s not worthwhile to hide that. However, non-Linux non-Mac desktop *nix systems will probably benefit both in terms of site compatibility and in terms of privacy by claiming to be “Linux” even if they are e.g. a flavor of BSD.

The notion of a framework varying styling based on OS version seems niche enough not to justify the exposure of the OS version.

Also, you can’t infer a theme from “Linux” regardless of version. Granularity like “Ubuntu” or “Fedora” would be bad for privacy and also it seems implausible that sites would pursue the diminishing returns of trying to match distro themes. (Chrome says “Linux” when running on Ubuntu.) Notably, Android isn’t guaranteed to be themed according to pure Android, either.

OS integration

Similarly, some sites would want to change links to OS specific ones (e.g. Android intent links). While, again, progressive enhancement can be used to modify those links using script, rather than bake them into the HTML, some sites may prefer server-side adaptation.

Again, like the "OS specific styles" case, they'd need to use the platform and platform version hints to do so.

Intent support no longer requires the OS version, since sites can ignore very old OS versions. However, if a new feature of this nature is introduced, it should be introduced as feature detectable instead of being inferred from OS version.

Browser and OS specific experiments

Some servers may like to limit their multi variant experimentation to specific browsers, specific platforms or specific versions of any of the above. For experiments that are limited to browser and version, those sites can use the Sec-CH-UA values sent by default on requests. If they require platform and its version, they'd have to opt-in for those hints, or use client-side scripts to control the experimentation.

Enabling experiments like these carries risk to minority engines and Chromium “brands” with little upside from addressing this use case.

User login notification

Many sites, especially security sensitive ones, like to notify their users when log-in from a new device happens. That enables users to be aware of those logins, and take action in case it's not a login that's done by them or on their behalf.

For those notifications to be meaningful, sites need to recognize and communicate the commercial brand of the browser to the user. These messages often also include the platform and its version in order to make sure the user knows which device is in question.

Since such messaging doesn't require any server-side adaptation, it's better for this case to use the userAgentData.getHighEntropyValues() method in order to retrieve the required information.

Providing this information as part of login notification is useful if accurate, but there are other interests that go against providing this information, since we can’t limit it to this application. Notably, this already causes confusion with Chromium “brands” that have to claim to be Chrome for compatibility purposes but then the login notifications say Chrome instead of e.g. Edge.

Download of appropriate binary executables

Some sites are used to download binary executables of native applications, and need to be able to propose the right binary to the user by default. The right binary executable for the current user depends on a few factors: their operating system, its version, as well as their CPU architecture.

In order to tackle that use case, download sites can opt-in to receive the Sec-CH-UA-Platform, Sec-CH-UA-Platform-Version and Sec-CH-UA-Architecture hints (or query them through the API), in order to ensure the right binary is offered to the user by default.

Universal Binaries make knowing “macOS” a sufficient level of detail for Mac. Windows apps typically use an installer anyway. Making a 32-bit x86 installer do the x86 vs. x86_64 vs. aarch64 detection solves this for Windows.

Notably, Sec-CH-UA-Arch sent by Chrome doesn’t distinguish between x86 and x86_64 on Windows and Linux where this is relevant, which makes the header not really useful for the stated use case and just provides fingerprinting bits.

Apps that keep old versions available for download for old operating systems generally work well enough by listing them all and letting the user pick.

Conversion modeling

Some machine learning models use various details from the User-Agent string in order to estimate various things about users of those user agents. Similar modeling would still be possible, but will require explicit opt-in to collect the required bits of information.

As with market share statistics, the visibility of this information could be deferred without harming the use case.

Vulnerability filtering

In some environments, proxy servers may be used to verify that the different users accessing information are not doing so from obsolete devices that are potentially vulnerable to security issues. While the browser and version information available from Sec-CH-UA can provide some information, the browser and OS full version are often useful for that kind of analysis. Such proxies would have to add a redirect step that opts in to getting the browser full version and the platform version in order to continue to get access to those hints.

This may look like a good use case on the surface, but from the user perspective what this really means is potentially being denied access because of UA information. Consider this WebKit bug.

This use case is also terrible for minority browsers trying to break into the market or trying to stay on the market. Gnome Web is not big enough for Google to care about it, and Google’s attempts to restrict logins only to the latest major browsers risk shutting Gnome Web out of the most popular services on the Web.

Even if we believed that it’s a bad idea security-wise to use browsers that fork Firefox or Chromium and don’t properly keep taking security-relevant upstream changes, we should still resist accepting this use case as one we’d support, since it is directly against user choice.

Logs and debugging

Many services log the User-Agent string today and can use it in various ways when analyzing past traffic or when trying to debug errors related to their service. Those services will have to use the lower entropy values available through Sec-CH-UA for logging purposes, or opt-in to receive higher-entropy hints. The latter doesn't seem like something services should do just for forensic purposes. On the other hand, when specific issues are encountered, it may make sense for those services to opt-in to receive more details on the user agent, or use the userAgentData.getHighEntropyValues() API for that purpose.

This is nice to have, but concerns related to privacy and Web compat risk should take precedence.

Fingerprinting

User fingerprinting is the practice of gathering multiple bits of user information from multiple sources and intersecting them together to create a unique signature of the user, that would enable to recognize them later on, even if they clear state from their browsers (e.g. by deleting cookies).

For those cases, the origin needs to gather as much entropy as possible, so is likely to collect all the hints.

Spam filtering and bot detection

This is a case of fingerprinting that is not user-hostile, and therefore one we would like to preserve. With UA-CH this will be initially enabled by active collection of the various hints. We hope that alternative methods or APIs will exist to address the spam filtering and bot detection use cases in the future, as browsers may decide to intervene on behalf of their users by limiting the collection of user-identifying entropy (e.g., the Privacy Budget proposal).

Persistent user tracking

This is a case of fingerprinting that this proposal explicitly tries to make harder. Like the case of "spam filtering", it would still be feasible to actively collect all the hints about the user as bits of entropy. Unlike the above case, this is something that proposals such as the Privacy Budget aim to prevent, without providing any alternative mechanisms for persistent user tracking.

It should be clear that we aren’t treating fingerprinting as a legitimate use case even if it has spam filtering and bot detection applications.

Blocking known bots and crawlers

Currently, the User-Agent string is often used as a brute-force way to block known bots and crawlers. There's a concern that moving "normal" traffic to expose less entropy by default will also make it easier for bots to hide in the crowd. While there's some truth to that, that's not enough reason for making the crowd be more personally identifiable.

Similar to the spam filtering case, there's hope that alternative methods would be able to replace User-Agent string matching for this use-case.

Well-behaved bots that intentionally make themselves blockable by Web sites may continue to use the existing User-Agent header to identify themselves. There doesn’t seem to be value in trying to make them do something different now.

Comments on the Fields

Sec-CH-UA

Harmful.

Since sites tend to look at the most popular “brand”, this field has all the potential of developing the same problems as User-Agent presently. If sites start to opt into receiving this field as a matter of routine, we haven’t really improved things.

On the bright side, this field is versatile enough to put something like "Chromium";v="88", "Google Chrome";v="88", "Firefox";v="86", ";Not A Brand";v="99" in there.

In general, this seems to just move the old problem to a new place.
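To illustrate both the versatility and the risk, here is a rough sketch of how a site might parse a Sec-CH-UA value and test for a brand. It is a simplified regular-expression parser written for this example, not a complete structured-headers parser, and the function names are hypothetical. Note that the brand string itself may contain a semicolon, so naively splitting on ';' would break.

```python
import re

# One list member of the form "Brand";v="Version". The brand is a quoted
# string that may itself contain ';' (as in ";Not A Brand") or escaped chars.
_BRAND_ITEM = re.compile(r'"((?:[^"\\]|\\.)*)"\s*;\s*v="((?:[^"\\]|\\.)*)"')


def parse_sec_ch_ua(value: str) -> dict[str, str]:
    """Return a {brand: version} mapping (escape sequences are left as-is)."""
    return {brand: version for brand, version in _BRAND_ITEM.findall(value)}


def claims_brand(value: str, brand: str) -> bool:
    """True if the header lists the given brand, regardless of its position."""
    return brand in parse_sec_ch_ua(value)


header = '"Chromium";v="88", "Google Chrome";v="88", "Firefox";v="86", ";Not A Brand";v="99"'
brands = parse_sec_ch_ua(header)
```

A site that only asks `claims_brand(header, "Google Chrome")` is satisfied by the combined list above, which is exactly the kind of check that pushes other browsers into listing Chrome's brand.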

Sec-CH-UA-Full-Version

Harmful.

It seems that more often than not the third and fourth component of a value like "88.0.4324.150" will only serve fingerprinting purposes, and the first component is the realistic level of granularity that sites care about for bug workarounds. Also, for this to be useful for workarounds, the workaround needs to be retrospective: I.e. it stops applying when the version number gets high enough. Is deploying retrospective workarounds worthwhile with evergreen browsers? OTOH, if the workaround is for current and future versions of the browser, what’s this field used for?

This field leaves no room for fooling sites that only look for Chrome version. That is, there’s no room to show a fake Chrome version and the real Firefox version if the two diverge.
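The notion of a "retrospective" workaround can be sketched as a check against a bounded range of major versions. The version bounds in the usage below are hypothetical; the point is only that the first component is the part that matters and that the check needs a known upper bound to stop applying.

```python
def major_version(full_version: str) -> int:
    """First component of a value like "88.0.4324.150" (quotes optional)."""
    return int(full_version.strip('"').split(".")[0])


def needs_workaround(full_version: str, first_affected: int, first_fixed: int) -> bool:
    """Apply the workaround only for majors in [first_affected, first_fixed)."""
    return first_affected <= major_version(full_version) < first_fixed
```

For example, `needs_workaround("88.0.4324.150", 87, 89)` is true while `needs_workaround("89.0.4389.72", 87, 89)` is false. If `first_fixed` is not yet known when the workaround ships, the site has to guess, which is the open question raised above.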

Sec-CH-UA-Platform

Not harmful, but redundant with a part of the old User-Agent string that’s not going away.

Allows for adapting to platform conventions, and the bits of entropy are probably discoverable anyway.

FreeBSD may want to say Linux and iPadOS may want to say Mac OS X, though.

Sec-CH-UA-Platform-Version

Harmful.

The realistic main use cases that are not served better by making OS-version-dependent Web-exposed features feature-detectable seem to be denying access in the form of vulnerability filtering, and fingerprinting.

Sec-CH-UA-Arch

Harmful.

Adds a fingerprinting bit without actually addressing the use case of offering the right downloads due to not distinguishing between x86 and x86_64.

Sec-CH-UA-Model

Particularly harmful.

Lots of fingerprinting bits with dubious user-facing benefit.

Sec-CH-UA-Mobile

Not harmful, but redundant with a part of the old User-Agent string that’s not going away.

Makes one bit of information explicitly accessible, which is better than inferring it from other information.

However, it’s unlikely that we’d remove this information from the User-Agent header from which this is easier to extract than most other information there, so the value of this field as a new header is questionable.

miketaylr commented 3 years ago

Hi @hsivonen,

I thought I would correct a few mistakes in your post, in case it's useful for anyone else.

For reference, these are the Sec-CH-UA- headers that Browserleaks got out of Chrome 88 (non-Sec-CH-UA- Client Hints not included below but sent by Chrome: Viewport-Width, DPR, Device-Memory, RTT, Downlink, ECT). In Chrome, these appear to be enabled on Android and Chrome OS and behind a flag on Linux, Windows, and Mac.

(Note: I'm not sure why you used such an old Chrome version to test this.)

In the current release version (Chrome 91 as of today), UA-CH is enabled by default (since 89, IIRC) on all platforms. Browserleaks isn't a great site to test things with as it sends invalid client hint token names: they're missing the Sec-CH prefix (this was a bug we fixed in M89). Maybe that's why you used 88? (Note: I sent them an email a few months back, but got no response.)

Chrome 91 seems to have reduced the headers only to Sec-CH-UA and Sec-CH-UA-Mobile, however. Still, some of the comments below are based on what Chrome 88 exposed (if flag enabled).

These are the default, low-entropy UA Client Hints (aka, sent by default for all requests). M93 also adds Sec-CH-UA-Platform. If you were testing this using Browserleaks, the reason you only saw these is that the site has a bug.

Sec-CH-UA isn’t fully GREASEd: It’s the same in all cases instead of the components changing order at random or the ";Not A Brand";v="99" part varying.

This isn't quite true. In the Chromium implementation, it varies between versions. If you only looked at a single version, I can see how you made that mistake though. We probably could improve the Chromium implementation to more closely match the spec, yes: https://wicg.github.io/ua-client-hints/#create-arbitrary-brands-section.
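For readers unfamiliar with GREASE, here is a loose sketch of the idea being discussed: the brand list carries an arbitrary extra brand, and the order and the arbitrary brand's punctuation vary, here seeded by the major version so the value is stable within a version but differs between versions. This is neither the spec's algorithm nor Chromium's implementation; the character set, the fake versions, and the function name are all assumptions made for the example.

```python
import random

GREASE_CHARS = [" ", "(", ")", ":", ";", ".", "-"]  # assumed punctuation pool
GREASE_VERSIONS = ["8", "24", "99"]                 # assumed fake versions


def build_sec_ch_ua(real_brands: list[tuple[str, str]], major_version: int) -> str:
    """Build a Sec-CH-UA value with one arbitrary brand and a shuffled order."""
    rng = random.Random(major_version)  # same seed gives the same output
    first, second = rng.choice(GREASE_CHARS), rng.choice(GREASE_CHARS)
    grease = (f"Not{first}A{second}Brand", rng.choice(GREASE_VERSIONS))
    items = list(real_brands) + [grease]
    rng.shuffle(items)
    return ", ".join(f'"{brand}";v="{version}"' for brand, version in items)


value = build_sec_ch_ua([("Chromium", "88"), ("Google Chrome", "88")], 88)
```

Because the position and spelling of the extra brand are not fixed, sites are nudged toward parsing the list properly instead of matching on an exact string.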

Sec-CH-UA tries to capture both engine and “brand”, but Sec-CH-UA-Full-Version has just one place for version.

Yep, Microsoft gave some good feedback on this. I plan to address in https://github.com/WICG/ua-client-hints/issues/196.

The meaning and format of Sec-CH-UA-Platform-Version depends on Sec-CH-UA-Platform.

I'm not sure what this means. Recently, though, Sec-CH-UA-Platform-Version got some improvements in https://github.com/WICG/ua-client-hints/pull/245 that standardize the format for platform version.

Sec-CH-UA-Arch is reported for Chrome OS despite arguably being pretty useless there.

Right, that's what the spec says to do:

"User Agents MUST return the empty string for model if mobileness is false. User Agents MUST return the empty string for model even if mobileness is true, except on platforms where the model is typically exposed."

Sec-CH-UA-Arch does not indicate 32-bit vs. 64-bit.

Correct, that's captured in the Sec-CH-UA-Bitness hint.
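As a sketch of how a download site might combine the two hints for the "appropriate binary" use case, the function below maps an architecture and bitness pair to a build target. The target labels mirror the names used earlier in this thread (x86, x86_64, aarch64); the 32-bit ARM label and the "unknown" fallback are assumptions made for this example.

```python
# (Sec-CH-UA-Arch, Sec-CH-UA-Bitness) -> build target label
_TARGETS = {
    ("x86", "64"): "x86_64",
    ("x86", "32"): "x86",
    ("arm", "64"): "aarch64",
    ("arm", "32"): "arm",  # hypothetical label for 32-bit ARM builds
}


def cpu_target(arch: str, bitness: str) -> str:
    """Combine both hint values (with or without surrounding quotes)."""
    key = (arch.strip('"'), bitness.strip('"'))
    return _TARGETS.get(key, "unknown")
```

With only the architecture hint, `cpu_target('"x86"', '')` comes back as "unknown", which matches the earlier observation that Sec-CH-UA-Arch alone cannot pick between x86 and x86_64 downloads.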

hsivonen commented 3 years ago

I thought I would correct a few mistakes in your post, in case it's useful for anyone else.

Thanks!

Browserleaks isn't a great site to test things with as it sends invalid client hint token names ... Sec-CH-UA-Bitness

Is there a demo site that is up-to-date and shows all Sec-CH-UA-* headers that exist?

rowan-m commented 3 years ago

https://user-agent-client-hints.glitch.me/ is up to date with the proposal.

gsnedders commented 3 years ago

For reference, the Apple WebKit team view had previously been that it was similarly "non harmful" (on the basis that we could always lie in the same ways as the User-Agent header currently does; we would be unlikely to expose anything that we currently do not). As such, the only real advantages over User-Agent are that Sec-CH-UA can be GREASE'd, and that, in a somewhat unlikely and very distant future, we could shorten the User-Agent header we send on every request to reduce bytes over the wire.

hsivonen commented 3 years ago

where we could shorten the User-Agent header we send on every request to reduce bytes over the wire

That could also be solved by having different User-Agent values for different requests: A better-than-present one by default (better either as lower-entropy or shorter or both) and a traditional one in cases required by compat concerns.

As far as bytes on the wire go, if User Agent Client Hints become the norm rather than the exception, they have a lot of potential for more bytes over the wire even with header compression in newer versions of HTTP. Personally, I'm more concerned about getting stuck with both the old and the new verbosity than about the benefit of having the information be more structured.

Also, as discussed under Sec-CH-UA-Full-Version above, I'm worried that making it too structured makes it harder to deploy the kind of compat workarounds that have served browsers well in the past: e.g. claiming to be Netscape AND Gecko AND Safari AND Chrome, like Chrome does.

AFAICT, User Agent Client Hints conflate structuring the data and putting the data behind an explicit request. The FAQ doesn't cover why defaulting to a (mostly-)frozen UA string and explicitly requesting the traditional UA string was rejected as a solution.

The meaning and format of Sec-CH-UA-Platform-Version depends on Sec-CH-UA-Platform.

I'm not sure what this means.

It means that Sec-CH-UA-Platform-Version doesn't make sense on its own. To make sense of its meaning, you have to know what the value of Sec-CH-UA-Platform is. The "format" part referred to the macOS value using underscores instead of periods as the version component separator.
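The dependency can be shown with a small sketch that turns a platform version into comparable numeric components. It is based only on the Chrome 88 values recorded at the top of this issue, where the macOS value uses underscores and the others use periods; the function name and the empty-tuple convention for a missing version are assumptions.

```python
def parse_platform_version(platform: str, version: str) -> tuple[int, ...]:
    """Split Sec-CH-UA-Platform-Version using the separator its platform implies."""
    platform = platform.strip('"')
    version = version.strip('"')
    if not version:
        return ()  # e.g. Linux reported an empty version string
    separator = "_" if platform == "Mac OS X" else "."
    return tuple(int(part) for part in version.split(separator))
```

The version string cannot be parsed correctly without first consulting the platform value, which is the sense in which the field does not stand on its own.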

Maybe that's why you used 88?

It was just a matter of delay from time of making the effort to record the header values on multiple devices to the time of posting.