whatwg / html

HTML Standard
https://html.spec.whatwg.org/multipage/

Discussing new ways of tackling User-Agent discrimination #10518

Open BenjaminAster opened 1 month ago

BenjaminAster commented 1 month ago

What problem are you trying to solve?

The User-Agent header has long been used to discriminate against browsers that are fully capable of rendering a website but are turned away because their User-Agent string does not match one of the website's "supported browsers". This has led browsers to lie about who they are, creating monsters like Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Mobile Safari/537.36 EdgA/127.0.0.0 (the UA string of Edge on Android).
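To illustrate why a single UA string ends up naming so many browsers, here is a minimal TypeScript sketch of the naive substring checks that "supported browser" gates commonly rely on. The check list and the "ExampleEngine/1.0" string are hypothetical; only the Edge on Android string comes from the issue.

```typescript
// The Edge on Android UA string quoted in the issue.
const EDGE_ANDROID_UA =
  "Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 " +
  "(KHTML, like Gecko) Chrome/127.0.0.0 Mobile Safari/537.36 EdgA/127.0.0.0";

// Hypothetical substring checks of the kind sites use to gate access.
// Tokens such as "like Gecko" and "Safari/" were historically added to
// UA strings largely so that browsers would pass checks like these.
const naiveChecks: Array<[string, (ua: string) => boolean]> = [
  ["looks like Mozilla", (ua) => ua.startsWith("Mozilla/5.0")],
  ["looks like WebKit", (ua) => ua.includes("AppleWebKit/")],
  ["looks like Gecko", (ua) => ua.includes("like Gecko")],
  ["looks like Chrome", (ua) => ua.includes("Chrome/")],
  ["looks like Safari", (ua) => ua.includes("Safari/")],
];

// Returns the names of all checks the given UA string passes.
function passedChecks(ua: string): string[] {
  return naiveChecks.filter(([, check]) => check(ua)).map(([name]) => name);
}

// Edge on Android passes every one of the checks above.
console.log(passedChecks(EDGE_ANDROID_UA));

// A hypothetical new engine reporting its real identity passes none,
// so it is turned away even if it could render the site.
console.log(passedChecks("ExampleEngine/1.0"));
```

A new engine has two options under this kind of gating: copy the tokens and join the "monster" strings, or be blocked.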

User-Agent discrimination is still widespread today. New browser engines like LibWeb/Ladybird or Servo can render many sites but still get a stripped-down version or a "browser not supported" message, even on big sites like google.com or x.com. Even "big" browsers like Firefox are still often discriminated against (see next paragraph) and keep a list of "bad" sites for which they report the Chrome User-Agent string instead of their own (in Firefox, this list is visible in about:compat). Apple Maps (beta.maps.apple.com) is a good example of a site by a big corporation that works just fine in Firefox with a spoofed UA string but otherwise shows a "browser not supported" error. Apple Maps even goes so far as to block Google Chrome on Linux, simply because Linux is not one of its "supported operating systems", which is, of course, complete nonsense since the site would work just fine on Linux, too.

Furthermore, Google's introduction of User-Agent Client Hints in Chromium has made matters even worse: some sites (even by multi-billion-dollar corporations) now check that the brand list includes "Google Chrome" or "Microsoft Edge" and refuse to work otherwise. digits.t-mobile.com is just one example of a site that:

Vivaldi, for example, has tackled this issue by simply pretending to be Google Chrome without giving websites any official way of detecting that the browser is Vivaldi. This made all these problems go away, but it also means that browser detection for "good" use cases like analytics or browser-specific user instructions is no longer possible. Firefox has also recently shipped a patch that creates a "fake" navigator.userAgentData object on digits.t-mobile.com and some other sites to report being Google Chrome, which made these sites work again in Firefox.

What solutions exist today?

As mentioned above, User-Agent Client Hints were introduced by Google to "tackle" the problem of User-Agent discrimination, but arguably made matters even worse because some websites now do not just check for "Chrome/" to be included in the UA string (which is the case for all Chromium-based browsers), but specifically want the UA Client Hints brands list to include "Google Chrome" or "Microsoft Edge".
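The difference between the two kinds of checks is easiest to see in code. The TypeScript sketch below assumes the { brand, version } shape of navigator.userAgentData.brands entries and a hypothetical Chromium-based browser called "ExampleBrowser"; the function names and version numbers are illustrative only.

```typescript
// Shape of one entry in navigator.userAgentData.brands (UA-CH).
interface UABrand {
  brand: string;
  version: string;
}

// Anti-pattern described above: an allowlist of two specific Chromium brands.
function strictBrandCheck(brands: UABrand[]): boolean {
  return brands.some((b) => b.brand === "Google Chrome" || b.brand === "Microsoft Edge");
}

// Older-style check, equivalent to looking for "Chrome/" in the UA string:
// any Chromium-based browser passes.
function chromiumCheck(brands: UABrand[]): boolean {
  return brands.some((b) => b.brand === "Chromium");
}

// Hypothetical brand list of a Chromium-based browser reporting its own brand.
// The first entry imitates the intentionally odd "GREASE" brands UA-CH adds.
const otherChromiumBrowser: UABrand[] = [
  { brand: "Not)A;Brand", version: "99" },
  { brand: "Chromium", version: "127" },
  { brand: "ExampleBrowser", version: "127" },
];

console.log(strictBrandCheck(otherChromiumBrowser)); // false: the site refuses to work
console.log(chromiumCheck(otherChromiumBrowser)); // true
```

The same engine that passes the old "Chrome/" test fails the brand allowlist, which is the regression described above.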

How would you solve it?

I propose an admittedly radical solution that would need a lot of compatibility testing and discussion and would be a very big step, but I think it could reasonably solve all of the problems mentioned above: make all browsers report "Google Chrome on Windows/Android" by default, and provide a way for websites to opt in to real browser information by including a meta tag or header containing a token that is cryptographically signed by some authority (e.g. W3C?) for a specific origin, very similar to Chrome Origin Trials. Specifically, this would mean:
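The core of the proposal is a gate in the browser: report a default Chrome-on-Windows identity unless the page presents a valid, origin-bound token signed by a trusted authority. The TypeScript (Node.js) sketch below shows that check under loud assumptions: the token structure, the field names origin and expiry, Ed25519 as the signature scheme, and the default UA string are all hypothetical. It is only loosely modelled on Chrome Origin Trial tokens, whose actual wire format is not reproduced here.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Hypothetical token payload, loosely modelled on Origin Trial tokens.
interface TokenPayload {
  origin: string; // the single origin the token was issued for
  expiry: number; // Unix time in seconds after which the token is invalid
}

// Hypothetical wire format: JSON payload plus a base64 Ed25519 signature.
interface SignedToken {
  payload: string;
  signature: string;
}

// Assumed default identity every browser would report without a token.
const DEFAULT_UA =
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 " +
  "(KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36";

// Authority side: sign a token for one origin with the authority's private key.
function issueToken(authorityKey: KeyObject, origin: string, expiry: number): SignedToken {
  const data: TokenPayload = { origin, expiry };
  const payload = JSON.stringify(data);
  const signature = sign(null, Buffer.from(payload), authorityKey).toString("base64");
  return { payload, signature };
}

// Browser side: reveal the real UA only if the token is authentic,
// was issued for this exact origin, and has not expired.
function userAgentFor(
  origin: string,
  token: SignedToken | undefined,
  authorityPublicKey: KeyObject,
  realUA: string,
  nowSeconds: number,
): string {
  if (token === undefined) return DEFAULT_UA;
  const authentic = verify(
    null,
    Buffer.from(token.payload),
    authorityPublicKey,
    Buffer.from(token.signature, "base64"),
  );
  if (!authentic) return DEFAULT_UA;
  const claims = JSON.parse(token.payload) as TokenPayload;
  const valid = claims.origin === origin && nowSeconds < claims.expiry;
  return valid ? realUA : DEFAULT_UA;
}

// Usage: a token issued for example.com unlocks the real UA only there.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const now = Math.floor(Date.now() / 1000);
const token = issueToken(privateKey, "https://example.com", now + 86400);
const REAL_UA = "ExampleBrowser/1.0"; // hypothetical real identity

console.log(userAgentFor("https://example.com", token, publicKey, REAL_UA, now)); // ExampleBrowser/1.0
console.log(userAgentFor("https://other.example", token, publicKey, REAL_UA, now) === DEFAULT_UA); // true
```

Token delivery (meta tag vs. header), key distribution, and the "terms of service" step from the proposal are left out; the sketch only shows that a token bound to one origin cannot unlock the real browser identity on another, and that an edited payload fails signature verification.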

Anything else?

The point I'm trying to make is that developers who discriminate against fully capable browsers are, for the most part, not "evil" people; they are just lazy and/or incompetent and simply not aware of the enormous harm that User-Agent discrimination has done to browser competition over the past decades. I believe that by "signing" the "terms of service" to obtain such a true-browser-info token, they would be forced to learn about all of this. And if there were a way to make this legally binding, browser makers that still get discriminated against by websites that include the token could perhaps even sue the website owners.

justjake commented 1 month ago

Unfortunately, there are plenty of user-agent differences that you cannot reasonably detect by feature sniffing. We’ve often used user-agent logic to work around bugs or normalize rendering across browsers.

Examples:

BenjaminAster commented 1 month ago

@justjake Yes, but I think these are truly valid use cases for browser detection via the UA string, so for these cases you could still request the token to get the real browser info.

miketaylr commented 1 month ago

Make all browsers report "Google Chrome on Windows/Android" by default, and provide a way for websites to opt-in to real browser information

What you're describing is effectively possible via User-Agent Client Hints - this is sort of the point. The user-agent header could be set to any value (Chrome on Windows/Android), and the website can get the "real" values via UA-CH. But as you mentioned already - there will be a lot of compatibility issues which will likely discourage any browser from going to this extreme.

Some sites (even by multi-billion-dollar corporations) now check the browser list specifically to include "Google Chrome" or "Microsoft Edge" and refuse to work otherwise.

UA-CH proposes to solve this via the brand list - it's perfectly valid for a browser to send "Google Chrome" (or anything else), in addition to its real brand in navigator.userAgentData.brands. That solves for "lazy" sites, and additional use cases such as analytics and bugfix workarounds.
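As a sketch of how the brand-list approach could serve both kinds of sites at once, the TypeScript below assumes a hypothetical Chromium-based "ExampleBrowser" that lists "Google Chrome" alongside its own brand. The GREASE-style entry, the version numbers, and the site-side KNOWN_BRANDS list are illustrative assumptions, not values any real browser is known to send.

```typescript
// Shape of one entry in navigator.userAgentData.brands.
interface UABrand {
  brand: string;
  version: string;
}

// Hypothetical brand list from a browser that, as suggested above,
// sends "Google Chrome" in addition to its real brand.
const mixedBrands: UABrand[] = [
  { brand: "Not)A;Brand", version: "99" },
  { brand: "Chromium", version: "127" },
  { brand: "Google Chrome", version: "127" },
  { brand: "ExampleBrowser", version: "127" },
];

function hasBrand(brands: UABrand[], name: string): boolean {
  return brands.some((b) => b.brand === name);
}

// The "lazy" allowlist check from the issue: it now passes.
function lazyAllowlistCheck(brands: UABrand[]): boolean {
  return hasBrand(brands, "Google Chrome") || hasBrand(brands, "Microsoft Edge");
}

// Analytics or bug workarounds: pick the most specific brand the site knows,
// using a site-maintained list ordered from most to least specific (assumed).
const KNOWN_BRANDS = ["ExampleBrowser", "Microsoft Edge", "Google Chrome", "Chromium"];

function mostSpecificBrand(brands: UABrand[]): string | undefined {
  return KNOWN_BRANDS.find((name) => hasBrand(brands, name));
}

console.log(lazyAllowlistCheck(mixedBrands)); // true
console.log(mostSpecificBrand(mixedBrands)); // ExampleBrowser
```

The order of KNOWN_BRANDS is the design choice that matters here: a site that checked for "Google Chrome" first would misattribute this browser to Chrome in its analytics, even though the real brand is present in the list.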