ua-parser / uap-core

The regex file necessary to build language ports of Browserscope's user agent parser.

Support User Agent Client Hints #452

Open heowc opened 4 years ago

heowc commented 4 years ago

I recently heard that the User-Agent header is being deprecated. Do you plan to add support for User-Agent Client Hints?

See https://wicg.github.io/ua-client-hints/

ZacSadan commented 4 years ago

+1

paz-pm commented 4 years ago

+1

heowc commented 3 years ago

Do you have a roadmap for this issue? 🤔

nicjansma commented 2 years ago

Hi!

Tagging a few of the maintainers to get their opinion: @commenthol @elsigh @tobie

Does the project have a plan or roadmap for how to incorporate Client Hints (or, to explicitly be blind to Client Hints)?

With Chrome's roadmap for the reduction and freezing of the User-Agent string, we're fast approaching the point when the Chrome UA will start to change by default:

Starting with Chrome 101, the UA version will be frozen to NN.0.0.0, and starting with Chrome 107, desktop UAs will start misreporting the platform version (because it will be frozen).

Obviously Chrome isn't the only UA in the world, but it likely accounts for a large percentage of website traffic.

To drive home how this may affect things in 2023+, this is what a Mac may report (when Client Hints are requested):

User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.0.0 Safari/537.36
Sec-CH-UA-Full-Version: "93.0.4577.82"
Sec-CH-UA-Platform: "macOS"
Sec-CH-UA-Platform-Version: "12.0.1"

In the above case, Intel Mac OS X 10_15_7 is hard-coded (even on newer versions of macOS, such as Monterey 12.0.1), and 93.0.0.0 only provides the major version (instead of 93.0.4577.82). To really understand the user agent's characteristics, you would want to prioritize the Sec-CH-UA-* values over the User-Agent string.
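A minimal sketch of "prioritize the Sec-CH-UA-* values", assuming the UA-derived result has already been flattened into a hypothetical dict with `browser_version`, `os_family` and `os_version` keys (not the shape any ua-parser port actually returns). Client Hint values arrive as Structured Field strings, so the quotes are stripped first; escape handling is deliberately left out.

```python
from typing import Mapping, Optional


def sf_string(value: Optional[str]) -> Optional[str]:
    """Unwrap a Structured Field string such as '"12.0.1"'.

    Simplified: escaped characters inside the quotes are not handled.
    """
    if value is None:
        return None
    value = value.strip()
    if len(value) >= 2 and value[0] == value[-1] == '"':
        value = value[1:-1]
    return value or None


def prefer_client_hints(ua_fields: dict, headers: Mapping[str, str]) -> dict:
    """Return a copy of ua_fields where Client Hint values take priority.

    ua_fields is a hypothetical flat result of parsing the User-Agent string;
    only the three keys below are ever overridden.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    hints = {
        "browser_version": sf_string(lowered.get("sec-ch-ua-full-version")),
        "os_family": sf_string(lowered.get("sec-ch-ua-platform")),
        "os_version": sf_string(lowered.get("sec-ch-ua-platform-version")),
    }
    merged = dict(ua_fields)
    for key, value in hints.items():
        if value:  # hint present and non-empty: it wins over the UA-derived value
            merged[key] = value
    return merged
```

With the Mac headers above, `browser_version` becomes `93.0.4577.82` and `os_version` becomes `12.0.1`. Note that the hint reports the platform as `macOS` while uap-core names that family `Mac OS X`, so a real merge would also need a family-name mapping.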

A similar issue already exists today for Windows 11 detection. The Windows 11 UA string (on Edge/Chrome/Opera) reports Windows NT 10.0, while the Sec-CH-UA-Platform-Version Client Hint gives the correct answer of 13.0.0 (for Windows 11). Today's ua-parser YAML can therefore only ever return Windows 10.

Consumers of libraries like ua-parser will start seeing incorrect data if they don't incorporate the Sec-CH-UA-* values. What's not clear to me is whether the ua-parser library itself should try to do this.

Some ideas for how this library could be updated to support Client Hints:

Proposal 1 - ua-parser identifies frozen UA strings

Proposal 2 - ua-parser consumes and prioritizes Client Hints' data

Proposal 3 - ua-parser is blind to Client Hints
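To make Proposal 1 concrete, here is a rough sketch of what "identifying frozen UA strings" could look like. The platform tokens are an assumption based on Chromium's published UA-reduction examples and may be incomplete; a match only means "possibly frozen", since, for example, a genuine macOS 10.15.7 machine sends the same token.

```python
import re

# Reduced Chrome UAs zero out MINOR.BUILD.PATCH, e.g. "Chrome/101.0.0.0".
REDUCED_CHROME_VERSION = re.compile(r"Chrome/\d+\.0\.0\.0(?!\d)")

# Platform tokens that reduced UAs hard-code. Assumed list, based on Chromium's
# UA-reduction examples; it may be incomplete.
FROZEN_PLATFORM_TOKENS = (
    "(Windows NT 10.0; Win64; x64)",
    "(Macintosh; Intel Mac OS X 10_15_7)",
    "(X11; Linux x86_64)",
    "(Linux; Android 10; K)",
)


def frozen_signals(user_agent: str) -> dict:
    """Flag parts of a UA string that may be frozen and therefore unreliable."""
    return {
        "version_maybe_frozen": REDUCED_CHROME_VERSION.search(user_agent) is not None,
        "platform_maybe_frozen": any(t in user_agent for t in FROZEN_PLATFORM_TOKENS),
    }
```

A parser following Proposal 1 could attach flags like these to its result instead of silently reporting the frozen values as real.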

I'm sure there are more ways to approach this, but understanding what this library intends to support may help consumers (like my company) decide whether to wait for (or contribute) that support in ua-parser, or just handle Client Hints in our own application.

Some additional resources:

nicjansma commented 2 years ago

One additional benefit of Proposal 2 (ua-parser consumes and prioritizes Client Hints' data) is to help standardize the logic for situations like Windows 11 detection.

Windows 11 will report a Sec-CH-UA-Platform-Version of 13.0.0 (or higher). Any 13.x (or higher) value should logically be translated to Windows 11.
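A small sketch of that translation. The thresholds follow Microsoft's published guidance for this hint as we understand it (major version 13 or higher is Windows 11, 1 to 12 is Windows 10, 0 is an earlier release); the function name and the `"pre-10"` label are hypothetical.

```python
from typing import Optional


def windows_release(platform: str, platform_version: str) -> Optional[str]:
    """Map Sec-CH-UA-Platform / Sec-CH-UA-Platform-Version to a Windows release.

    Returns None if the platform is not Windows or the version is malformed.
    """
    if platform.strip().strip('"') != "Windows":
        return None
    try:
        major = int(platform_version.strip().strip('"').split(".")[0])
    except ValueError:
        return None
    if major >= 13:
        return "11"
    if major >= 1:
        return "10"
    return "pre-10"  # Windows 7, 8 or 8.1
```

Keeping this mapping in one shared place is the kind of standardization Proposal 2 would offer; otherwise every consumer has to re-implement it.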

nicjansma commented 2 years ago

@commenthol @elsigh @tobie @romenrg @cherio @mattrobenolt @dmolsen sorry to ping a few of you directly, but I think this issue deserves some attention, as the timeline for the Chrome User-Agent to be frozen is fast approaching:

In 3 months the Chrome desktop platform will be frozen, and the mobile platform will follow next February. In those cases, the platform version will be incorrect unless Client Hints are taken into account in some way.

If there aren't any plans to update ua-parser to consume Client Hints (Proposal 3 above), that's fine! But I think at least a documentation update may be warranted to raise awareness.

heowc commented 2 years ago

In the long run, I think Proposal 1 could be inefficient, so I prefer Proposal 2. 👍

commenthol commented 2 years ago

@nicjansma

Thanks for your proposals.

The purpose of this library is to parse the User-Agent header and return information on the browser, OS, and device brand and model. It is unaware of the underlying network protocols and HTTP headers.

The Client-Hints approach requires the server to interact with the client using the Accept-CH header (and possibly others). With this, the values the client returns no longer need to be parsed by this library. Furthermore, those values should be far more accurate than what this library extracts.
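For readers unfamiliar with that server-side interaction, a minimal WSGI sketch of the opt-in. The hint list is illustrative: low-entropy hints (Sec-CH-UA, Sec-CH-UA-Mobile, Sec-CH-UA-Platform) are sent by default, while high-entropy ones such as Sec-CH-UA-Platform-Version and Sec-CH-UA-Full-Version-List (the successor of Sec-CH-UA-Full-Version) are only sent after the server asks via Accept-CH. The app itself is hypothetical, not part of ua-parser.

```python
# High-entropy hints this hypothetical server opts into via Accept-CH.
REQUESTED_HINTS = ("Sec-CH-UA-Platform-Version", "Sec-CH-UA-Full-Version-List")


def app(environ, start_response):
    """Minimal WSGI app that advertises Accept-CH and echoes one hint back."""
    # WSGI exposes request headers as HTTP_* keys, dashes turned into underscores.
    hint = environ.get("HTTP_SEC_CH_UA_PLATFORM_VERSION")
    body = f"platform version hint: {hint or 'not sent yet'}\n".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Accept-CH", ", ".join(REQUESTED_HINTS)),
        # The response depends on these hints, so caches should key on them.
        ("Vary", ", ".join(REQUESTED_HINTS)),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, the app can be served with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`; a Chromium-based browser should then include the requested hints from the second request onward, since the first response is where it learns about Accept-CH.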

This means that ua-parser should not prioritize or judge values from Client-Hints, as it is not in a position to do so. This would always require some logic outside ua-parser (Proposal 3).

Therefore I see the interaction with Client-Hints as outside the scope of ua-parser. For legacy browsers that do not support Client-Hints, I still see ua-parser as a valuable, but fading, source.

nicjansma commented 2 years ago

That's a fair assessment @commenthol. I would suggest a documentation update for uap-core and/or some of the reference libraries then, possibly pointing to this issue or a brief summary of how a consumer of ua-parser should also incorporate Client Hints.

I can help with that type of documentation if it would be accepted into the README.

commenthol commented 2 years ago

Hi @nicjansma, happy to hear that the proposal is well received. Yes, please feel free to create a PR with your recommendations on how clients might use uap-core together with Client Hints. I'd suggest creating a separate document in ./docs and linking it from the README.md, as I can imagine the recommendations might need their own space. What do you think?

untitaker commented 2 years ago

I think there is definitely merit in a library that takes a list of HTTP headers, parses all the information it can out of them, and returns the data in exactly the same format as uap-core-based libraries do. This would be very useful for building drop-in replacements that have to return exactly the same browser names, etc., as uap-core.

I am not saying this has to be part of uap-core, and there is definitely more setup required from the developer to get the same amount of information (requesting high-entropy fields, etc.), but this is a piece of logic that could be factored out.
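One way such a factored-out piece might look, as a hedged sketch: the UA parser is passed in as a callable assumed to return a uap-core-shaped dict (the Python port's `user_agent_parser.Parse` could be one candidate, if you use that package), and only the browser version is refined from Sec-CH-UA-Full-Version-List, so family names stay exactly as uap-core reports them. `parse_request` and the family-to-brand table are hypothetical.

```python
import re
from typing import Callable, Mapping

# One member of Sec-CH-UA-Full-Version-List, e.g. "Chromium";v="101.0.4951.41".
# Simplified: escaped quotes inside brand names are not handled.
BRAND_VERSION = re.compile(r'"([^"]*)"\s*;\s*v\s*=\s*"([^"]*)"')

# Hypothetical mapping from uap-core user_agent families to Client Hint brands.
FAMILY_TO_BRANDS = {
    "Chrome": ("Google Chrome", "Chromium"),
    "Edge": ("Microsoft Edge",),
    "Opera": ("Opera",),
}


def parse_request(headers: Mapping[str, str], parse_ua: Callable[[str], dict]) -> dict:
    """Parse the User-Agent with parse_ua, then refine the browser version
    from Sec-CH-UA-Full-Version-List while keeping uap-core family names.

    parse_ua is assumed to return a dict whose "user_agent" entry holds
    "family", "major", "minor" and "patch".
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    result = parse_ua(lowered.get("user-agent", ""))
    ua = dict(result["user_agent"])
    versions = dict(BRAND_VERSION.findall(lowered.get("sec-ch-ua-full-version-list", "")))
    for brand in FAMILY_TO_BRANDS.get(ua.get("family"), ()):
        if brand in versions:
            parts = versions[brand].split(".") + [None, None, None]
            ua["major"], ua["minor"], ua["patch"] = parts[:3]
            break  # first matching brand wins
    return {**result, "user_agent": ua}
```

Because the output keeps uap-core's shape and names, callers of an existing uap-core-based library could switch to it without changing how they read the result; the OS and device fields would need similar, separate treatment.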