Closed twifkak closed 2 years ago
This should be a "denylist" approach -- allowing all user agents except those that match Chromium 73-78.
I found a few UA parsers on crates.io, in descending order of popularity:
We should rule out the huge ones, since we have serverless bundle size constraints.
Perhaps a regex will suffice for our needs? Famous last words.
I think a small heuristic without regex should be the way to go. For example, find "Chrome/" or "Chromium/", then read the next 2 digits to see whether the version is in the [73, 78] range.
Perhaps I'm a bit too forward-looking but there may eventually be a Chrome 730. :wink:
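A minimal sketch of that heuristic in Rust (function name hypothetical, not part of sxg-rs), parsing the full run of digits after the token so a hypothetical Chrome 730 isn't misread as 73:

```rust
/// Hypothetical helper: returns true if the User-Agent looks like
/// Chrome/Chromium with a major version in 73..=78.
fn is_chromium_73_to_78(user_agent: &str) -> bool {
    for token in ["Chrome/", "Chromium/"] {
        if let Some(pos) = user_agent.find(token) {
            let rest = &user_agent[pos + token.len()..];
            // Take the full run of digits, not just two, so "Chrome/730" parses as 730.
            let digits: String = rest.chars().take_while(|c| c.is_ascii_digit()).collect();
            if let Ok(major) = digits.parse::<u32>() {
                if (73..=78).contains(&major) {
                    return true;
                }
            }
        }
    }
    false
}

fn main() {
    assert!(is_chromium_73_to_78(
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
    ));
    assert!(!is_chromium_73_to_78("Mozilla/5.0 Chrome/79.0.3945.88 Safari/537.36"));
    assert!(!is_chromium_73_to_78("Mozilla/5.0 Chrome/730.0 Safari/537.36"));
    println!("ok");
}
```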
Are you concerned around new dependencies because of bundle size, security, performance, or something else?
In general, I prefer accurate parsers because inaccurate ones lead to weird ecosystem effects:
That said, I'm also a pragmatist. In particular, sxg-rs is a drop in the bucket -- it probably won't materially affect the destiny of User-Agent. :grin: So if a small heuristic is preferable, that's fine.
user-agent-parser, uap-rust, and fast_uaparser all internally use the regular expressions defined in uap-core. Picking the Chrome-related regex from uap-core suffices to parse the Chrome major version. However, adding the regex crate as a dependency increases the gzipped WASM size from 737 kB to 954 kB.
Oh, thanks for the investigation @antiphoton. I overestimated the degree to which UA strings can be accurately parsed. I'll withdraw my earlier comment (except to be mindful of more than two digits).
Maybe something like " Chrome/" or " Chromium/" (note the leading space) followed by a run of digits? No need for regex then, I suppose? std::string or nom will probably suffice?
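As a sketch of that std-only approach (helper name hypothetical), matching the token with its leading space and returning the parsed major version:

```rust
// Hypothetical std-only extraction of the Chrome/Chromium major version.
// Matching " Chrome/" with a leading space avoids substrings of other tokens.
fn chromium_major(user_agent: &str) -> Option<u32> {
    for token in [" Chrome/", " Chromium/"] {
        if let Some(pos) = user_agent.find(token) {
            let digits: String = user_agent[pos + token.len()..]
                .chars()
                .take_while(|c| c.is_ascii_digit())
                .collect();
            if !digits.is_empty() {
                return digits.parse().ok();
            }
        }
    }
    None
}

fn main() {
    let ua = "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36";
    assert_eq!(chromium_major(ua), Some(76));
    assert_eq!(chromium_major("curl/7.68.0"), None);
    println!("ok");
}
```

The caller would then check `(73..=78).contains(&major)` to apply the denylist.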
Chromium M73 enabled SXG, but before this commit landed in M79, it sent an Accept header that matches the PrefersSxg logic meant to differentiate SXG crawlers from SXG browsers, per this recommendation.
Without User-Agent sniffing to rule out Chromium M73-78, sxg-rs may produce broken experiences in those browsers on sites that use a service-worker caching strategy, because those Chromium versions don't interpret SXG served from a service worker. Those browsers represent ~0.19% of users.
(This doesn't happen on Cloudflare Automatic Signed Exchanges because they've implemented this UA sniffing downstream.)