snowplow / enrich

Snowplow Enrichment jobs and library
https://snowplowanalytics.com

Common: Update Yauaa to 7.x and add support for parsing Client Hints #639

Closed nielsbasjes closed 1 year ago

nielsbasjes commented 2 years ago

As of version 7.0.0, the Yauaa User-Agent analyzer also supports parsing the User-Agent Client Hints. See https://yauaa.basjes.nl/using/clienthints/

Instead of providing only a User-Agent value, it is now possible to provide a Map<String, String> holding the values of the relevant request headers (i.e. headers like User-Agent and Sec-CH-UA-Mobile) and get a more accurate analysis result (where possible).

istreeter commented 2 years ago

Thanks @nielsbasjes! At first I assumed this meant just bumping the library to 7.x and doing nothing else. But I see now that there is an opportunity for Snowplow to do something bigger and better than that, to take advantage of this new Yauaa feature.

Currently, we call ua.parse(userAgent) here, where userAgent is a simple string taken from the headers. But we should instead pass in a map comprising all User-Agent and client-hint headers, as described in these docs.

I assume we don't also need to make a collector change to request these hints from the browser. The docs say:
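
As a rough sketch of what building that map could look like (not the actual enrich code): the class and method names below are hypothetical, and the input is assumed to be a list of raw "Name: value" header strings. It keeps only User-Agent and the Sec-CH-UA* headers, which is the kind of Map<String, String> the Yauaa docs describe passing in.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, for illustration only.
final class ClientHintHeaders {

    /**
     * Keeps only the User-Agent header and the Sec-CH-UA* client-hint headers.
     * Assumes each input entry is a raw "Name: value" header line.
     */
    static Map<String, String> select(List<String> rawHeaders) {
        // Case-insensitive keys, because HTTP header names are case-insensitive.
        Map<String, String> selected = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (String raw : rawHeaders) {
            int colon = raw.indexOf(':');
            if (colon <= 0) {
                continue; // not a "Name: value" line, skip it
            }
            String name = raw.substring(0, colon).trim();
            String value = raw.substring(colon + 1).trim();
            String lower = name.toLowerCase(Locale.ROOT);
            if (lower.equals("user-agent") || lower.startsWith("sec-ch-ua")) {
                selected.put(name, value);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> headers = List.of(
            "Host: example.com",
            "User-Agent: Mozilla/5.0 (X11; Linux x86_64)",
            "Sec-CH-UA-Mobile: ?0",
            "Sec-CH-UA-Platform: \"Linux\"");
        // The resulting map is what would be handed to the analyzer
        // in place of the single User-Agent string.
        System.out.println(select(headers));
    }
}
```

The filter is deliberately broad (any header starting with sec-ch-ua), so higher-entropy hints would be picked up automatically if browsers start sending them.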

By default the browsers that support this will send the “low entropy” values without the need to do anything special (other than going over https).

So to start with, I think we can do this without a collector change and still get the benefit of passing in the client hints. It's interesting to consider whether in future we could change the collector to ask for higher-entropy client hints.

paulboocock commented 2 years ago

The JS Tracker has a feature to read the Client Hints from the Browser Client Hints API: https://docs.snowplow.io/docs/collecting-data/collecting-from-own-applications/javascript-trackers/browser-tracker/browser-tracker-v3-reference/plugins/client-hints/

We could perhaps leverage this captured schema, although I imagine getting the headers directly in the request will make this more seamless for end users (and I think there's more info available via headers than the JS API).

istreeter commented 2 years ago

@paulboocock I'm veering off topic here, but we could probably write an enrichment to add org.ietf/http_client_hints as a derived context, instead of doing it in the tracker. Looks like it's just a matter of mapping headers to fields.
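
A minimal sketch of that header-to-field mapping, under some loud assumptions: the class name is hypothetical, the field names (isMobile, brands, platform) are modeled on what the tracker plugin captures and may not match the schema exactly, and the header map is assumed to be keyed by the canonical header names. The Sec-CH-UA parsing is a simple regex that ignores escaped quotes, not a full structured-header parser.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical mapper, for illustration only.
final class ClientHintsContext {

    // Matches one brand entry such as "Chromium";v="110" in a Sec-CH-UA value.
    private static final Pattern BRAND =
        Pattern.compile("\"([^\"]*)\"\\s*;\\s*v=\"([^\"]*)\"");

    /** Maps client-hint request headers to context fields (field names are assumptions). */
    static Map<String, Object> fromHeaders(Map<String, String> headers) {
        Map<String, Object> fields = new LinkedHashMap<>();

        // Sec-CH-UA-Mobile is a structured-header boolean: ?1 is true, ?0 is false.
        String mobile = headers.get("Sec-CH-UA-Mobile");
        if (mobile != null) {
            fields.put("isMobile", mobile.trim().equals("?1"));
        }

        // Sec-CH-UA is a list of brand strings, each with a v="..." parameter.
        String ua = headers.get("Sec-CH-UA");
        if (ua != null) {
            List<Map<String, String>> brands = new ArrayList<>();
            Matcher m = BRAND.matcher(ua);
            while (m.find()) {
                Map<String, String> brand = new LinkedHashMap<>();
                brand.put("brand", m.group(1));
                brand.put("version", m.group(2));
                brands.add(brand);
            }
            fields.put("brands", brands);
        }

        // Sec-CH-UA-Platform is a quoted string such as "Windows".
        String platform = headers.get("Sec-CH-UA-Platform");
        if (platform != null) {
            fields.put("platform", stripQuotes(platform));
        }
        return fields;
    }

    private static String stripQuotes(String value) {
        String v = value.trim();
        if (v.length() >= 2 && v.startsWith("\"") && v.endsWith("\"")) {
            return v.substring(1, v.length() - 1);
        }
        return v;
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Sec-CH-UA", "\"Chromium\";v=\"110\", \"Not A(Brand\";v=\"24\"");
        headers.put("Sec-CH-UA-Mobile", "?0");
        headers.put("Sec-CH-UA-Platform", "\"Windows\"");
        System.out.println(fromHeaders(headers));
    }
}
```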

nielsbasjes commented 2 years ago

@istreeter Yes, this is correct. I see the following steps:

  1. You can just update Yauaa to the latest version. This will give you improved analysis on various aspects. The API is backwards compatible so this should "just work".
  2. Any browser that supports it (currently most Chromium derivatives, although Brave does not) will send the low entropy headers by default.
  3. If a website asks for it (by means of setting a response header) the browser will (in general: if allowed) send the requested additional headers on all subsequent requests.
  4. Once you have these extra fields you can feed them into Yauaa and get a better result.
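
To make step 3 concrete: the response header in question is Accept-CH, whose value is a comma-separated list of the hint headers the server would like to receive. The sketch below only builds that header value; the class name and the particular selection of hints are assumptions for illustration, and nothing here reflects how a collector would actually be wired up.

```java
import java.util.List;

// Hypothetical helper, for illustration only.
final class AcceptClientHints {

    // Some of the high-entropy User-Agent client hints a site can ask for.
    static final List<String> HIGH_ENTROPY_HINTS = List.of(
        "Sec-CH-UA-Arch",
        "Sec-CH-UA-Bitness",
        "Sec-CH-UA-Full-Version-List",
        "Sec-CH-UA-Model",
        "Sec-CH-UA-Platform-Version");

    /** Builds the value of the Accept-CH response header from the requested hints. */
    static String acceptChValue(List<String> hints) {
        return String.join(", ", hints);
    }

    public static void main(String[] args) {
        // A server would send this on a response; a supporting browser may then
        // include the requested headers on later requests, if allowed.
        System.out.println("Accept-CH: " + acceptChValue(HIGH_ENTROPY_HINTS));
    }
}
```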