mozilla / fxa-activity-metrics

A server for managing the Firefox Accounts metrics database and pipeline
1 stars 3 forks source link

Consider deriving uid using the same key that we use for amplitude #102

Open irrationalagent opened 6 years ago

irrationalagent commented 6 years ago

Right now the uid (1) that we send to amplitude and (2) that is used by sync in their telemetry is derived from a key that's different from the one we use to derive the uid for the flow event data (that we send to redshift and display in re:dash).

The consequence is that we cannot join FxA flow data to sync pings (e.g. in a spark notebook - it wouldn't be possible in re:dash anyway as they exist in different data sources rfkelly-fxa-test and athena/presto, respectively). In theory, we could do an analysis of sync data correlated with flow data in amplitude, but we've elected to put a hold on importing sync data into amplitude (we haven't come up with a good use case that would justify the cost of billions of events).

That said, there are a couple reasons to either (1) change the uid to be consistent with sync/amplitude or (2) add the amplitude version of the uid as another column of data to our flow events.

If we decide to replace the current uid with one that is aligned to amplitude and sync, then any existing re:dash queries that rely on the uid will break. This will affect a few of mine, mostly those that are redundant with similar amplitude dashboards. Would anyone else's queries be seriously affected? @philbooth says no for him.

If we instead decide to add the amplitude-esque uid as an additional field, it will avoid breakage but require more engineering work. It would also add clutter and confusion to the flow events tables for anyone using our data but not in the loop.

Replacing would be the simplest solution. I'm going to inventory my re:dash queries a bit and post back here with an (incomplete) list of some that would be affected.

We could also do nothing. In that case, Lockbox will have to figure out some other solution to connect the dots between interactions in the app and behavior during sign-in/up.

vladikoff commented 6 years ago

"Will probably need this for Lockbox"

vladikoff commented 6 years ago

waiting for Lockbox

shane-tomlinson commented 6 years ago

@irrationalagent - what do we need from Lockbox to allow this to progress?

mhammond commented 6 years ago

I believe desktop recently [1] took the hit of a change in the way the UID is hashed for metrics, taking a short-term hit in metrics for the long term gain of getting amplitude/sync-ping/redash aligned. So I vote a hearty (1) - let's move everything to a new world order.

A part of this I don't yet understand is that IIUC, desktop requires a response from the token-server to be able to report this UID, and that server is unique to Sync, so I don't understand how these UIDs should align. If there was an even more portable solution (causing desktop to take a hit again but towards a greater good), we should do it sooner rather than later.

[1] I can't find the server issue where we discussed the UID hash change - can anyone lend me a clue?

philbooth commented 6 years ago

@mhammond, I think you want this one: mozilla/fxa-amplitude-send#24

philbooth commented 6 years ago

A part of this I don't yet understand is that IIUC, desktop requires a response from the token-server to be able to report this UID, and that server is unique to Sync, so I don't understand how these UIDs should align. If there was an even more portable solution (causing desktop to take a hit again but towards a greater good), we should do it sooner rather than later.

I think we're all good on that front now. The uid that desktop Sync gets from the tokenserver is hashed from the same source that the uid in this issue talks about. Changing the hash key that's used in the Heka filter for these events should result in everything matching up.

irrationalagent commented 6 years ago

If we have Lockbox-independent reasons for this to go forward (i.e. just for alignment's sake), we could get started. The reason I said to wait a week or two is that Lockbox is waiting for review to use this (new-style) fxa uid as its user ID across platforms. I imagine that it will be approved, but if there's something I didn't forsee and we get the 👎 then this won't help us (we'll have to figure something else out). You can read this if you're interested in what we're trying to do