mozilla / application-services

Firefox Application Services
https://mozilla.github.io/application-services/

FxA: return the hashed fxa uid as part of the account information #434

Closed irrationalagent closed 3 years ago

irrationalagent commented 5 years ago

related to https://github.com/mozilla/application-services/issues/19

For the Lockbox mobile apps, we'd like for the fxa client to be able to obtain the hashed_uid used for metrics (currently only obtainable from the sync token server) so that it can be included in the telemetry sent out by the app. It would enable joining of sync telemetry and Lockbox app telemetry (and any other app that uses the rust client).

I believe this would also entail some server-side work, but it's unclear to me which endpoint the hashed_uid would be best exposed at.

┆Issue is synchronized with this Jira Story

thomcc commented 5 years ago

CC @rfk

rfk commented 5 years ago

Lockbox already talks to the sync tokenserver, so a short-term option could be to expose the hashed_uid that we already obtain from there, rather than adding a new server endpoint. Obviously we can't assume that for all future client apps though.

@irrationalagent could you say more about how you see this being used by the lockbox app? I'm imagining the app doing something like:

And then storing it to submit in telemetry pings during ongoing use of the app.
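To make the "store it, then submit it in telemetry pings" step concrete, here is a minimal Rust sketch. It is only an illustration and not a reconstruction of the steps referred to above: `TokenServerResponse`, `TelemetryStore`, `record`, and `build_ping` are hypothetical names, and the `hashed_uid` field name is an assumption, not the application-services API.

```rust
/// Hypothetical shape of the data the app already gets back from the
/// sync tokenserver; only the fields relevant here are modelled.
#[derive(Debug, Clone)]
pub struct TokenServerResponse {
    pub api_endpoint: String,
    /// The hashed uid used for metrics (the field name is an assumption).
    pub hashed_uid: String,
}

/// Hypothetical app-side store that remembers the hashed uid between pings.
#[derive(Debug, Default)]
pub struct TelemetryStore {
    hashed_uid: Option<String>,
}

impl TelemetryStore {
    /// Record the hashed uid after a successful tokenserver exchange.
    pub fn record(&mut self, response: &TokenServerResponse) {
        self.hashed_uid = Some(response.hashed_uid.clone());
    }

    /// Build a telemetry ping as key/value pairs.
    /// The hashed uid is attached only if one has been recorded.
    pub fn build_ping(&self, event: &str) -> Vec<(String, String)> {
        let mut ping = vec![("event".to_string(), event.to_string())];
        if let Some(uid) = &self.hashed_uid {
            ping.push(("hashed_uid".to_string(), uid.clone()));
        }
        ping
    }
}
```

In this sketch a ping built before any tokenserver exchange simply omits the id, so the app would have to decide whether to send such pings or hold them back.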

We need to be super clear about the security properties of the hashed_uid as well. Should any app be able to request it at any time? Does that add additional risk that, say, a db breach on AMO or Pocket could leak tokens that can be used to fetch hashed_uid values and de-anonymize telemetry data?

/cc @shane-tomlinson

shane-tomlinson commented 5 years ago

I imagine a scheme similar to Ryan's. Right now the user's unhashed uid can already be obtained. If we are able to return both the uid and hashed_uid from one service, it seems like the privacy protections offered by hashing the uid are pretty minimal.

> It would enable joining of sync telemetry and Lockbox app telemetry (and any other app that uses the rust client).

Two high level questions - (1) Have the data stewards and legal approved joining metrics across services, and (2) how does this fit in with your ecosystem_id proposal?

irrationalagent commented 5 years ago

We can talk more about this in the meeting today, but:

(1) They have signed off on this for the Lockbox case specifically. We are sending some desktop measurements in the sync ping already, and this is the last piece that would allow us to join desktop + mobile app telemetry.

(2) It does and it doesn't - the goal is the same, but this was an initiative that we started a while back when we were just in the brainstorming stages for the ecosystem stuff. Ideally we would move to that when it's done, but we'd like to get this working in the meantime.

rfk commented 5 years ago

In that case, I think we should do this option in service of the just-lockbox use case:

> Lockbox already talks to the sync tokenserver, so a short-term option could be to expose the hashed_uid that we already obtain from there, rather than adding a new server endpoint.

@thomcc is there an obvious point at which the sync tokenserver code could spit this out for consumption by the lockbox app, or is that currently quite hidden away under the layers of the API?

rfk commented 5 years ago

/cc @linuxwolf

mhammond commented 5 years ago

Related to #19

irrationalagent commented 5 years ago

@thomcc when you get a chance could you weigh in on Ryan's comment?

thomcc commented 5 years ago

> @thomcc is there an obvious point at which the sync tokenserver code could spit this out for consumption by the lockbox app, or is that currently quite hidden away under the layers of the API?

It would be a little awkward but probably not too bad. The only real issue is around the method failing in cases where we haven't synced/synced recently (we don't keep as much sync state as desktop does in memory across syncs).
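One way to picture the failure case is an accessor that returns an error until a sync has supplied the value. The Rust sketch below is purely illustrative: `SyncState`, `on_sync_completed`, `hashed_uid`, and `HashedUidError` are hypothetical names and do not reflect how the sync code is actually structured.

```rust
use std::fmt;

/// Hypothetical error for callers asking for the id before any sync has run.
#[derive(Debug, PartialEq, Eq)]
pub enum HashedUidError {
    NotYetSynced,
}

impl fmt::Display for HashedUidError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            HashedUidError::NotYetSynced => {
                write!(f, "hashed uid unavailable: no sync has completed yet")
            }
        }
    }
}

impl std::error::Error for HashedUidError {}

/// Hypothetical in-memory state kept across syncs.
#[derive(Debug, Default)]
pub struct SyncState {
    last_hashed_uid: Option<String>,
}

impl SyncState {
    /// Called after a sync that talked to the tokenserver successfully.
    pub fn on_sync_completed(&mut self, hashed_uid: &str) {
        self.last_hashed_uid = Some(hashed_uid.to_owned());
    }

    /// Returns the hashed uid, or an error if nothing has been recorded yet.
    pub fn hashed_uid(&self) -> Result<&str, HashedUidError> {
        self.last_hashed_uid
            .as_deref()
            .ok_or(HashedUidError::NotYetSynced)
    }
}
```

Returning a `Result` rather than an empty string forces the embedding app to handle the "haven't synced yet" case explicitly.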

(Also sorry for not noticing this)

mhammond commented 5 years ago

> return both the uid and hashed_uid from one service it seems like the privacy protections offered by hashing the uid are pretty minimal.

Sorry that I'm late to this party, but I don't see the privacy concerns here - e.g., desktop already knows both IDs. The privacy we want is for data submitted in telemetry pings - just having a ping should not be enough to identify the user from which it originated.

Getting this hashed id directly from FxA would also solve an issue we have today - that a failure before (or due to) hitting the token server means we don't have this ID when submitting telemetry.

(That said though, I believe we can arrange for embedding apps to get a "sync ping" after each sync, and they can choose what to do with it, which would hopefully involve submitting it as a "real" sync ping, and possibly extracting some data from inside it to form the basis of any additional telemetry data they wish to submit)

rfk commented 5 years ago

FWIW I have no particular objection to adding an API on FxA that takes, say, an OAuth token with 'oldsync' scope and returns the hashed telemetry UID; tokenserver already offers that functionality so we wouldn't be moving the privacy boundary.

> Getting this hashed id directly from FxA would also solve an issue we have today - that a failure before (or due to) hitting the token server means we don't have this ID when submitting telemetry.

Unless the hashed uid gets returned inline in the OAuth response data, moving from "request it from tokenserver" to "request it in a separate API call to FxA" doesn't seem like it will help much with this problem, as there's still the possibility of that API call failing.

Maybe it could be fetched as part of their profile data bundle rather than in a separate call?
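A rough Rust sketch of the "part of the profile data bundle" idea follows. It assumes a hypothetical optional `hashed_uid` field on the profile, which is not an existing FxA profile field as far as this thread shows; `Profile` and `telemetry_uid` are illustrative names only.

```rust
/// Hypothetical profile bundle with an extra, optional hashed uid.
#[derive(Debug, Clone)]
pub struct Profile {
    pub uid: String,
    pub email: String,
    /// Assumed extra field; `None` if the server does not supply it.
    pub hashed_uid: Option<String>,
}

/// Pick the id to attach to telemetry: prefer the value delivered inline
/// with the profile, and fall back to one obtained from the tokenserver.
pub fn telemetry_uid<'a>(
    profile: &'a Profile,
    from_tokenserver: Option<&'a str>,
) -> Option<&'a str> {
    profile.hashed_uid.as_deref().or(from_tokenserver)
}
```

Because the id arrives with data the app fetches anyway, there is no extra request that can fail on its own; the tokenserver value remains available as a fallback.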

mhammond commented 5 years ago

> Maybe it could be fetched as part of their profile data bundle rather than in a separate call?

I'm not as familiar with the oauth flows as I should be, but yes, that's exactly what I had in mind.

mhammond commented 5 years ago

We've landed core support for extracting the uid and other sync-specific telemetry data, and I opened #524 to expose that to consumers, so I'm closing this in favour of that.

(There's still a case to be made that FxA should supply the fxa uid in a more convenient way: a possibly significant limitation of grabbing it from the token server is that consumers will not know the uid until we've successfully connected to the token server. But we can have that discussion later.)

rfk commented 5 years ago

Longer-term I believe the plan is to submit "ecosystem telemetry" pings with the raw FxA uid and have it scrubbed on the server on ingestion, so obtaining it from FxA vs tokenserver may be moot.

mhammond commented 5 years ago

I closed this prematurely - we do want our consumers to have access to the hashed fxa uid without needing to perform a sync.

rfk commented 4 years ago

see also https://bugzilla.mozilla.org/show_bug.cgi?id=1584356

jdragojevic commented 3 years ago

Closing this as something we never acted on.