mozilla / glam

Mozilla's primary interactive dashboard for examining the distribution of telemetry values.
https://glam.telemetry.mozilla.org
Mozilla Public License 2.0

ADR - Serve data from BQ #2670

Closed by edugfilho 8 months ago.

edugfilho commented 8 months ago

Accepting feedback on this ADR, with code attached.

To move this forward, we probably need to coordinate with DSRE to make sure GLAM has the proper BQ permissions.

scholtzan commented 8 months ago

I'd recommend that whatever service account is used for accessing BigQuery only has access to the tables that should be exposed. There seem to be a bunch of tables in glam_etl that might contain data we wouldn't want users to get access to if the account credentials are somehow leaked or the API accidentally allows access to them.

edugfilho commented 8 months ago

> I'd recommend that whatever service account is used for accessing BigQuery only has access to the tables that should be exposed. There seem to be a bunch of tables in glam_etl that might contain data we wouldn't want users to get access to if the account credentials are somehow leaked or the API accidentally allows access to them.

I agree, but I'd like to strike a middle ground, since we will likely need to add more tables to the glam_etl dataset (for a new product, for example). In that case I'd rather not have to involve DSRE, so if BQ supports wildcards (`*`) in access control, that may be a good option. Alternatively, we could create a new dataset containing only the prod tables and grant GLAM access to that dataset only.
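To make the trade-off concrete, here is a minimal sketch that renders BigQuery DCL `GRANT` statements at either scope. A dataset-level grant (`ON SCHEMA`) covers every table in the dataset, including ones added later, while table-level grants (`ON TABLE`) expose only the tables listed. The helper name, project, dataset, table, and service-account values are hypothetical placeholders, and the sketch does not assume BigQuery supports wildcards in grants.

```python
# Hypothetical helper: render BigQuery DCL GRANT statements for read access.
# All names (project, dataset, tables, principal) are placeholders.
from typing import List, Optional


def grant_statements(principal: str, project: str, dataset: str,
                     tables: Optional[List[str]] = None,
                     role: str = "roles/bigquery.dataViewer") -> List[str]:
    """Return one dataset-wide GRANT if `tables` is None, else one per table."""
    if tables is None:
        # Dataset-level: applies to every current and future table in the dataset.
        return [f'GRANT `{role}` ON SCHEMA `{project}.{dataset}` TO "{principal}";']
    # Table-level: only the explicitly listed tables become readable.
    return [
        f'GRANT `{role}` ON TABLE `{project}.{dataset}.{table}` TO "{principal}";'
        for table in tables
    ]


if __name__ == "__main__":
    sa = "serviceAccount:glam-api@example-project.iam.gserviceaccount.com"  # hypothetical
    for stmt in grant_statements(sa, "example-project", "glam_etl", ["example_table"]):
        print(stmt)
```

The table-level form matches the least-privilege recommendation, at the cost of one grant per new table; the dataset-level form avoids that cost but exposes anything later added to the dataset.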

edugfilho commented 8 months ago

I initially intended to put this behind a feature flag and roll it out to a selected group of users, but that proved too complicated, and it isn't really needed given that a full rollback is as simple as switching a flag and re-deploying. So I intend to deploy this as is: as soon as it reaches dev, all requests in that environment will fetch data from the BigQuery prod tables, and we can test it there. If things look good, we move on to stage, then prod.

Rollback is as simple as switching the data_source string in glam/api/views.py back to Postgres and re-deploying.
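The rollback mechanism described above can be pictured as a single module-level switch that picks which backend serves requests. This is a hypothetical sketch, not the actual contents of glam/api/views.py: the `FETCHERS` table, the fetch functions, and their return values are placeholders standing in for the real BigQuery and Postgres queries.

```python
# Hypothetical sketch of a data_source switch; the real glam/api/views.py
# may be structured differently. Fetchers return placeholder rows.
from typing import Callable, Dict, List


def fetch_from_bigquery(metric: str) -> List[dict]:
    # Placeholder: the real code would query the BigQuery prod tables.
    return [{"metric": metric, "source": "BigQuery"}]


def fetch_from_postgres(metric: str) -> List[dict]:
    # Placeholder: the real code would query the Postgres tables.
    return [{"metric": metric, "source": "Postgres"}]


FETCHERS: Dict[str, Callable[[str], List[dict]]] = {
    "BigQuery": fetch_from_bigquery,
    "Postgres": fetch_from_postgres,
}

# Rollback: change this string back to "Postgres" and re-deploy.
DATA_SOURCE = "BigQuery"


def fetch_data(metric: str, data_source: str = DATA_SOURCE) -> List[dict]:
    """Dispatch to the configured backend; the default is bound at import time."""
    try:
        fetcher = FETCHERS[data_source]
    except KeyError:
        raise ValueError(f"Unknown data_source: {data_source!r}") from None
    return fetcher(metric)
```

Because the default is read once when the module is imported, a change to `DATA_SOURCE` only takes effect after a re-deploy, which matches the rollback procedure described in the thread.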

So my plan is to deploy this to dev on Jan 29.

@mikaeld, requesting your input on the deployment/rollback plan above, because rolling back prod would involve a prod re-deployment (though that's just the usual button pressing).

Also, we want GLAM to have read permissions on the tables listed below, and eventually on other tables (e.g. when new products are added), while taking into account Anna's comment quoted below. I was thinking of dataset-level access, so that DSRE doesn't have to grant access every time a new table is added. Would you recommend something different?

> I'd recommend that whatever service account is used for accessing BigQuery only has access to the tables that should be exposed. There seem to be a bunch of tables in glam_etl that might contain data we wouldn't want users to get access to if the account credentials are somehow leaked or the API accidentally allows access to them.

Here are the tables to which GLAM currently needs access:

Desktop (legacy):

Android:

mikaeld commented 8 months ago

Table-level access is preferable to dataset-level access. You can use the existing workgroup:dataops-managed/glam workgroup in the bqetl metadata for all the tables you listed.
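As an illustration of what table-level access through bqetl metadata might look like, here is a small hypothetical helper that renders a `workgroup_access` fragment for a table's metadata.yaml. It assumes the fragment is a list of entries with `role` and `members` keys and that `roles/bigquery.dataViewer` is the appropriate read role; both are assumptions to check against the bigquery-etl documentation.

```python
# Hypothetical helper: render a `workgroup_access` fragment for one table's
# bqetl metadata.yaml. The {role, members} layout and the default role are
# assumptions, not confirmed bigquery-etl schema.
from typing import List


def workgroup_access_yaml(members: List[str],
                          role: str = "roles/bigquery.dataViewer") -> str:
    lines = ["workgroup_access:", f"- role: {role}", "  members:"]
    lines += [f"  - {member}" for member in members]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(workgroup_access_yaml(["workgroup:dataops-managed/glam"]), end="")
```

The same fragment would be repeated in the metadata of each table GLAM needs, which keeps access scoped to exactly those tables.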