mozilla / bigquery-etl

Bigquery ETL
https://mozilla.github.io/bigquery-etl
Mozilla Public License 2.0

Consider moving FxA amplitude event export to after midnight PDT #174

Closed irrationalagent closed 3 years ago

irrationalagent commented 5 years ago

FxA Amplitude uses PDT as its reference timezone. We pull those events into BigQuery once a day. I can't find exactly when, but it looks like the pull happens before midnight PDT, likely just after midnight UTC.

As a result, we have to wait an extra day for complete data in BQ if we want to write queries framed around PDT. I prefer to do this to keep results comparable to the actual Amplitude UI.
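To illustrate the lag, here is a minimal sketch of which UTC-day partitions a single Pacific calendar day touches. It assumes the BigQuery destination table is partitioned by UTC day; the function name `utc_partitions_for_pacific_day` is hypothetical and is not part of bigquery-etl.

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

# Amplitude's reference timezone, per this issue.
PACIFIC = ZoneInfo("America/Los_Angeles")


def utc_partitions_for_pacific_day(day: date) -> list[date]:
    """Return the UTC dates (assumed partition keys) that overlap one Pacific day."""
    # First and last instants of the Pacific day, converted to UTC dates.
    first = datetime.combine(day, time.min, tzinfo=PACIFIC).astimezone(timezone.utc).date()
    last = datetime.combine(day, time.max, tzinfo=PACIFIC).astimezone(timezone.utc).date()
    return [first + timedelta(days=i) for i in range((last - first).days + 1)]


if __name__ == "__main__":
    # A Pacific day always spills into the following UTC day.
    print(utc_partitions_for_pacific_day(date(2019, 7, 15)))
```

A query framed around a Pacific day therefore needs both partitions, and the second one only becomes available after the following UTC day has been imported.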

Would it cause any problems to move the ETL to sometime shortly after midnight PDT? @relud any thoughts? (I think Jeff is still on PTO.)

irrationalagent commented 5 years ago

Maybe someday we could also look into a streaming job that would write these events more often, but I get that it would be a ton of work.

relud commented 5 years ago

Moving the ETL should be easy enough.

jklukas commented 5 years ago

It currently runs at 10:00 UTC, as codified in telemetry-airflow. I will think through the implications of delaying it and whether that could cause any problems for the KPI dashboard scheduling.

I'd like to know more about the use cases for more "live" data. For your use cases, can you access the source tables in the fxa-prod project?

jklukas commented 5 years ago

The existing import is designed to run once a day and copy over the entirety of the previous UTC day (which becomes one partition of the destination table) in one load job. We could run this very soon after midnight UTC, but we wait until 10:00 UTC to allow for any potential latency in the Stackdriver pipeline that delivers FxA logs into BigQuery.

As PST is UTC-8 and PDT is UTC-7, 10:00 UTC is 02:00 PST or 03:00 PDT, so we are already running this import after midnight Pacific. This is therefore not a scheduling problem, but rather a consequence of the import being oriented towards UTC days.
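The arithmetic above can be checked with a short sketch. It assumes, as described in this thread, that the import runs daily at 10:00 UTC and loads the previous UTC day as one partition; the names `import_run_for_partition` and `pacific_day_complete_at` are hypothetical and do not come from telemetry-airflow or bigquery-etl.

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

PACIFIC = ZoneInfo("America/Los_Angeles")
RUN_HOUR_UTC = 10  # daily run time stated in this thread


def import_run_for_partition(partition: date) -> datetime:
    """UTC time of the run that loads the given UTC-day partition (the next day's run)."""
    run_day = partition + timedelta(days=1)
    return datetime.combine(run_day, time(RUN_HOUR_UTC), tzinfo=timezone.utc)


def pacific_day_complete_at(day: date) -> datetime:
    """UTC time at which every partition overlapping a Pacific day has been loaded."""
    # The last instant of the Pacific day falls in the following UTC day.
    last_instant = datetime.combine(day, time.max, tzinfo=PACIFIC)
    last_partition = last_instant.astimezone(timezone.utc).date()
    return import_run_for_partition(last_partition)


if __name__ == "__main__":
    done = pacific_day_complete_at(date(2019, 7, 15))
    print(done, "UTC =", done.astimezone(PACIFIC), "Pacific")
```

Under these assumptions, data for a Pacific day is complete only at the second import run after that day ends, regardless of how soon after midnight Pacific the job is scheduled.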

jklukas commented 3 years ago

This pipeline to Amplitude is no longer relevant.