irrationalagent closed this issue 3 years ago.
Maybe someday we could look into a streaming job that would write these more often as well, but I understand that would be a ton of work.
Moving the ETL should be easy enough.
It currently runs at 10:00 UTC, as codified in telemetry-airflow. I will think through the implications of delaying it and whether that could cause any problems for the KPI dashboard scheduling.
I'd like to know more about the use cases for more "live" data. For your use cases, can you access the source tables in the fxa-prod project?
The existing import is designed to run once a day and copy the entire previous UTC day (which becomes one partition of the destination table) in a single load job. We could run it very soon after midnight UTC, but we wait until 10:00 UTC to allow for any latency in the Stackdriver pipeline that delivers FxA logs into BigQuery.
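A minimal sketch of which window a single daily run copies, assuming the 10:00 UTC schedule described above. The helper name `import_window` and the example date are hypothetical and not taken from telemetry-airflow; the sketch only shows that a run on UTC day D loads the half-open range `[D-1 00:00 UTC, D 00:00 UTC)` as partition D-1.

```python
from datetime import date, datetime, time, timedelta, timezone


def import_window(run_time: datetime) -> tuple[date, datetime, datetime]:
    """Return (partition_date, start, end) for a daily import started at run_time.

    Hypothetical helper: the destination partition is the previous UTC day,
    and the copied window is the half-open interval [start, end) in UTC.
    """
    run_utc = run_time.astimezone(timezone.utc)
    partition_date = run_utc.date() - timedelta(days=1)
    start = datetime.combine(partition_date, time(0, 0), tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return partition_date, start, end


# Example run at the current schedule, 10:00 UTC (date chosen for illustration).
run = datetime(2020, 7, 10, 10, 0, tzinfo=timezone.utc)
partition, start, end = import_window(run)
print(partition, start.isoformat(), end.isoformat())
# prints: 2020-07-09 2020-07-09T00:00:00+00:00 2020-07-10T00:00:00+00:00
```

Because the window is fixed to UTC midnights, moving the run time earlier or later only changes when a partition lands, not which hours it contains.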
Since PST is UTC-8 and PDT is UTC-7, 10:00 UTC is 02:00 PST or 03:00 PDT, so we are already running this import after midnight Pacific. This is not a scheduling problem but rather a consequence of the import being oriented around UTC days.
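The offset arithmetic above can be checked with fixed UTC offsets. This sketch deliberately uses `datetime.timezone` with constant offsets (UTC-8 and UTC-7, as stated in the thread) rather than a DST-aware zone database, so it assumes you already know which of PST or PDT is in effect; the date is arbitrary.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets as stated above: PST is UTC-8, PDT is UTC-7.
# These ignore DST transitions; they only illustrate the arithmetic.
PST = timezone(timedelta(hours=-8), "PST")
PDT = timezone(timedelta(hours=-7), "PDT")

# Scheduled import time: 10:00 UTC (date chosen only for illustration).
run_utc = datetime(2020, 7, 10, 10, 0, tzinfo=timezone.utc)

run_pst = run_utc.astimezone(PST)
run_pdt = run_utc.astimezone(PDT)

print(run_pst.strftime("%H:%M %Z"))  # 02:00 PST
print(run_pdt.strftime("%H:%M %Z"))  # 03:00 PDT
```

In both cases the run falls after midnight Pacific on the same calendar date as the UTC run, which is why rescheduling alone would not change what data is available.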
This pipeline to Amplitude is no longer relevant.
FxA Amplitude uses PDT as its reference timezone. We are pulling those events into BigQuery once a day. I can't find exactly when, but it appears to be before midnight PDT, likely just after midnight UTC.
This means we have to wait an extra day for complete data in BQ if we want to write queries framed around PDT days, which I prefer in order to keep results comparable with the Amplitude UI.
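A sketch of where the extra day comes from, assuming a fixed UTC-7 offset for PDT and the daily 10:00 UTC import that loads the previous UTC day. The function names `utc_partitions_for_pdt_day` and `pdt_day_complete_at` are hypothetical. A PDT calendar day starts at 07:00 UTC, so it overlaps two UTC partitions, and the second one is only loaded the following morning.

```python
from datetime import date, datetime, time, timedelta, timezone

PDT = timezone(timedelta(hours=-7), "PDT")  # fixed offset; ignores DST changes
IMPORT_HOUR_UTC = 10  # current daily schedule; each run loads the previous UTC day


def utc_partitions_for_pdt_day(day: date) -> list[date]:
    """UTC partition dates that overlap the PDT calendar day `day` (hypothetical helper)."""
    start = datetime.combine(day, time(0, 0), tzinfo=PDT).astimezone(timezone.utc)
    end = start + timedelta(days=1)
    last = (end - timedelta(microseconds=1)).date()  # end is exclusive
    partitions = []
    d = start.date()
    while d <= last:
        partitions.append(d)
        d += timedelta(days=1)
    return partitions


def pdt_day_complete_at(day: date) -> datetime:
    """Earliest UTC time at which every overlapping partition has been imported."""
    last_partition = utc_partitions_for_pdt_day(day)[-1]
    # Partition D is loaded by the run on D + 1 at IMPORT_HOUR_UTC.
    load_date = last_partition + timedelta(days=1)
    return datetime.combine(load_date, time(IMPORT_HOUR_UTC), tzinfo=timezone.utc)


day = date(2020, 7, 9)  # arbitrary example date
print(utc_partitions_for_pdt_day(day))
print(pdt_day_complete_at(day).isoformat())
```

Under these assumptions, the PDT day 2020-07-09 spans the UTC partitions for 2020-07-09 and 2020-07-10 and is complete only after the 10:00 UTC run on 2020-07-11, one day later than the UTC day with the same date.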
Would it cause any problems to move the ETL to shortly after midnight PDT? @relud, any thoughts? (I think Jeff is still on PTO.)