mozilla / probe-scraper

Scrape and publish Telemetry probe data from Firefox
https://mozilla.github.io/probe-scraper/
Mozilla Public License 2.0

[DENG-577] Add glean ping expiry alerts based on retention policy #761

Open BenWu opened 1 month ago

BenWu commented 1 month ago

https://mozilla-hub.atlassian.net/browse/DENG-577

Email alerts for Glean pings that are reaching either their collect_through_date or their delete_after_days limit.

WIP, behaviour that needs some discussion:

BenWu commented 1 month ago
  1. Does the "oldest partition" that is being used to determine the age of the dataset get deleted when the data-retention policy forces the oldest data to be deleted?

The oldest partition does get deleted each day, so it keeps changing, which is why I added a 3-day buffer, i.e. don't notify if the partition will be dropped in the next two days, since it's likely already being deleted every day. I'm hoping to keep complexity down by not using a state table. Data deletion should happen independently of pipeline issues, but one situation where it could send repeat notifications is if copy_deduplicate failed three days in a row, so partitions didn't exist.
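A minimal sketch of that buffer check, assuming the drop date is the oldest partition date plus delete_after_days. The function name and the `buffer_days` parameter are hypothetical and not taken from the PR:

```python
from datetime import date, timedelta


def should_notify_deletion(
    oldest_partition: date,
    delete_after_days: int,
    today: date,
    buffer_days: int = 3,  # assumed default, matching the 3-day buffer described above
) -> bool:
    """Return True if the oldest partition is dropped at least buffer_days from today."""
    drop_date = oldest_partition + timedelta(days=delete_after_days)
    return (drop_date - today).days >= buffer_days
```

A table that is already trimmed daily has a drop date of roughly today, so it falls inside the buffer and produces no alert without needing a state table.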

  1. The wording here in the email templates gives me the impression that my pings are expiring, when actually it's just the oldest data in the dataset or the entire dataset/table right?

Yes, either the oldest data is being deleted or data after a certain date will not be stored. There's a different message for each case so a message might look like this:

app:
    - The "ping-1" ping for will start deleting data older than 180 days starting on 2024-07-04
    - The "ping-2" ping for will stop collecting data after 2024-07-04

I'll think a bit more about the wording.

  1. What about downstream consumers of this data that may or may not be the ones getting this notification?

It's certainly possible to tie this into the data catalog to get lineage and owners for derived datasets. The most common case is deleting the oldest data, which doesn't affect derived tables unless they get backfilled, in which case it could be surprising to find no data in the upstream table. I'll think about this, but it seems high effort relative to the value, although I'm not in the best position to determine value.

I agree with usage of "ping" but I wasn't sure how else to word it. I'll edit it and try to make it easier to understand.

whd commented 1 month ago

Thanks for working on this.

Meant to run weekly, with alerts when the expiration date is between 3 and 17 days in the future

We had a document with some guidelines about timelines for this sort of thing but I seem to have lost access. I would check with @Marlene-M-Hirose and George's team for feedback on what reasonable values are for these. Anything is better than nothing.
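Until those values are settled, the window can be kept configurable. A minimal sketch, assuming both bounds are inclusive; the function and parameter names are hypothetical:

```python
from datetime import date


def in_alert_window(
    expiry: date,
    today: date,
    min_days: int = 3,   # assumed inclusive lower bound
    max_days: int = 17,  # assumed inclusive upper bound
) -> bool:
    """Return True if expiry is between min_days and max_days from today."""
    days_left = (expiry - today).days
    return min_days <= days_left <= max_days
```

With a weekly run and an inclusive 3 to 17 day window (15 possible day offsets), any given expiry date falls inside the window on at least two consecutive runs, so a single missed run does not silence the alert.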

should it only send one alert for the first expiring ping across all channels, send again for pings with retention different from the app default

This sounds right. I'd expect most pings in an app to share the same retention in most cases, and we'd want to minimize email spam, but ultimately an email per ping per channel is still better than nothing.
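One way to read that rule, as a rough sketch: send a single alert for the earliest-expiring ping that uses the app default retention, plus one alert per ping whose retention differs from the default. The `PingExpiry` type, its fields, and `select_alerts` are all hypothetical names used for illustration, not the PR's implementation:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PingExpiry:
    channel: str          # e.g. "release" or "beta"
    ping: str             # ping name
    retention_days: int   # retention applied to this ping's table
    expiry: date          # date the alert is about


def select_alerts(
    expiries: list[PingExpiry], app_default_retention: int
) -> list[PingExpiry]:
    """Pick one alert for default-retention pings and one per non-default ping."""
    default = [e for e in expiries if e.retention_days == app_default_retention]
    custom = [e for e in expiries if e.retention_days != app_default_retention]

    selected: list[PingExpiry] = []
    if default:
        # A single alert covers every ping that follows the app default.
        selected.append(min(default, key=lambda e: e.expiry))

    # For non-default pings, keep the earliest expiry per ping across channels.
    earliest: dict[str, PingExpiry] = {}
    for e in custom:
        if e.ping not in earliest or e.expiry < earliest[e.ping].expiry:
            earliest[e.ping] = e
    selected.extend(earliest.values())
    return selected
```

Falling back to one email per ping per channel would simply mean returning `expiries` unchanged.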

need to decide on the email wording and where to direct users who need to update the retention

It's ultimately up to DE how and where we want to put these, but https://bugzilla.mozilla.org/show_bug.cgi?id=1717974 and https://mozilla-hub.atlassian.net/browse/DENG-577 are some examples where we increased retention as a result of the alert. This feels like something a data steward should be notified about, similar to (presumably) the data steward review when the data was first introduced. I don't think we have sufficient metadata to point to that automatically.

sends to emails in ping metadata for custom pings, emails in app metadata for inherited pings

This seems reasonable.
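As a rough sketch of that routing, assuming custom pings carry their own email list and inherited pings fall back to the app-level list (the function name and argument shapes are hypothetical):

```python
def alert_recipients(
    ping_name: str,
    custom_ping_emails: dict[str, list[str]],  # assumed: emails from ping metadata, keyed by ping name
    app_emails: list[str],                     # assumed: emails from app metadata
) -> list[str]:
    """Use ping-level emails for custom pings, app-level emails otherwise."""
    if ping_name in custom_ping_emails:
        return custom_ping_emails[ping_name]
    return app_emails
```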

I agree with usage of "ping" but I wasn't sure how else to word it. I'll edit it and try to make it easier to understand.

Perhaps "BigQuery table" is the most accurate description, since "dataset" can be ambiguous (we're not dealing with BQ Datasets in these notifications).