m-lab / etl

M-Lab ingestion pipeline
Apache License 2.0

Decommission v1 data pipeline #1074

Closed stephen-soltesz closed 2 years ago

stephen-soltesz commented 2 years ago

https://github.com/m-lab/prometheus-support/releases/tag/v2.56.0
https://github.com/m-lab/etl-schema/releases/tag/v3.41.0

stephen-soltesz commented 2 years ago

From the mlab-staging unified downloads daily count: verifying that the web100_static update is working as intended (WAI).

[Screenshot: 2022-04-28, 12:36:48 PM]
stephen-soltesz commented 2 years ago

From the measurement-lab unified uploads, after deployment to the production/public views.

[Screenshot: 2022-04-28, 6:28:08 PM]
stephen-soltesz commented 2 years ago

Evidently, the v2 pipeline parsers in production are using the default annotatorURL value (https://github.com/m-lab/etl/blob/master/cmd/etl_worker/etl_worker.go#L71), which targets mlab-sandbox.

These requests should be no-ops (just burned cycles) and are corrected by https://github.com/m-lab/etl/pull/1078. However, this was an unexpected, unintended cross-project configuration, and it must be resolved before the annotation-service in mlab-sandbox is deleted.
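One way to catch this class of misconfiguration early is a startup check that compares the project named in the annotator URL with the project the worker is deployed in. The sketch below is hypothetical: it is not taken from PR #1078, the function names are invented for illustration, and it assumes the annotator is reached through a legacy App Engine host of the form `<service>-dot-<project>.appspot.com` (regional `*.r.appspot.com` hosts are not handled).

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// annotatorProject extracts the GCP project ID from an App Engine URL of the
// (assumed) legacy form https://<service>-dot-<project>.appspot.com/...
func annotatorProject(annotatorURL string) (string, error) {
	u, err := url.Parse(annotatorURL)
	if err != nil {
		return "", fmt.Errorf("invalid annotator URL %q: %w", annotatorURL, err)
	}
	host := u.Hostname()
	const suffix = ".appspot.com"
	if !strings.HasSuffix(host, suffix) {
		return "", fmt.Errorf("annotator host %q is not an appspot.com host", host)
	}
	name := strings.TrimSuffix(host, suffix)
	// Drop the "<service>-dot-" prefix, if present, leaving the project ID.
	if i := strings.LastIndex(name, "-dot-"); i >= 0 {
		name = name[i+len("-dot-"):]
	}
	return name, nil
}

// checkAnnotatorProject fails when the annotator URL points at a project
// other than the one the worker runs in.
func checkAnnotatorProject(annotatorURL, workerProject string) error {
	target, err := annotatorProject(annotatorURL)
	if err != nil {
		return err
	}
	if target != workerProject {
		return fmt.Errorf("annotator URL targets project %q, but the worker runs in %q", target, workerProject)
	}
	return nil
}

func main() {
	// A production worker still using a sandbox default is rejected.
	fmt.Println(checkAnnotatorProject("https://annotator-dot-mlab-sandbox.appspot.com", "mlab-oti"))
	// A URL in the worker's own project passes.
	fmt.Println(checkAnnotatorProject("https://annotator-dot-mlab-oti.appspot.com", "mlab-oti"))
}
```

Failing fast at startup turns a silent cross-project dependency into an explicit error, at the cost of hard-coding an assumption about the URL shape.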

To complete:

stephen-soltesz commented 2 years ago

I have confirmed from the request logs that the only IPs accessing the annotation-service belong to GKE nodes in mlab-oti. No other requests are directed to the /batch_annotation resource.

https://console.cloud.google.com/logs/query;query=resource.type%3D%22gae_app%22%0Aresource.labels.module_id%3D%22annotator%22%0A-httpRequest.remoteIp%3D%2234.133.233.38%22%0A-httpRequest.remoteIp%3D%2234.121.84.25%22%0A-httpRequest.remoteIp%3D%2234.70.116.43%22%0A-httpRequest.remoteIp%3D%22104.197.140.193%22%0A-httpRequest.remoteIp%3D%2234.122.168.53%22%0A-httpRequest.remoteIp%3D%2235.238.247.201%22%0A-httpRequest.remoteIp%3D%2234.123.117.130%22%0A-httpRequest.remoteIp%3D%2234.69.127.59%22%0A;cursorTimestamp=2022-05-11T19:42:16.301Z?project=mlab-sandbox

stephen-soltesz commented 2 years ago

After deploying https://github.com/m-lab/etl/pull/1078, the production v2 pipeline is no longer targeting the mlab-sandbox annotation-service.

[Screenshot: 2022-05-12, 10:36:39 AM]

There are no requests to the annotation service in mlab-staging or mlab-oti either, so it should now be safe to delete this service.

stephen-soltesz commented 2 years ago

After deploying the above PRs to staging, I confirmed that downloader_last_success_time_seconds is still reported by the staging Prometheus after deleting the old downloader from data-processing-cluster. I also confirmed that gs://downloader-mlab-staging is still being updated.

stephen-soltesz commented 2 years ago

Since deleting the v1 data pipeline components, the burn rate has dropped by ~$6k/day.

[Screenshot: 2022-05-19, 10:39:37 PM]
stephen-soltesz commented 2 years ago

\o/ - Fin.