m-lab / etl

M-Lab ingestion pipeline
Apache License 2.0

Update pcap parser processing rate to 1 in 10 archives #1014

Closed: cristinaleonr closed this pull request 3 years ago

cristinaleonr commented 3 years ago

Update the pcap parser to process fewer archives (i.e., 1 in 10).

A similar concept exists in the legacy pipeline with the TASK_FILE_SKIP setting (e.g., https://github.com/m-lab/etl-gardener/blob/master/k8s/data-processing-cluster/deployments/etl-gardener-ndt.yml#L52)
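To make "1 in 10" concrete, here is a minimal sketch of position-based sampling over a list of archive names. The helper `sampleArchives` and its parameter `n` are hypothetical and not taken from the m-lab/etl code. It assumes archives are considered in listing order and that the 1st, (n+1)th, (2n+1)th, ... archive is kept. The actual change (and TASK_FILE_SKIP in the legacy pipeline) may select archives differently.

```go
package main

// sampleArchives returns the archives a parser would handle when it
// processes only 1 in every n archives. It keeps the entries at positions
// 0, n, 2n, ... in the order given.
// Hypothetical helper for illustration; the real selection logic in
// m-lab/etl may differ (e.g., a different offset or a hash of the name).
func sampleArchives(names []string, n int) []string {
	if n <= 1 {
		// A rate of 1 (or an invalid rate) means every archive is processed.
		return names
	}
	kept := make([]string, 0, (len(names)+n-1)/n)
	for i, name := range names {
		if i%n == 0 {
			kept = append(kept, name)
		}
	}
	return kept
}
```

With 11 archives named obj1 through obj11 and `n = 10`, this sketch keeps obj1 and obj11. A position-based rule gives an exact ratio for a given listing; a hash of the archive name would instead make each decision independent of listing order, at the cost of only approximating 1 in 10.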

Testing: compared the processing logs to the archive entries.

Logs: https://pantheon.corp.google.com/logs/query;query=resource.type%3D%22k8s_container%22%0Aresource.labels.project_id%3D%22mlab-sandbox%22%0Aresource.labels.location%3D%22us-east1%22%0Aresource.labels.cluster_name%3D%22data-processing%22%0Aresource.labels.namespace_name%3D%22default%22%0Alabels.k8s-pod%2Frun%3D%22etl-parser%22%20severity%3E%3DDEFAULT;timeRange=2021-08-12T17:13:17.000Z%2F2021-08-12T17:13:18.000Z;cursorTimestamp=2021-08-12T17:13:17.602104577Z?project=mlab-sandbox
[Screenshot 2021-08-12 1:46:21 PM]

Archive: https://pantheon.corp.google.com/storage/browser/archive-measurement-lab/ndt/pcap/2021/07/29?pageState=(%22StorageObjectListTable%22:(%22f%22:%22%255B%255D%22))&project=measurement-lab&prefix=&forceOnObjectsSortingFiltering=false
[Screenshot 2021-08-12 2:07:20 PM]


coveralls commented 3 years ago

Pull Request Test Coverage Report for Build 6599


Totals:
Change from base Build 6588: 0.07%
Covered Lines: 3576
Relevant Lines: 5672
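Assuming Coveralls reports total coverage as covered lines divided by relevant lines, the two totals above work out to roughly 63.05% for build 6599. The small helper below (a hypothetical name, not part of m-lab/etl or the Coveralls API) just spells out that arithmetic.

```go
package main

// coveragePercent returns covered/relevant as a percentage.
// It returns 0 when there are no relevant lines, to avoid dividing by zero.
func coveragePercent(covered, relevant int) float64 {
	if relevant == 0 {
		return 0
	}
	return 100 * float64(covered) / float64(relevant)
}
```

For example, `coveragePercent(3576, 5672)` is about 63.05.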

cristinaleonr commented 3 years ago

active/active_test.go, line 112 at r3:

Previously, stephen-soltesz (Stephen Soltesz) wrote…
Please call this "prefix". In the GCS context, the "bucket" is the container that holds all of the named objects. The path prefix here is part of the full names of obj1-11.

Done.
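The distinction the reviewer draws can be shown with a short sketch: the bucket is named once, while the prefix is part of every object's full name. The helper `objectURLs` is hypothetical and is not the code in active_test.go; the example values below come from the archive link earlier in this PR.

```go
package main

import "fmt"

// objectURLs builds gs:// URLs for objects obj1..objN.
// The bucket is the container; the prefix is part of each object's name,
// so it is repeated in every URL while the bucket appears once per URL.
// Hypothetical helper for illustration; not taken from active_test.go.
func objectURLs(bucket, prefix string, n int) []string {
	urls := make([]string, 0, n)
	for i := 1; i <= n; i++ {
		urls = append(urls, fmt.Sprintf("gs://%s/%sobj%d", bucket, prefix, i))
	}
	return urls
}
```

For example, `objectURLs("archive-measurement-lab", "ndt/pcap/2021/07/29/", 11)` yields URLs from `gs://archive-measurement-lab/ndt/pcap/2021/07/29/obj1` through `.../obj11`. Because GCS object names are flat, the trailing slash in the prefix is what makes the names read like a directory path.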