m-lab / etl

M-Lab ingestion pipeline
Apache License 2.0

Fix thinSnaps #1105

Closed stephen-soltesz closed 1 year ago

stephen-soltesz commented 1 year ago

This change completes the fix for the bug reported by @NotSpecial (https://github.com/m-lab/etl/pull/1104) and updates the unit test to check that every FinalSnapshot matches the last snapshot of the raw, thinned snapshots.



coveralls commented 1 year ago

Pull Request Test Coverage Report for Build 7420


| Files with Coverage Reduction | New Missed Lines | % |
| active/active.go | 2 | 90.63% |

| Totals | Coverage Status |
| Change from base Build 7409: | 0.05% |
| Covered Lines: | 3323 |
| Relevant Lines: | 4942 |

stephen-soltesz commented 1 year ago

The parser updated with this fix was deployed to staging around 2022-10-18 20:20:00. Rows parsed after that time have a FinalSnapshot and a last Snapshot with matching timestamps (notmatching == 0), as this query shows:

SELECT
  date,
  COUNT(*) AS total,
  COUNTIF(a.FinalSnapshot.Timestamp != raw.Snapshots[SAFE_ORDINAL(ARRAY_LENGTH(raw.Snapshots))].Timestamp) AS notmatching
FROM mlab-staging.ndt.tcpinfo
WHERE
  date BETWEEN "2021-04-02" AND "2021-05-10"
  AND parser.Time > TIMESTAMP("2022-10-18 00:00:00")
GROUP BY date
ORDER BY date

After deployment to production, the historical reprocessing will take about 16 days to cover all dates. Daily data will be updated daily.