m-lab / etl-gardener

Gardener provides services for maintaining and reprocessing mlab data.
Apache License 2.0

Production data types have 'max-min' parse times older than historical cycle time #381

Open stephen-soltesz opened 2 years ago

stephen-soltesz commented 2 years ago

In production monitoring, the datatypes show different 'max-min' parser times, even though Gardener processes dates for all datatypes in sync.

[Screenshot: Screen Shot 2022-06-13 at 9 24 53 PM]

This is unexpected. The cause is unknown.
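
For reference, a minimal sketch of the quantity being monitored, assuming 'max-min' means the difference between the newest and oldest parser.Time in a table. The function names and the 30-day cycle value are hypothetical and are not taken from Gardener or its dashboards.

```python
from datetime import datetime, timedelta, timezone


def parse_time_span(parse_times):
    """Return max(parser.Time) - min(parser.Time) for one datatype."""
    return max(parse_times) - min(parse_times)


def exceeds_cycle(parse_times, cycle):
    """True if the span is longer than one full reprocessing cycle."""
    return parse_time_span(parse_times) > cycle


if __name__ == "__main__":
    # Hypothetical parse times for a single datatype.
    times = [
        datetime(2022, 4, 12, tzinfo=timezone.utc),
        datetime(2022, 6, 13, tzinfo=timezone.utc),
    ]
    # Hypothetical cycle length; the real historical cycle time is not stated here.
    cycle = timedelta(days=30)
    print(parse_time_span(times), exceeds_cycle(times, cycle))
```

If every row were rewritten once per cycle, the span should stay at or below the cycle length, which is why a larger value points at rows that were never reparsed.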

stephen-soltesz commented 2 years ago

-- Earliest parser.Time per archive, for rows parsed between 2022-04-10 and 2022-05-01.
SELECT MIN(parser.Time) AS mintime, parser.ArchiveURL
FROM `mlab-oti.ndt_raw.ndt7`
WHERE parser.Time BETWEEN '2022-04-10' AND '2022-05-01'
GROUP BY parser.ArchiveURL
ORDER BY mintime DESC

Row | mintime | ArchiveURL
1 | 2022-04-12 00:39:48.663516 UTC | gs://archive-measurement-lab/ndt/ndt7/2021/10/16/20211016T072521.128809Z-ndt7-mlab2-mia05-ndt.tgz
2 | 2022-04-11 23:42:51.021691 UTC | gs://archive-measurement-lab/ndt/ndt7/2021/10/14/20211014T050728.240194Z-ndt7-mlab1-sin01-ndt.tgz
3 | 2022-04-11 23:17:19.303483 UTC | gs://archive-measurement-lab/ndt/ndt7/2021/10/13/20211013T152757.265738Z-ndt7-mlab1-hnd05-ndt.tgz
4 | 2022-04-11 23:14:53.150130 UTC | gs://archive-measurement-lab/ndt/ndt7/2021/10/13/20211013T131052.627372Z-ndt7-mlab3-mnl02-ndt.tgz

$ gsutil cp gs://etl-mlab-oti/ndt/ndt7/2021/10/16/20211016T072521.128809Z-ndt7-mlab2-mia05-ndt.tgz.json .
$ jq '.Parser.Time' 20211016T072521.128809Z-ndt7-mlab2-mia05-ndt.tgz.json | sort -r

None of the Parser.Time values in the downloaded JSON file match the parser.Time reported in BigQuery for the same archive.
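
The manual comparison above could be scripted roughly as follows. This is a sketch under assumptions: it treats the .tgz.json file as newline-delimited JSON with one row per line (as the jq invocation suggests), assumes Parser.Time is an RFC 3339 string, and uses hypothetical helper names (`parse_ts`, `matching_rows`) that are not part of Gardener or the ETL parser.

```python
import json
import re
from datetime import datetime, timezone


def parse_ts(text):
    """Parse an RFC 3339 or BigQuery-style ('... UTC') timestamp as UTC."""
    text = text.strip().replace(" UTC", "+00:00").replace("Z", "+00:00")
    text = text.replace(" ", "T", 1)
    # Normalize fractional seconds to exactly six digits (microseconds).
    text = re.sub(r"\.(\d+)", lambda m: "." + m.group(1)[:6].ljust(6, "0"), text)
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def matching_rows(jsonl_lines, bq_time, tolerance_s=0.0):
    """Return rows whose Parser.Time is within tolerance_s of bq_time."""
    target = parse_ts(bq_time)
    hits = []
    for line in jsonl_lines:
        if not line.strip():
            continue  # skip blank lines
        row = json.loads(line)
        row_time = parse_ts(row["Parser"]["Time"])
        if abs((row_time - target).total_seconds()) <= tolerance_s:
            hits.append(row)
    return hits


if __name__ == "__main__":
    # File name from the gsutil command above; the row layout is assumed.
    path = "20211016T072521.128809Z-ndt7-mlab2-mia05-ndt.tgz.json"
    with open(path) as f:
        hits = matching_rows(f, "2022-04-12 00:39:48.663516 UTC")
    print(len(hits), "matching rows")
```

An empty result corresponds to the observation above: the file holds no row with the parse time that BigQuery reports for that archive.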

stephen-soltesz commented 2 years ago

Why wouldn't these rows be reset after a full cycle?

Speculation: