m-lab / etl

M-Lab ingestion pipeline
Apache License 2.0

scamper1 parsing errors #1052

Open stephen-soltesz opened 2 years ago

stephen-soltesz commented 2 years ago

After deploying the new alternative ETL pipeline SLIs, we found that the scamper1 datatype would report parse errors after restarting:

[Screenshot: 2022-02-07 at 2:24:28 PM]

We suspected this may be due to a temporary format error in an early version, but have not yet confirmed it.

After deploying a modified version of the v1 parser for the "traceroute"/paris1 datatype and copying the ndt/traceroute data to ndt/scamper1, the error rate is about 20%:

[Screenshot: 2022-02-07 at 2:24:45 PM]

The causes of both are currently unknown. The parser should either recognize these as "invalid" measurements (so that they are excluded from the set of measurements used to calculate the error rate), or be fixed to handle these files.
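A minimal sketch of the first option, assuming each parse attempt can be classified as a success, a failure, or an invalid (non-measurement) file. The `Outcome` type and `errorRate` function are hypothetical and are not part of the m-lab/etl codebase; they only show how excluding invalid files keeps them out of both the numerator and the denominator of the error rate.

```go
package main

import "fmt"

// Outcome is a hypothetical classification of a single parse attempt.
// It is not an m-lab/etl type; it only illustrates the idea.
type Outcome int

const (
	Success Outcome = iota // file parsed into a valid row
	Failed                 // parser could not handle a real measurement
	Invalid                // file is not a real measurement; excluded from the SLI
)

// errorRate returns failures / (successes + failures), ignoring Invalid
// outcomes so that junk files do not inflate the reported error rate.
// It returns 0 when no valid measurements were seen.
func errorRate(outcomes []Outcome) float64 {
	var ok, failed int
	for _, o := range outcomes {
		switch o {
		case Success:
			ok++
		case Failed:
			failed++
		}
		// Invalid outcomes are intentionally not counted.
	}
	total := ok + failed
	if total == 0 {
		return 0
	}
	return float64(failed) / float64(total)
}

func main() {
	outcomes := []Outcome{Success, Success, Success, Failed, Invalid, Invalid}
	fmt.Printf("%.2f\n", errorRate(outcomes)) // 1 failure / 4 valid = 0.25
}
```

Counting invalid files as failures instead would report 3/6 = 0.50 for the same input, which is the kind of inflation the SLI should avoid.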

Update: Many files in the legacy scamper archive contain only a UUID field, e.g. {"UUID": "ndt-v97k9_1555519948_0000000000005586"}. For example, see the files under https://pantheon.corp.google.com/storage/browser/_details/archive-measurement-lab/ndt/traceroute/2019/04/21/20190422T003934.994993Z-traceroute-mlab4-lga05-ndt.tgz;tab=live_object?project=measurement-lab

These seem to be triggering the majority of the "invalid traceroute file" errors.
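One way to recognize these placeholder files before they reach the parser is to check whether the file is a single JSON object whose only key is "UUID". The `isUUIDOnly` helper below is hypothetical (not an m-lab/etl function) and assumes the placeholders look exactly like the example above; anything else, including multi-record scamper output, falls through to the normal parser.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// isUUIDOnly reports whether data is a single JSON object whose only field
// is "UUID". Hypothetical helper; assumes placeholder files match the
// {"UUID": "..."} shape seen in the legacy archive.
func isUUIDOnly(data []byte) bool {
	var fields map[string]json.RawMessage
	// Unmarshal fails on non-objects and on trailing data (e.g. a second
	// JSON record), so real multi-line scamper files return false here.
	if err := json.Unmarshal(data, &fields); err != nil {
		return false
	}
	_, hasUUID := fields["UUID"]
	return hasUUID && len(fields) == 1
}

func main() {
	stub := []byte(`{"UUID": "ndt-v97k9_1555519948_0000000000005586"}`)
	full := []byte(`{"UUID": "x", "type": "cycle-start"}`)
	fmt.Println(isUUIDOnly(stub), isUUIDOnly(full)) // true false
}
```

A file flagged this way could then be classified as an invalid measurement rather than a parse failure.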

The error rate of the legacy scamper parser itself often reaches extremely high levels:

[Screenshot: 2022-02-07 at 7:07:32 PM]

The fix is to filter out the legacy dates from the errors returned by the scamper1 parser.