m-lab / etl

M-Lab ingestion pipeline
Apache License 2.0

tcpinfo: snapshot thinning metadata: raw snapshot length #1106

Open stephen-soltesz opened 1 year ago

stephen-soltesz commented 1 year ago

For unit testing and data analysis, it would be helpful to know the original length of the raw snapshots. Today there is no indication that snapshots were thinned, or by how much.

If this information were available, then the tcpinfo snapshot thinning unit tests could additionally verify that test data had the expected number of snapshots remaining after "thinning".

For example, a field such as a.SnapshotsLength or a.TotalSnapshots. Adding it would require a schema update, but the value would be available for all historical data since it is a derived metric.
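
To make the proposal concrete, here is a minimal sketch of how a thinning step might record the raw snapshot count next to the thinned data. Everything in it is an assumption for illustration: the `Snapshot` and `ThinResult` types, the `Thin` function, and the "keep every nth snapshot plus the last one" rule are hypothetical and are not taken from the etl codebase. Only the field name `TotalSnapshots` comes from the suggestion above.

```go
package main

import "fmt"

// Snapshot is a hypothetical stand-in for one parsed tcpinfo snapshot.
type Snapshot struct {
	Seq int
}

// ThinResult pairs the thinned snapshots with the proposed metadata.
// TotalSnapshots is one of the field names suggested in this issue;
// the struct itself is hypothetical.
type ThinResult struct {
	Snapshots      []Snapshot
	TotalSnapshots int // number of raw snapshots before thinning
}

// Thin keeps every nth snapshot plus the final one, and records the
// raw length. The thinning rule here is an assumption for illustration;
// the real parser may use a different rule.
func Thin(raw []Snapshot, n int) ThinResult {
	if n < 1 {
		n = 1 // treat invalid intervals as "keep everything"
	}
	res := ThinResult{TotalSnapshots: len(raw)}
	for i, s := range raw {
		if i%n == 0 || i == len(raw)-1 {
			res.Snapshots = append(res.Snapshots, s)
		}
	}
	return res
}

func main() {
	raw := make([]Snapshot, 25)
	for i := range raw {
		raw[i].Seq = i
	}
	res := Thin(raw, 10)
	fmt.Printf("kept %d of %d snapshots\n", len(res.Snapshots), res.TotalSnapshots)
}
```

With a field like this in place, a unit test could compare both the number of snapshots remaining and the recorded raw count against known values for a given test archive, rather than checking the thinned length alone.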