m-lab / etl

M-Lab ingestion pipeline
Apache License 2.0

Lots of ETL NDT rows have different SegsOut from the legacy rows. #114

Open gfr10598 opened 7 years ago

gfr10598 commented 7 years ago

SELECT
  legacy_test_id,
  etl_test_id,
  legacy_connection_spec_client_hostname,
  etl_connection_spec_client_hostname
FROM mlab_sandbox.results_20100102
WHERE legacy_web100_log_entry_snap_SegsOut != etl_web100_log_entry_snap_SegsOut

This is likely related to the handling of logs with different numbers of snapshots.

gfr10598 commented 7 years ago

When they differ, the legacy SegsOut count is always greater than the new ETL SegsOut.

gfr10598 commented 7 years ago

Looks like all of the instances where SegsOut is different are cases where the ETL pipeline truncated at 2100 snapshots.
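
A minimal sketch of why truncation would produce this pattern, assuming SegsOut is a cumulative counter and that each pipeline reports the value from the last snapshot it keeps. The function name `final_segs_out` and the sample log are hypothetical and not taken from the etl code.

```python
from typing import List, Optional

MAX_SNAPSHOTS = 2100  # truncation point mentioned in this thread


def final_segs_out(snapshots: List[int], max_snapshots: Optional[int] = None) -> int:
    """Return SegsOut from the last snapshot kept after optional truncation.

    Assumption: the reported value comes from the final retained snapshot.
    """
    kept = snapshots if max_snapshots is None else snapshots[:max_snapshots]
    return kept[-1]


# Hypothetical log with 2500 snapshots; SegsOut grows by 3 per snapshot.
log = [3 * (i + 1) for i in range(2500)]

legacy_value = final_segs_out(log)               # uses all 2500 snapshots
etl_value = final_segs_out(log, MAX_SNAPSHOTS)   # stops at snapshot 2100

print(legacy_value, etl_value)
```

Under these assumptions the truncated value can never exceed the full-log value, which matches the observation that legacy SegsOut is always the larger one. A log with 2100 or fewer snapshots gives the same value either way.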

gfr10598 commented 7 years ago

The max snapshot count was recently raised to 2800, which should greatly reduce the number of rows that differ from legacy. Need to rerun the comparison and reassess.