Closed yachang closed 4 years ago
Factors to review:
mlab-oti.batch.traceroute
instead of mlab-oti.base_tables.traceroute
?I will do the following things:
sync the traceroute tests from other experiments besides NDT from pusher bucket to archive bucket. And make sure ETL process those tests
compare the NDT tests on new platform, and verify that most of them got corresponding traceroute tests.
How many new-platform S2C tests in September have a traceroute with the same UUID as the s2c channel's UUID?
SELECT COUNT(*) as total
FROM `measurement-lab.ndt.ndt5` as ndt5 INNER JOIN `mlab-oti.base_tables.traceroute` as troute on ndt5.result.s2c.uuid = troute.uuid
WHERE result.S2C IS NOT NULL
AND ndt5.partition_date BETWEEN DATE("2019-09-01") AND DATE("2019-10-01")
AND ndt5.result.S2C.uuid IS NOT NULL
AND ndt5.result.S2C.uuid != ""
AND ndt5.result.Control.uuid IS NOT NULL
AND ndt5.result.Control.uuid != ""
AND EXTRACT(DATE FROM troute._PARTITIONTIME) BETWEEN DATE("2019-09-01") AND DATE("2019-10-01")
AND troute.uuid IS NOT NULL
AND troute.uuid != ""
3,922,466
How many new-platform S2C tests in September have a traceroute with the same UUID as the control channel's UUID?
SELECT COUNT(*) as total
FROM `measurement-lab.ndt.ndt5` as ndt5 INNER JOIN `mlab-oti.base_tables.traceroute` as troute on ndt5.result.control.uuid = troute.uuid
WHERE result.S2C IS NOT NULL
AND ndt5.partition_date BETWEEN DATE("2019-09-01") AND DATE("2019-10-01")
AND ndt5.result.S2C.uuid IS NOT NULL
AND ndt5.result.S2C.uuid != ""
AND ndt5.result.Control.uuid IS NOT NULL
AND ndt5.result.Control.uuid != ""
AND EXTRACT(DATE FROM troute._PARTITIONTIME) BETWEEN DATE("2019-09-01") AND DATE("2019-10-01")
AND troute.uuid IS NOT NULL
AND troute.uuid != ""
3,952,256
How many new-platform S2C tests in September were there total?
SELECT COUNT(*) as total
FROM `measurement-lab.ndt.ndt5` as ndt5
WHERE result.S2C IS NOT NULL
AND ndt5.partition_date BETWEEN DATE("2019-09-01") AND DATE("2019-10-01")
AND ndt5.result.S2C.uuid IS NOT NULL
AND ndt5.result.S2C.uuid != ""
AND ndt5.result.Control.uuid IS NOT NULL
AND ndt5.result.Control.uuid != ""
16,376,595
We expect that a given IP will only have one traceroute performed to it, and we don't cache and return traceroute results (currently filing a bug about that) so these two sets should be non-overlapping. This indicates that (3,922,466 + 3,952,256) / 16,376,595 = 0.48085 = 48% of tests have a corresponding traceroute.
When I rewrote the query to ask "How many S2C test IPs have a corresponding traceroute in September?", using the following query:
SELECT COUNT(*) as total
FROM `measurement-lab.ndt.ndt5` as ndt5 LEFT JOIN `mlab-oti.base_tables.traceroute` as troute on ndt5.result.s2c.ClientIP = troute.Destination.IP
WHERE result.S2C IS NOT NULL
AND ndt5.partition_date BETWEEN DATE("2019-09-01") AND DATE("2019-10-01")
AND ndt5.result.S2C.uuid IS NOT NULL
AND ndt5.result.S2C.uuid != ""
AND ndt5.result.ClientIP IS NOT NULL
AND ndt5.result.ClientIP != ""
AND ndt5.result.Control.uuid IS NOT NULL
AND ndt5.result.Control.uuid != ""
AND EXTRACT(DATE FROM troute._PARTITIONTIME) BETWEEN DATE("2019-09-01") AND DATE("2019-10-01")
AND troute.uuid IS NOT NULL
AND troute.uuid != ""
AND troute.Destination.IP IS NOT NULL
AND troute.Destination.IP != ""
15,244,749
The answer was 15,244,749, and 15,244,749 / 16,376,595 = .930886 = 93%
So, 93% of NDT test IPs had a corresponding traceroute. The sky is definitely not falling, but traceroute-caller has some bugs that can accidentally make it look like the sky is falling. The new platform is still okay.
Added #28 and #29 as follow-up issues.
Fixed; closing it.
I compare the site traffic in 2019/05 (Paris Tracerout) vs. 2019/08 (scamper):
mlab3-den04: 609300, 63800 mlab3-beg01: 660600, 91800 mlab2-mil02: 465900, 149437 mlab2-sea06: 260300, 21000
One possible reason is that scamper test took longer time to complete compared to PT. So the 60 seconds timeout will make most of scamper test fail in the middle.
Attached BigQuery used:
legacySQL
SELECT
COUNT(TestTime) AS num
FROM (
SELECT TestTime
FROM [mlab-oti.batch.traceroute]
WHERE DATE(_PARTITIONTIME) BETWEEN DATE("2019-05-01") AND DATE("2019-06-01") AND uuid = "" AND Parseinfo.TaskFileName CONTAINS "mlab2-sea06"
)
====================
legacySQL
SELECT
COUNT(uuid) AS num
FROM (
SELECT uuid
FROM [mlab-oti.batch.traceroute]
WHERE DATE(_PARTITIONTIME) BETWEEN DATE("2019-08-01") AND DATE("2019-09-01") AND Parseinfo.TaskFileName CONTAINS "mlab2-sea06" )