ndt5 tests volume and anomalies

gfr10598 commented 4 years ago

ndt5 has some oddities...

m-lab/etl-gardener#294 (in September/October 2019)
There are a lot of tests with no S2C and no C2S. This has improved dramatically since May 21
There has been a huge increase in test volume starting in March 17. It has been dropping a bit since then but is still much higher than in early March.

(google only, 8-( ) https://docs.google.com/spreadsheets/d/1wvLj8_4iNGQFrO3PT9ZZR8IrcxQRErO_mTFRHf4zdHE/edit?usp=sharing

Screen Shot 2020-06-26 at 9 10 01 AM

gfr10598 commented 4 years ago

Note that the increase in test volume coincides with a deep weekly variation. Looks like the source of these new tests may be concentrating testing during the week.

gfr10598 commented 4 years ago

SELECT DATE(log_time) AS date, COUNTIF(test_id LIKE '"%') AS tests, COUNTIF(test_id LIKE '"%' AND result.S2C IS NOT NULL AND result.C2S IS NULL) AS downloads, COUNTIF(test_id LIKE '"%' AND result.C2S IS NOT NULL AND result.S2C IS NULL) AS uploads, COUNTIF(test_id LIKE '"%' AND result.S2C IS NOT NULL AND result.C2S IS NOT NULL) AS dual_tests, COUNTIF(test_id LIKE '"%' AND result.S2C IS NULL AND result.C2S IS NULL) AS null_tests, COUNTIF(test_id NOT LIKE '"%') AS unquoted, FROM mlab-sandbox.raw_ndt.ndt5 GROUP BY DATE ORDER BY DATE -- null_tests/tests DESC

gfr10598 commented 4 years ago

clientIP	new_tests	old_tests
127.0.0.1	1	90276
35.192.37.249	15	755658

SELECT result.clientIP, COUNTIF(DATE(log_time) = "2020-03-24") AS new_tests, COUNTIF(DATE(log_time) = "2020-03-17") AS old_tests FROM mlab-sandbox.raw_ndt.ndt5 WHERE DATE(log_time) = "2020-03-17" OR log_time = "2020-03-24" GROUP BY clientip ORDER BY new_tests - old_tests DESC

gfr10598 commented 4 years ago

Row	clientIP	new_tests
1	209.91.247.196	6650
2	86.98.51.223	6384
3	2001:5b0:4dc6:7228:6a37:e9ff:fe40:6c9b	2864
4	98.32.229.206	2416
5	2607:fcc8:6c84:9d00:78a9:3056:f78a:61b	1910
6	2607:fcc8:6c84:9d00:c077:9d2a:c456:2a27	1831
7	2607:fcc8:6c84:9d00:912c:adae:fcb9:abcd	1756
8	184.14.195.212	1752

WITH new_and_old AS ( SELECT result.clientIP, COUNTIF(DATE(log_time) > "2020-03-16") AS new_tests, COUNTIF(DATE(log_time) < "2020-03-16") AS old_tests FROM mlab-sandbox.raw_ndt.ndt5 WHERE DATE(log_time) BETWEEN "2020-03-02" AND "2020-03-30" AND result.S2C IS NOT NULL AND result.S2C.MeanThroughputMBPS > 1 GROUP BY clientip )

SELECT --(SUM(new_tests)-SUM(old_tests))/14 FROM new_and_old WHERE new_tests > 10old_tests AND new_tests > old_tests + 14 ORDER BY SAFE_DIVIDE(old_tests,new_tests), new_tests DESC

gfr10598 commented 4 years ago

Between March 10 and March 24, the volume of ndt5 test data increased by about 70% and the volume of tcpinfo data increased by just over 50%.

total compressed bytes	archive
760809161	gs://archive-measurement-lab/ndt/ndt5/2020/03/10
1048027589	gs://archive-measurement-lab/ndt/ndt5/2020/03/17
1305392571	gs://archive-measurement-lab/ndt/ndt5/2020/03/24
219778621482	gs://archive-measurement-lab/ndt/tcpinfo/2020/03/07
260401181457	gs://archive-measurement-lab/ndt/tcpinfo/2020/03/14
337154953901	gs://archive-measurement-lab/ndt/tcpinfo/2020/03/21

gfr10598 commented 4 years ago

Some stats regarding beacons that have been active for months, vs beacons that became active after March 16. The old beacons seem pretty stable, and the new beacons are significantly faster. The sparse clients are also stable, and significantly slower.

The download speeds are geometric means of per client averages, so that each client is equally weighted, and very fast tests do not have disproportionate influence.

Row	type	old_download	new_download	clients	old_tests	new_tests
1	old beacons	116.46	116.16	17325	1879785	2261942
2	new beacons	null	34.22	38320	21826	2856274
3	new sparse	null	19.61	11548729	0	19006958
4	old sparse	20.26	null	7204511	11505953	0

https://console.cloud.google.com/bigquery?sq=581276032543:427422f43ef74b768f0361266695b4df

WITH new_and_old AS ( SELECT result.clientIP, COUNTIF(DATE(log_time) > "2020-03-16") AS new_tests, COUNTIF(DATE(log_time) < "2020-03-16") AS old_tests, SAFE_DIVIDE(SUM(IF(DATE(log_time) > "2020-03-16", result.S2C.MeanthroughputMBPS, 0)), COUNTIF(DATE(log_time) > "2020-03-16")) AS new_download, SAFE_DIVIDE(SUM(IF(DATE(log_time) < "2020-03-16", result.S2C.MeanthroughputMBPS, 0)), COUNTIF(DATE(log_time) < "2020-03-16")) AS old_download, FROM mlab-sandbox.raw_ndt.ndt5 WHERE (DATE(log_time) BETWEEN "2020-03-01" AND "2020-03-14" # Two week period before surge OR --DATE(log_time) BETWEEN "2020-04-01" AND "2020-04-14") # Recent two week period. DATE(log_time) BETWEEN "2020-06-01" AND "2020-06-14") # Recent two week period. AND result.S2C IS NOT NULL AND result.S2C.MeanThroughputMBPS > 1 GROUP BY clientip )

(SELECT "old beacons" AS type, ROUND(EXP(AVG(LN(old_download))),2) as old_download, ROUND(EXP(AVG(LN(new_download))),2) as new_download, COUNT(clientIP) AS clients, SUM(old_tests) AS old_tests, SUM(new_tests) AS new_tests FROM new_and_old WHERE old_tests >= 48 AND new_tests >= 48 # more than 2/day in both time intervals UNION ALL SELECT "new beacons" AS type, NULL as old_download, ROUND(EXP(AVG(LN(new_download))),2) as new_download, COUNT(clientIP) AS clients, SUM(old_tests) AS old_tests, SUM(new_tests) AS new_tests FROM new_and_old WHERE old_tests < 7 AND new_tests >= 48 # More than 2/day in new interval, less than daily in old interval UNION ALL SELECT "new sparse" AS type, ROUND(EXP(AVG(LN(old_download))),2) as old_download, ROUND(EXP(AVG(LN(new_download))),2) as new_download, COUNT(clientIP) AS clients, SUM(old_tests) AS old_tests, SUM(new_tests) AS new_tests FROM new_and_old WHERE old_tests = 0 AND new_tests < 7 # None earlier, less than daily later UNION ALL SELECT "old sparse" AS type, ROUND(EXP(AVG(LN(old_download))),2) as old_download, ROUND(EXP(AVG(LN(new_download))),2) as new_download, COUNT(clientIP) AS clients, SUM(old_tests) AS old_tests, SUM(new_tests) AS new_tests FROM new_and_old WHERE old_tests < 7 AND new_tests = 0 # None later, less than daily earlier ) ORDER BY new_download DESC

laiyi-ohlsen commented 4 years ago

@stephen-soltesz

stephen-soltesz commented 4 years ago

The increase and eventual drop of "null tests" are due to an e2e monitoring check used with the web100 ndt server to check the "queue". The new platform ndt5 server has no queue, so always succeeded, but also saved an effectively "null" result. In early May, our e2e monitoring was updated to use access tokens and the queue check was removed.

It should be possible to verify this claim by looking at the remote IP for these connections; they will be OAM servers. We have conversational plans to add "filter" predicates in the ETL parser for our datatypes to expose this kind of information. I believe the super majority of these "null tests" are a result of OAM servers, which will eventually be labeled using predicates. We should open an issue there to track this.

m-lab / analysis

ndt5 tests volume and anomalies #9