m-lab / analysis

Detailed analysis for data collected from M-Lab.
Apache License 2.0
12 stars 5 forks source link

ndt5 tests volume and anomalies #9

Open gfr10598 opened 4 years ago

gfr10598 commented 4 years ago

ndt5 has some oddities...

  1. m-lab/etl-gardener#294 (in September/October 2019)
  2. There are a lot of tests with no S2C and no C2S. This has improved dramatically since May 21
  3. There has been a huge increase in test volume starting in March 17. It has been dropping a bit since then but is still much higher than in early March.

(google only, 8-( ) https://docs.google.com/spreadsheets/d/1wvLj8_4iNGQFrO3PT9ZZR8IrcxQRErO_mTFRHf4zdHE/edit?usp=sharing

Screen Shot 2020-06-26 at 9 10 01 AM

gfr10598 commented 4 years ago

Note that the increase in test volume coincides with a deep weekly variation. Looks like the source of these new tests may be concentrating testing during the week.

gfr10598 commented 4 years ago

SELECT DATE(log_time) AS date, COUNTIF(test_id LIKE '"%') AS tests, COUNTIF(test_id LIKE '"%' AND result.S2C IS NOT NULL AND result.C2S IS NULL) AS downloads, COUNTIF(test_id LIKE '"%' AND result.C2S IS NOT NULL AND result.S2C IS NULL) AS uploads, COUNTIF(test_id LIKE '"%' AND result.S2C IS NOT NULL AND result.C2S IS NOT NULL) AS dual_tests, COUNTIF(test_id LIKE '"%' AND result.S2C IS NULL AND result.C2S IS NULL) AS null_tests, COUNTIF(test_id NOT LIKE '"%') AS unquoted, FROM mlab-sandbox.raw_ndt.ndt5 GROUP BY DATE ORDER BY DATE -- null_tests/tests DESC

gfr10598 commented 4 years ago
clientIP new_tests old_tests
127.0.0.1 1 90276  
35.192.37.249 15 755658

SELECT result.clientIP, COUNTIF(DATE(log_time) = "2020-03-24") AS new_tests, COUNTIF(DATE(log_time) = "2020-03-17") AS old_tests FROM mlab-sandbox.raw_ndt.ndt5 WHERE DATE(log_time) = "2020-03-17" OR log_time = "2020-03-24" GROUP BY clientip ORDER BY new_tests - old_tests DESC

gfr10598 commented 4 years ago
Row clientIP new_tests old_tests  
1 209.91.247.196 6650 0  
2 86.98.51.223 6384 0  
3 2001:5b0:4dc6:7228:6a37:e9ff:fe40:6c9b 2864 0  
4 98.32.229.206 2416 0  
5 2607:fcc8:6c84:9d00:78a9:3056:f78a:61b 1910 0  
6 2607:fcc8:6c84:9d00:c077:9d2a:c456:2a27 1831 0  
7 2607:fcc8:6c84:9d00:912c:adae:fcb9:abcd 1756 0  
8 184.14.195.212 1752 0

WITH new_and_old AS ( SELECT result.clientIP, COUNTIF(DATE(log_time) > "2020-03-16") AS new_tests, COUNTIF(DATE(log_time) < "2020-03-16") AS old_tests FROM mlab-sandbox.raw_ndt.ndt5 WHERE DATE(log_time) BETWEEN "2020-03-02" AND "2020-03-30" AND result.S2C IS NOT NULL AND result.S2C.MeanThroughputMBPS > 1 GROUP BY clientip )

SELECT --(SUM(new_tests)-SUM(old_tests))/14 FROM new_and_old WHERE new_tests > 10old_tests AND new_tests > old_tests + 14 ORDER BY SAFE_DIVIDE(old_tests,new_tests), new_tests DESC

gfr10598 commented 4 years ago

Between March 10 and March 24, the volume of ndt5 test data increased by about 70% and the volume of tcpinfo data increased by just over 50%.

total compressed bytes archive
760809161 gs://archive-measurement-lab/ndt/ndt5/2020/03/10
1048027589 gs://archive-measurement-lab/ndt/ndt5/2020/03/17
1305392571 gs://archive-measurement-lab/ndt/ndt5/2020/03/24
219778621482 gs://archive-measurement-lab/ndt/tcpinfo/2020/03/07
260401181457 gs://archive-measurement-lab/ndt/tcpinfo/2020/03/14
337154953901 gs://archive-measurement-lab/ndt/tcpinfo/2020/03/21
gfr10598 commented 4 years ago

Some stats regarding beacons that have been active for months, vs beacons that became active after March 16. The old beacons seem pretty stable, and the new beacons are significantly faster. The sparse clients are also stable, and significantly slower.

The download speeds are geometric means of per client averages, so that each client is equally weighted, and very fast tests do not have disproportionate influence.

Row type old_download new_download clients old_tests new_tests  
1 old beacons 116.46 116.16 17325 1879785 2261942  
2 new beacons null 34.22 38320 21826 2856274  
3 new sparse null 19.61 11548729 0 19006958  
4 old sparse 20.26 null 7204511 11505953 0

https://console.cloud.google.com/bigquery?sq=581276032543:427422f43ef74b768f0361266695b4df

WITH new_and_old AS ( SELECT result.clientIP, COUNTIF(DATE(log_time) > "2020-03-16") AS new_tests, COUNTIF(DATE(log_time) < "2020-03-16") AS old_tests, SAFE_DIVIDE(SUM(IF(DATE(log_time) > "2020-03-16", result.S2C.MeanthroughputMBPS, 0)), COUNTIF(DATE(log_time) > "2020-03-16")) AS new_download, SAFE_DIVIDE(SUM(IF(DATE(log_time) < "2020-03-16", result.S2C.MeanthroughputMBPS, 0)), COUNTIF(DATE(log_time) < "2020-03-16")) AS old_download, FROM mlab-sandbox.raw_ndt.ndt5 WHERE (DATE(log_time) BETWEEN "2020-03-01" AND "2020-03-14" # Two week period before surge OR --DATE(log_time) BETWEEN "2020-04-01" AND "2020-04-14") # Recent two week period. DATE(log_time) BETWEEN "2020-06-01" AND "2020-06-14") # Recent two week period. AND result.S2C IS NOT NULL AND result.S2C.MeanThroughputMBPS > 1 GROUP BY clientip )

(SELECT "old beacons" AS type, ROUND(EXP(AVG(LN(old_download))),2) as old_download, ROUND(EXP(AVG(LN(new_download))),2) as new_download, COUNT(clientIP) AS clients, SUM(old_tests) AS old_tests, SUM(new_tests) AS new_tests FROM new_and_old WHERE old_tests >= 48 AND new_tests >= 48 # more than 2/day in both time intervals UNION ALL SELECT "new beacons" AS type, NULL as old_download, ROUND(EXP(AVG(LN(new_download))),2) as new_download, COUNT(clientIP) AS clients, SUM(old_tests) AS old_tests, SUM(new_tests) AS new_tests FROM new_and_old WHERE old_tests < 7 AND new_tests >= 48 # More than 2/day in new interval, less than daily in old interval UNION ALL SELECT "new sparse" AS type, ROUND(EXP(AVG(LN(old_download))),2) as old_download, ROUND(EXP(AVG(LN(new_download))),2) as new_download, COUNT(clientIP) AS clients, SUM(old_tests) AS old_tests, SUM(new_tests) AS new_tests FROM new_and_old WHERE old_tests = 0 AND new_tests < 7 # None earlier, less than daily later UNION ALL SELECT "old sparse" AS type, ROUND(EXP(AVG(LN(old_download))),2) as old_download, ROUND(EXP(AVG(LN(new_download))),2) as new_download, COUNT(clientIP) AS clients, SUM(old_tests) AS old_tests, SUM(new_tests) AS new_tests FROM new_and_old WHERE old_tests < 7 AND new_tests = 0 # None later, less than daily earlier ) ORDER BY new_download DESC

laiyi-ohlsen commented 4 years ago

@stephen-soltesz

stephen-soltesz commented 4 years ago

The increase and eventual drop of "null tests" are due to an e2e monitoring check used with the web100 ndt server to check the "queue". The new platform ndt5 server has no queue, so always succeeded, but also saved an effectively "null" result. In early May, our e2e monitoring was updated to use access tokens and the queue check was removed.

It should be possible to verify this claim by looking at the remote IP for these connections; they will be OAM servers. We have conversational plans to add "filter" predicates in the ETL parser for our datatypes to expose this kind of information. I believe the super majority of these "null tests" are a result of OAM servers, which will eventually be labeled using predicates. We should open an issue there to track this.