Open gfr10598 opened 4 years ago
Note that the increase in test volume coincides with a deep weekly variation. Looks like the source of these new tests may be concentrating testing during the week.
SELECT DATE(log_time) AS date, COUNTIF(test_id LIKE '"%') AS tests,
COUNTIF(test_id LIKE '"%' AND result.S2C IS NOT NULL AND result.C2S IS NULL) AS downloads,
COUNTIF(test_id LIKE '"%' AND result.C2S IS NOT NULL AND result.S2C IS NULL) AS uploads,
COUNTIF(test_id LIKE '"%' AND result.S2C IS NOT NULL AND result.C2S IS NOT NULL) AS dual_tests,
COUNTIF(test_id LIKE '"%' AND result.S2C IS NULL AND result.C2S IS NULL) AS null_tests,
COUNTIF(test_id NOT LIKE '"%') AS unquoted,
FROM mlab-sandbox.raw_ndt.ndt5
GROUP BY DATE
ORDER BY DATE -- null_tests/tests DESC
clientIP | new_tests | old_tests | |
---|---|---|---|
127.0.0.1 | 1 | 90276 | |
35.192.37.249 | 15 | 755658 |
SELECT result.clientIP,
COUNTIF(DATE(log_time) = "2020-03-24") AS new_tests,
COUNTIF(DATE(log_time) = "2020-03-17") AS old_tests
FROM mlab-sandbox.raw_ndt.ndt5
WHERE DATE(log_time) = "2020-03-17" OR log_time = "2020-03-24"
GROUP BY clientip
ORDER BY new_tests - old_tests DESC
Row | clientIP | new_tests | old_tests | |
---|---|---|---|---|
1 | 209.91.247.196 | 6650 | 0 | |
2 | 86.98.51.223 | 6384 | 0 | |
3 | 2001:5b0:4dc6:7228:6a37:e9ff:fe40:6c9b | 2864 | 0 | |
4 | 98.32.229.206 | 2416 | 0 | |
5 | 2607:fcc8:6c84:9d00:78a9:3056:f78a:61b | 1910 | 0 | |
6 | 2607:fcc8:6c84:9d00:c077:9d2a:c456:2a27 | 1831 | 0 | |
7 | 2607:fcc8:6c84:9d00:912c:adae:fcb9:abcd | 1756 | 0 | |
8 | 184.14.195.212 | 1752 | 0 |
WITH new_and_old AS (
SELECT result.clientIP,
COUNTIF(DATE(log_time) > "2020-03-16") AS new_tests,
COUNTIF(DATE(log_time) < "2020-03-16") AS old_tests
FROM mlab-sandbox.raw_ndt.ndt5
WHERE DATE(log_time) BETWEEN "2020-03-02" AND "2020-03-30"
AND result.S2C IS NOT NULL
AND result.S2C.MeanThroughputMBPS > 1
GROUP BY clientip
)
SELECT --(SUM(new_tests)-SUM(old_tests))/14 FROM new_and_old WHERE new_tests > 10old_tests AND new_tests > old_tests + 14 ORDER BY SAFE_DIVIDE(old_tests,new_tests), new_tests DESC
Between March 10 and March 24, the volume of ndt5 test data increased by about 70% and the volume of tcpinfo data increased by just over 50%.
total compressed bytes | archive |
---|---|
760809161 | gs://archive-measurement-lab/ndt/ndt5/2020/03/10 |
1048027589 | gs://archive-measurement-lab/ndt/ndt5/2020/03/17 |
1305392571 | gs://archive-measurement-lab/ndt/ndt5/2020/03/24 |
219778621482 | gs://archive-measurement-lab/ndt/tcpinfo/2020/03/07 |
260401181457 | gs://archive-measurement-lab/ndt/tcpinfo/2020/03/14 |
337154953901 | gs://archive-measurement-lab/ndt/tcpinfo/2020/03/21 |
Some stats regarding beacons that have been active for months, vs beacons that became active after March 16. The old beacons seem pretty stable, and the new beacons are significantly faster. The sparse clients are also stable, and significantly slower.
The download speeds are geometric means of per client averages, so that each client is equally weighted, and very fast tests do not have disproportionate influence.
Row | type | old_download | new_download | clients | old_tests | new_tests | |
---|---|---|---|---|---|---|---|
1 | old beacons | 116.46 | 116.16 | 17325 | 1879785 | 2261942 | |
2 | new beacons | null | 34.22 | 38320 | 21826 | 2856274 | |
3 | new sparse | null | 19.61 | 11548729 | 0 | 19006958 | |
4 | old sparse | 20.26 | null | 7204511 | 11505953 | 0 |
https://console.cloud.google.com/bigquery?sq=581276032543:427422f43ef74b768f0361266695b4df
WITH new_and_old AS (
SELECT result.clientIP,
COUNTIF(DATE(log_time) > "2020-03-16") AS new_tests,
COUNTIF(DATE(log_time) < "2020-03-16") AS old_tests,
SAFE_DIVIDE(SUM(IF(DATE(log_time) > "2020-03-16", result.S2C.MeanthroughputMBPS, 0)),
COUNTIF(DATE(log_time) > "2020-03-16")) AS new_download,
SAFE_DIVIDE(SUM(IF(DATE(log_time) < "2020-03-16", result.S2C.MeanthroughputMBPS, 0)),
COUNTIF(DATE(log_time) < "2020-03-16")) AS old_download,
FROM mlab-sandbox.raw_ndt.ndt5
WHERE
(DATE(log_time) BETWEEN "2020-03-01" AND "2020-03-14" # Two week period before surge
OR
--DATE(log_time) BETWEEN "2020-04-01" AND "2020-04-14") # Recent two week period.
DATE(log_time) BETWEEN "2020-06-01" AND "2020-06-14") # Recent two week period.
AND result.S2C IS NOT NULL
AND result.S2C.MeanThroughputMBPS > 1
GROUP BY clientip
)
(SELECT "old beacons" AS type, ROUND(EXP(AVG(LN(old_download))),2) as old_download, ROUND(EXP(AVG(LN(new_download))),2) as new_download, COUNT(clientIP) AS clients, SUM(old_tests) AS old_tests, SUM(new_tests) AS new_tests FROM new_and_old WHERE old_tests >= 48 AND new_tests >= 48 # more than 2/day in both time intervals UNION ALL SELECT "new beacons" AS type, NULL as old_download, ROUND(EXP(AVG(LN(new_download))),2) as new_download, COUNT(clientIP) AS clients, SUM(old_tests) AS old_tests, SUM(new_tests) AS new_tests FROM new_and_old WHERE old_tests < 7 AND new_tests >= 48 # More than 2/day in new interval, less than daily in old interval UNION ALL SELECT "new sparse" AS type, ROUND(EXP(AVG(LN(old_download))),2) as old_download, ROUND(EXP(AVG(LN(new_download))),2) as new_download, COUNT(clientIP) AS clients, SUM(old_tests) AS old_tests, SUM(new_tests) AS new_tests FROM new_and_old WHERE old_tests = 0 AND new_tests < 7 # None earlier, less than daily later UNION ALL SELECT "old sparse" AS type, ROUND(EXP(AVG(LN(old_download))),2) as old_download, ROUND(EXP(AVG(LN(new_download))),2) as new_download, COUNT(clientIP) AS clients, SUM(old_tests) AS old_tests, SUM(new_tests) AS new_tests FROM new_and_old WHERE old_tests < 7 AND new_tests = 0 # None later, less than daily earlier ) ORDER BY new_download DESC
@stephen-soltesz
The increase and eventual drop of "null tests" are due to an e2e monitoring check used with the web100 ndt server to check the "queue". The new platform ndt5 server has no queue, so always succeeded, but also saved an effectively "null" result. In early May, our e2e monitoring was updated to use access tokens and the queue check was removed.
It should be possible to verify this claim by looking at the remote IP for these connections; they will be OAM servers. We have conversational plans to add "filter" predicates in the ETL parser for our datatypes to expose this kind of information. I believe the super majority of these "null tests" are a result of OAM servers, which will eventually be labeled using predicates. We should open an issue there to track this.
ndt5 has some oddities...
(google only, 8-( ) https://docs.google.com/spreadsheets/d/1wvLj8_4iNGQFrO3PT9ZZR8IrcxQRErO_mTFRHf4zdHE/edit?usp=sharing