m-lab / analysis

Detailed analysis for data collected from M-Lab.
Apache License 2.0

Study suggestion: Survey influence of tails and CG-Nat #16

Open gfr10598 opened 3 years ago

gfr10598 commented 3 years ago

We may be able to get some insights into distributions within IP addresses, and their influence on aggregate distributions.

  1. Compute CDFs on 5 or 10 bins per decade. Compute at a modest aggregation level, e.g. state, county, or possibly city. Use a fairly large time interval, like 3 months or a year.
     a. Include all tests from all clients.
     b. Use 1st, 5th, 10th, 25th, 50th, 75th, 90th, 95th, 99th percentile per IP.
     c. Repeat excluding hottest IPs, with > 2 tests/day.
     d. Repeat with only IPs that have very few tests, less than 3 per week.
     e. Repeat with only IPs that have frequent tests - more than 3 per week.
     f. Repeat with only hot IPs - those with more than 2 tests/day.
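A rough sketch of how the steps above might look in plain Python, in case it helps pin down the definitions. Everything here is an assumption for illustration: the input is taken to be `(ip, value)` pairs where `value` is a positive measurement such as throughput, the function names are hypothetical, percentiles use the nearest-rank method, and IPs at exactly 3 tests/week (which the thresholds above leave unassigned) are lumped in with "warm". A real study would more likely do this in BigQuery.

```python
import math
from collections import defaultdict

PERCENTILES = (1, 5, 10, 25, 50, 75, 90, 95, 99)


def percentile(sorted_vals, p):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(p * len(sorted_vals) / 100))
    return sorted_vals[rank - 1]


def per_ip_percentiles(tests):
    """tests: iterable of (ip, value) pairs. Returns {ip: {p: value}}."""
    by_ip = defaultdict(list)
    for ip, value in tests:
        by_ip[ip].append(value)
    return {
        ip: {p: percentile(sorted(vals), p) for p in PERCENTILES}
        for ip, vals in by_ip.items()
    }


def rate_class(n_tests, days):
    """Label an IP by test rate, using the thresholds from the list above."""
    per_day = n_tests / days
    if per_day > 2:
        return "hot"      # more than 2 tests/day
    if per_day * 7 < 3:
        return "cold"     # less than 3 tests/week
    return "warm"         # assumption: everything in between


def log_bin_cdf(values, bins_per_decade=5):
    """Empirical CDF sampled at log-spaced edges, bins_per_decade per factor of 10."""
    vals = sorted(v for v in values if v > 0)
    if not vals:
        return []
    lo = math.floor(math.log10(vals[0]) * bins_per_decade)
    hi = math.ceil(math.log10(vals[-1]) * bins_per_decade)
    cdf, idx = [], 0
    for k in range(lo, hi + 1):
        edge = 10 ** (k / bins_per_decade)
        # Advance past every value at or below this bin edge.
        while idx < len(vals) and vals[idx] <= edge:
            idx += 1
        cdf.append((edge, idx / len(vals)))
    return cdf
```

Each variant (a through f) would then amount to filtering the IPs by `rate_class` before feeding either the raw values or one chosen per-IP percentile into `log_bin_cdf`, once per aggregation unit (state, county, city).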

Repeat using WScale or CWnd to distinguish clients within an IP address?

Repeat for individual ASN or ISP. Some will have higher rates of CG-Nat than others.

Compare US vs EU vs later internet adopters.

With cold IPs, there should be little spread between percentiles, because most IPs will have only 1 or 2 tests.

With warm and hot IPs, the spread will be greater if there are multiple clients per IP, less if there is very little CG-Nat influence.

Possibly repeat, but use ratios, e.g. of 5th percentile and median.
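To make the ratio idea concrete, here is a minimal sketch of a per-IP tail-to-median ratio. The function names and the nearest-rank percentile are assumptions for illustration, not an agreed method. A ratio near 1 means little spread, as expected for cold IPs; a ratio well below 1 on a warm or hot IP would be consistent with several different clients sharing the address.

```python
import math


def percentile(sorted_vals, p):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(p * len(sorted_vals) / 100))
    return sorted_vals[rank - 1]


def tail_to_median_ratio(values, tail=5):
    """Ratio of the `tail`-th percentile to the median for one IP's tests."""
    vals = sorted(values)
    median = percentile(vals, 50)
    if median == 0:
        return float("nan")  # ratio is undefined for a zero median
    return percentile(vals, tail) / median


# One test on a cold IP: no spread at all.
single = tail_to_median_ratio([50.0])

# Hypothetical hot IP where two slow tests sit far below the rest.
mixed = tail_to_median_ratio([10.0, 12.0, 95.0, 100.0, 105.0])
```

The numbers in the `mixed` example are made up; the point is only that the ratio separates "one client, consistent results" from "wide spread behind one address" without depending on absolute speeds.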

gfr10598 commented 3 years ago

FYI - maybe related - Google Video Quality Report discusses their use of 90th percentile as the threshold for rating ISPs. If 90% of streams are HD, then the ISP is rated HD. This might inform a future decision to use tail stats instead of medians or averages in some of our recommendations.
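As a small illustration of how a tail-based rule differs from a median-based one, here is a sketch of a "90% of samples meet the threshold" check. This is not Google's actual methodology, only the rule as paraphrased above; the function name, the threshold value, and the sample numbers are all hypothetical.

```python
def meets_tail_threshold(values, threshold, required_fraction=0.9):
    """True if at least `required_fraction` of values are >= threshold."""
    if not values:
        return False
    passing = sum(1 for v in values if v >= threshold)
    return passing / len(values) >= required_fraction


# Hypothetical samples: the median (30) clears a threshold of 25,
# but only 60% of samples do, so the tail-based rule fails.
samples = [30, 30, 30, 2, 2]
median_ok = sorted(samples)[len(samples) // 2] >= 25
tail_ok = meets_tail_threshold(samples, 25)
```

Here `median_ok` is true while `tail_ok` is false, which is the kind of disagreement that would matter if recommendations moved from medians to tail stats.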