Study suggestion: Survey influence of tails and CG-Nat

We may be able to gain some insight into the distribution of test results within individual IP addresses, and into how those per-IP distributions influence aggregate distributions.

Compute CDFs on 5 or 10 bins per decade. Compute at a modest aggregation level, e.g. state, county, or possibly city. Use a fairly large time interval, like 3 months or a year.

a. Include all tests from all clients.
b. Use 1st, 5th, 10th, 25th, 50th, 75th, 90th, 95th, 99th percentile per IP.
c. Repeat excluding hottest IPs, with > 2 tests/day.
d. Repeat with only IPs that have very few tests, less than 3 per week.
e. Repeat with only IPs that have frequent tests - more than 3 per week.
f. Repeat with only hot IPs - those with more than 2 tests/day.
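The steps above could be sketched roughly as follows. This is only an illustration, not a proposed implementation: the input is assumed to be plain `(ip, value)` pairs rather than any real M-Lab schema, the names `classify_ip`, `per_ip_percentiles` and `log_binned_cdf` are hypothetical, and treating cold / warm / hot as mutually exclusive classes (with exactly 3 tests per week counted as cold) is an assumption, since the list does not say how the boundaries should be handled.

```python
import math
from collections import defaultdict

PERCENTILES = (1, 5, 10, 25, 50, 75, 90, 95, 99)


def percentile(sorted_vals, p):
    """Linear-interpolation percentile of an already sorted, non-empty list."""
    if len(sorted_vals) == 1:
        return sorted_vals[0]
    pos = (len(sorted_vals) - 1) * p / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def classify_ip(n_tests, days):
    """Bucket an IP by test rate (assumed exclusive classes)."""
    if n_tests / days > 2:            # more than 2 tests/day
        return "hot"
    if n_tests / (days / 7.0) > 3:    # more than 3 tests/week
        return "warm"
    return "cold"                     # 3 tests/week or fewer


def per_ip_percentiles(tests, days):
    """tests: iterable of (ip, value). Returns {ip: (class, {p: value})}."""
    by_ip = defaultdict(list)
    for ip, value in tests:
        by_ip[ip].append(value)
    out = {}
    for ip, vals in by_ip.items():
        vals.sort()
        stats = {p: percentile(vals, p) for p in PERCENTILES}
        out[ip] = (classify_ip(len(vals), days), stats)
    return out


def log_binned_cdf(values, bins_per_decade=5):
    """Empirical CDF evaluated at log-spaced edges (non-positive values dropped)."""
    positive = sorted(v for v in values if v > 0)
    if not positive:
        return []
    lo = math.floor(math.log10(positive[0]) * bins_per_decade)
    hi = math.ceil(math.log10(positive[-1]) * bins_per_decade)
    # Guard against rounding leaving the last edge just below the maximum.
    while 10 ** (hi / bins_per_decade) < positive[-1]:
        hi += 1
    cdf, idx = [], 0
    for k in range(lo, hi + 1):
        edge = 10 ** (k / bins_per_decade)
        while idx < len(positive) and positive[idx] <= edge:
            idx += 1
        cdf.append((edge, idx / len(positive)))
    return cdf
```

Each variant in the list would then amount to filtering the output of `per_ip_percentiles` by class, picking one percentile per IP, and passing those values to `log_binned_cdf`.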

Repeat using WScale or CWnd to distinguish clients within an IP address?
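One way this might look, as a rough sketch: count the distinct WScale values seen per IP, and optionally re-key tests by `(ip, wscale)` before computing per-IP statistics. The `(ip, wscale, value)` tuple layout and the function names are hypothetical. Several distinct values behind one IP would suggest several client stacks, but a single value would not prove a single client.

```python
from collections import defaultdict


def distinct_wscale_per_ip(tests):
    """tests: iterable of (ip, wscale, value). Returns {ip: count of distinct wscale}."""
    seen = defaultdict(set)
    for ip, wscale, _value in tests:
        seen[ip].add(wscale)
    return {ip: len(scales) for ip, scales in seen.items()}


def rekey_by_wscale(tests):
    """Treat each (ip, wscale) pair as a separate pseudo-client."""
    return [((ip, wscale), value) for ip, wscale, value in tests]
```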

Repeat for individual ASN or ISP. Some will have higher rates of CG-Nat than others.

Compare US vs EU vs later internet adopters.

With cold IPs, there should be little spread between percentiles, because most IPs will have only 1 or 2 tests.

With warm and hot IPs, the spread will be greater if there are multiple clients per IP, and smaller if there is very little CG-Nat influence.

Possibly repeat using ratios instead, e.g. the ratio of the 5th percentile to the median.
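A minimal sketch of the ratio idea, assuming each IP's test values are available as a plain list; `p5_to_median_ratio` is a hypothetical name and the linear-interpolation percentile is just one possible convention. A ratio near 1 would indicate little spread within the IP, while a small ratio would indicate a long low tail.

```python
import math


def percentile(sorted_vals, p):
    """Linear-interpolation percentile of an already sorted, non-empty list."""
    if len(sorted_vals) == 1:
        return sorted_vals[0]
    pos = (len(sorted_vals) - 1) * p / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def p5_to_median_ratio(values):
    """Ratio of 5th percentile to median; None if empty or the median is not positive."""
    vals = sorted(values)
    if not vals:
        return None
    median = percentile(vals, 50)
    if median <= 0:
        return None
    return percentile(vals, 5) / median
```

The resulting per-IP ratios could be fed into the same CDF computation as the raw percentiles, separately for each IP class.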
