ooni / explorer

OONI Explorer: uncover evidence of internet censorship worldwide
https://explorer.ooni.org
BSD 3-Clause "New" or "Revised" License

Update coverage chart color scale and add a legend #823

Closed majakomel closed 1 year ago

majakomel commented 1 year ago

Closes https://github.com/ooni/explorer/issues/771

vercel[bot] commented 1 year ago

Name: explorer
Status: ✅ Ready
Updated: Jan 13, 2023 at 11:43AM (UTC)

hellais commented 1 year ago

I made a few plots to figure out what the ideal ranges might be for displaying the data.

In particular, I used the daily measurement counts from December 2022 and looked at the distributions of the mean, median and max.

[Images: download (4), download (2), download (1)]

I think that if we are to keep only 4 ranges of values, we should use the following intervals:

measurements > 5000: 68
measurements 500-5000: 763
measurements 50-500: 1223
measurements < 50: 755

The count on the right here indicates how many ASNs have a median daily count of measurements within the range.
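A minimal sketch of how counts like these could be produced. It assumes the input is a mapping from ASN to a list of daily measurement counts. The function names, the data layout and the handling of values that fall exactly on a boundary are all assumptions of this sketch, not taken from the actual analysis.

```python
from collections import Counter
from statistics import median


def bucket_label(median_count):
    """Map a median daily measurement count to one of the 4 proposed ranges.

    Boundary handling (e.g. exactly 500 goes to "500-5000") is an assumption.
    """
    if median_count > 5000:
        return "> 5000"
    if median_count >= 500:
        return "500-5000"
    if median_count >= 50:
        return "50-500"
    return "< 50"


def bucket_asns(daily_counts_by_asn):
    """Count how many ASNs fall in each range, based on their median daily count."""
    return Counter(
        bucket_label(median(counts))
        for counts in daily_counts_by_asn.values()
        if counts  # skip ASNs with no data at all
    )


# Made-up example data, for illustration only.
example = {
    "AS1": [6000, 7000, 8000],
    "AS2": [40, 600, 900],
    "AS3": [10, 20, 30],
}
print(bucket_asns(example))
```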

Maybe, though, we should expand it to 5 ranges of values and use these:

measurements > 10000: 29
measurements 1000-10000: 503
measurements 100-1000: 1097
measurements 10-100: 774
measurements < 10: 389

For reference this is the distribution of ASNs with the current ranges:

measurements > 1000: 532
measurements 100-1000: 1097
measurements 10-100: 774
measurements < 10: 389

It is indeed a bit too skewed towards the top range and, while that's flattering for us, it perhaps doesn't do a good job of incentivising people to run more measurements.
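To put a number on the skew, the ASN counts from the lists above can be turned into percentages. This is only a quick sketch using the figures quoted in this comment; the variable and function names are made up for illustration. With these numbers, roughly 19% of ASNs land in the top range with the current ranges, versus about 2.4% with the proposed 4 ranges.

```python
# ASN counts per range, copied from the lists above (highest range first).
current = {"> 1000": 532, "100-1000": 1097, "10-100": 774, "< 10": 389}
proposed_4 = {"> 5000": 68, "500-5000": 763, "50-500": 1223, "< 50": 755}
proposed_5 = {
    "> 10000": 29,
    "1000-10000": 503,
    "100-1000": 1097,
    "10-100": 774,
    "< 10": 389,
}


def shares(counts):
    """Return the percentage of ASNs in each range, rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


for name, counts in [
    ("current", current),
    ("proposed 4", proposed_4),
    ("proposed 5", proposed_5),
]:
    print(name, shares(counts))
```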

Moreover, given that the global test list contains 1500 sites, I think we want to give the maximum score only to networks whose number of measurements is several times larger than the global list.
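As a rough illustration of that point, the sketch below expresses a few candidate top thresholds as multiples of the global test list size. It makes the simplifying assumption that one measurement corresponds to one tested site; the helper name is hypothetical.

```python
GLOBAL_TEST_LIST_SIZE = 1500  # size of the global test list quoted above


def multiples_of_global_list(threshold):
    """How many times larger than the global list a daily threshold is.

    Assumes one measurement per tested site, which is a simplification.
    """
    return threshold / GLOBAL_TEST_LIST_SIZE


for threshold in (1000, 5000, 10000):
    print(threshold, round(multiples_of_global_list(threshold), 2))
```

Under that assumption, the current top threshold of 1000 is below a single pass over the list, while 5000 is a bit more than three times its size.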

I suggest we start off by just tweaking the existing 4 ranges to the ones specified above, which have a pretty reasonable distribution of values. If it still doesn't look good, we can switch to more ranges.

In terms of colors, I suggest using the blue palette plus grey1.

That is:

measurements > 1000: blue7
measurements 100-1000: blue5
measurements 10-100: blue4
measurements < 10: blue2
measurements = 0: grey1

I checked the above color ranges for contrast and found those distances to be optimal. See the chart below:

[Image: Screenshot 2023-01-12 at 18 16 19]

For dates in the future, we should just keep grey1 there as well.

If grey1 turns out to be too light and doesn't have enough contrast with the background, we can potentially use grey2 instead.
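The color rules above could be sketched roughly as follows. The color names are the palette tokens mentioned in this comment and the thresholds are the ones from the color list. The function name, its signature and the boundary handling are assumptions for illustration only, not the Explorer implementation.

```python
from datetime import date


def coverage_color(measurement_count, day, today, empty_color="grey1"):
    """Pick a palette token for one day of the coverage chart.

    empty_color is used for days with no measurements and for future dates;
    pass "grey2" if grey1 does not contrast enough with the background.
    """
    if day > today or measurement_count == 0:
        return empty_color
    if measurement_count > 1000:
        return "blue7"
    if measurement_count >= 100:  # boundary handling is an assumption
        return "blue5"
    if measurement_count >= 10:
        return "blue4"
    return "blue2"


today = date(2023, 1, 13)
print(coverage_color(2500, date(2023, 1, 10), today))
print(coverage_color(0, date(2023, 1, 11), today))
print(coverage_color(300, date(2023, 1, 20), today, empty_color="grey2"))
```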