ooni / backend

Everything related to OONI backend infrastructure: ooni/api, ooni/pipeline, ooni/sysadmin, collector, bouncers and test-helpers

Address false negatives in Web Connectivity measurements #453

Open agrabeli opened 4 years ago

agrabeli commented 4 years ago

A data analyst at the Citizen Lab recently detected false negatives in OONI Explorer measurements.

Below are examples of such measurements (which are annotated as "accessible", even though the raw data shows blocking):

https://explorer.ooni.org/measurement/20191229T003732Z_AS18004_JI2BzWJrjBZuKZN32BaJHNsOSGjE6PPvdisr4Hasr1QS1Jnw0G?input=http://www.samesexmarriage.ca/

https://explorer.ooni.org/measurement/20200213T053902Z_AS17974_igK3SmCYSdqnVjV2OxZ1VCUnsaECgo8Dh7S6CfU0AQgWg5nPQU?input=http://www.queernet.org/

According to this data analyst, they have detected such false negatives across many measurements in the OONI dataset.

This leads me to think that there might be a bug in our analysis logic (in which case, it could affect a large number of measurements), and/or that this may be caused by differences between the analysis logic of OONI Explorer and that of the pipeline.

If this is the case, perhaps we need to fix the bug in the analysis logic and re-process all measurements?

What do you think?

Based on an internal discussion with @sarathms, it seems that this may be caused by a bug in the API analysis logic.

bassosimone commented 4 years ago

There are potentially multiple issues here. Some of them probably lie in the reprocessing function of the pipeline, which could be improved to correctly flag block pages regardless of the status of the control measurement. There may also be other issues further down the line, and there seem to be probe-side issues as well: the first measurement you shared is analysed as okay by the probe itself, and the specific answer is something like "cleartext okay". So, to make this issue really actionable, we will probably need to investigate further. Having many offending URLs would probably be helpful.
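To make the "flag block pages regardless of the control measurement" idea concrete, here is a minimal sketch. It is not the pipeline's actual code. It assumes a Web Connectivity measurement is a dict with `test_keys.blocking` and `test_keys.requests[].response.body`, and the fingerprint strings and function names are hypothetical placeholders.

```python
from typing import Iterable, Sequence

# Hypothetical fingerprints for illustration only; a real list would be curated
# from confirmed block pages.
BLOCKPAGE_FINGERPRINTS = (
    "this site has been blocked",
    "access to this website is restricted",
)


def response_bodies(measurement: dict) -> Iterable[str]:
    """Yield every textual HTTP response body found in the measurement.

    Assumes bodies live under test_keys.requests[].response.body; bodies that
    are not plain strings (e.g. encoded binary) are skipped.
    """
    test_keys = measurement.get("test_keys") or {}
    for entry in test_keys.get("requests") or []:
        response = entry.get("response") or {}
        body = response.get("body")
        if isinstance(body, str):
            yield body


def matches_blockpage(
    measurement: dict, fingerprints: Sequence[str] = BLOCKPAGE_FINGERPRINTS
) -> bool:
    """Return True if any response body contains a known fingerprint.

    The control measurement is deliberately not consulted here.
    """
    return any(
        fingerprint in body.lower()
        for body in response_bodies(measurement)
        for fingerprint in fingerprints
    )


def is_suspected_false_negative(measurement: dict) -> bool:
    """Flag measurements annotated as not blocked whose body matches a fingerprint.

    Assumes test_keys.blocking is False when the probe saw no blocking.
    """
    test_keys = measurement.get("test_keys") or {}
    return test_keys.get("blocking") is False and matches_blockpage(measurement)
```

The point of the design is that `matches_blockpage` never looks at the control result, so a failed or missing control cannot hide a block page that is already present in the raw data.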

jakubd commented 4 years ago

Here is a list of 500 Explorer URLs that I tagged as blocked but that mostly show as green/OK in Explorer. I can get you a more complete list from my DB dump (around 34k), but this should be more manageable for getting a grip on the problem.

ooni-oops-ok.txt
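For triaging a list like this, a rough sketch of how the Explorer URLs could be split into report IDs and inputs is shown below. It assumes the URL shape seen in the examples earlier in this thread (`/measurement/<report_id>?input=<url>`) and that the second underscore-separated field of the report ID is the ASN. The helper names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple
from urllib.parse import parse_qs, urlsplit

MEASUREMENT_PREFIX = "/measurement/"


def parse_explorer_url(url: str) -> Tuple[str, Optional[str]]:
    """Split an Explorer measurement URL into (report_id, input).

    Assumes the form https://explorer.ooni.org/measurement/<report_id>?input=<url>.
    An unescaped '&' inside the input URL would truncate the parsed input.
    """
    parts = urlsplit(url.strip())
    if parts.netloc != "explorer.ooni.org" or not parts.path.startswith(MEASUREMENT_PREFIX):
        raise ValueError(f"not an Explorer measurement URL: {url!r}")
    report_id = parts.path[len(MEASUREMENT_PREFIX):]
    inputs = parse_qs(parts.query).get("input", [])
    return report_id, (inputs[0] if inputs else None)


def count_by_asn(urls: Iterable[str]) -> Counter:
    """Count measurements per ASN.

    Assumes report IDs look like <timestamp>_<ASN>_<random part>.
    """
    counts: Counter = Counter()
    for url in urls:
        report_id, _ = parse_explorer_url(url)
        fields = report_id.split("_", 2)
        counts[fields[1] if len(fields) > 1 else "unknown"] += 1
    return counts
```

Grouping by ASN is just one possible first cut; it may help show whether the suspected false negatives cluster on a few networks.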

bassosimone commented 4 years ago

@jakubd yes, that would be very useful to start off, thank you!