ooni / probe-engine

Semi-automatic export of https://github.com/ooni/probe-cli internals
https://ooni.org
GNU General Public License v3.0

Make OONI Probe testing more resilient to network outages #88

Closed bassosimone closed 4 years ago

bassosimone commented 4 years ago

I have implemented this functionality in https://github.com/ooni/netx, a lower-level library where it is better placed. I have used https://github.com/ooni/jafar to calibrate timeouts and retransmissions, and have concluded the following (see also https://github.com/ooni/netx/commit/9a94f46faf2b19b5cfd7133d7b9ca4274799f579):

  1. we should configure conservative timeouts (e.g. 30s for connect)

  2. we should use Go deadlines to terminate operations that are running for too long

Bounding the whole duration of an operation with a deadline is better than relying on short timeouts alone, because it gives a bad network time to breathe in case of interference. More field testing is probably required to validate this assumption.

Another aspect related to this issue is the implementation of automatic DNS fallback. It was devised for #87 but it is also useful here. More than once I have had to run measurements in networks where the DNS was poorly provisioned, so MK failed quite often. With automatic fallback to other DNS servers using DoT and DoH, several of which we reach through hardcoded IP addresses, we still have a working resolver, and can still use the network, when the default DNS does not always reply.