ooni / probe-legacy

The legacy python version of OONI Probe
Other
22 stars 12 forks source link

Fix openvpn test when it triggers a restart #67

Open hellais opened 8 years ago

hellais commented 8 years ago

When openvpn encounters an error it will send itself a signal and restart hence disconnecting stdin/stdout and stdout that is interpreted by the openvpn test as the process having terminated.

This should be better handled and when this happens we should be sure to reconnect to the new stdin/stdout of the process.

The relevant piece of code that needs to be changed in the openvpn test code is this: https://github.com/TheTorProject/ooni-probe/blob/master/ooni/nettests/third_party/openvpn.py#L61

It's probably at this point also worthwhile to investigate doing a better job at mapping the possible failure modes of openvpn itself.

Places to look for interesting errors are: https://github.com/OpenVPN/openvpn/blob/e7ec6a3a11ecee54cb10de789668dd37c3f9fc54/src/openvpn/ssl.c#L2261

Here is how the logline looks like for openvpn encountering a TLS error and restarting by sending a SIGUSR1 signal.

Wed Jun  1 XX:XX:XX 2016 TLS Error: TLS key negotiation failed to occur within 60 seconds (check your network connectivity)
Wed Jun  1 XX:XX:XX 2016 TLS Error: TLS handshake failed
Wed Jun  1 XX:XX:XX 2016 SIGUSR1[soft,tls-error] received, process restarting

cc @mrphs

TheNavigat commented 7 years ago

That doesn't sound like the correct approach.

AFAIK, OpenVPN itself would keep retrying to connect indefinitely even if the problem can never be solved (no internet connection, for example). By reconnecting to the restarted process and waiting, we'd be waiting forever.

The intuition is that when OpenVPN fails, it fails because of a problem. That problem shouldn't be ignored; ooni-probe should fail too.

There should probably be better handling of errors indeed, but reconnecting to the restarted process in case of a timeout (or other errors that cause OpenVPN to keep retrying indefinitely) equals a hog.

Of course, feel free to correct me if I'm wrong. I probably am :)

hellais commented 4 years ago

This is related to a probe-legacy test. I am moving this issue into there for archival.