ooni / probe

OONI Probe network measurement tool for detecting internet censorship
https://ooni.org/install
BSD 3-Clause "New" or "Revised" License
758 stars 142 forks source link

Probe failure: no input provided #1328

Open lorenzoPrimi opened 3 years ago

lorenzoPrimi commented 3 years ago

We have a lot of probes which fails the WebConnectivity test due to

org.openobservatory.ooniprobe.common.MKException: {"downloaded_kb":0.0,"failure":"no input provided","idx":0,"key":0.0,"percentage":0.0,"uploaded_kb":0.0}

this might be due to the API returning an empty array or some communication problem between the app and the API. We need to investigate this and is high priority due to the increasing number of bug reports.

Ref: https://countly.ooni.io/dashboard#/5ea6cf6ad7b7950499a822db/crashes/65e8257909c0ec7ed2e41684a044f2cc80d4a1ac

bassosimone commented 3 years ago

We are debugging this issue with @FedericoCeratto. It seems that the root cause is a timeout on the probe side when calling the the API. The mobile app does not handle the timeout as an error. The returned list, on timeout, is obviously empty. The empty list triggers the user-visible error.

We have been able to make excellent progress after this first debugging session! We're confident we're gonna fix this soon!

bassosimone commented 3 years ago

We are going to continue monitoring and testing this fix in Sprint 32.

bassosimone commented 3 years ago

We are inspecting failures with @lorenzoPrimi. This issue is still present. We need to continue investigating with @FedericoCeratto.

hellais commented 3 years ago

Added to the agenda topics for the weekly backend meeting.

bassosimone commented 3 years ago

Added again to the agenda for the backend meeting.

hellais commented 3 years ago

Changes have been made to the backend in order to log the exception of returning and empty test list and added further logging information, see: https://github.com/ooni/api/pull/231.

@lorenzoPrimi should anything else be done on the mobile app front? Have we added logging on the mobile app as well to record why and how it's failing in this way?

lorenzoPrimi commented 3 years ago

We have the logging implemented since months, this is how we discovered this problem

lorenzoPrimi commented 3 years ago

Actually we still have about 50 cases per day https://sentry.io/organizations/ooni/issues/2201734929/?project=5619989&query=is%3Aunresolved

sentry-io[bot] commented 3 years ago

Sentry issue: PROBE-ANDROID-8

lorenzoPrimi commented 3 years ago

The number as as low as 10-20 per day, should we close this? @hellais @FedericoCeratto

hellais commented 2 years ago

I think we should try to better understand why it's happening, because it's still a non-trivial amount of exceptions in the last period. According to sentry it's 454 crashes in the last 14 days.

hellais commented 2 years ago

What we should do is add more logging closer to when the request for obtaining the test list is done: https://github.com/ooni/probe-android/blob/master/app/src/main/java/org/openobservatory/ooniprobe/test/suite/AbstractSuite.java#L140.

If the returned test list is empty we should log it as an error in sentry and attach to the error also the full content of the response from the backend.

lorenzoPrimi commented 2 years ago

in row 140 the test list can never be empty. This is not the url list but the test list (web_connectivity, dash etc)

The url list is downloaded here https://github.com/ooni/probe-android/blob/master/app/src/main/java/org/openobservatory/ooniprobe/test/TestAsyncTask.java#L146

But we already log exceptions and show a modal.

https://github.com/ooni/probe-android/blob/master/app/src/main/java/org/openobservatory/ooniprobe/test/TestAsyncTask.java#L165 https://github.com/ooni/probe-android/blob/master/app/src/main/java/org/openobservatory/ooniprobe/test/TestAsyncTask.java#L141

I'm gonna add one more exception when the list returned is empty

lorenzoPrimi commented 2 years ago

Pretty sure this is not a timeout but rather an empty array returned by the server

bassosimone commented 2 years ago

So, here's what I think may be happening. We have seen in https://github.com/ooni/probe/issues/1922 that there are issues with the HTTP library the app is using around the connection being closed. This should not be treated as an error, but for some reason it's treated as such. What I think may be happening in the case of this bug is that there is a race between the connection being closed and the right callback "data ready" being called. If the connection being closed wins, what happens is that we have an error instead of a successful result, so we get an empty list. I remember we added support for logging most exceptions (to be double checked), but it may be that here we're actually not witnessing an exception but rather an ordinary errback is called instead of a callback. (This is why exceptions are bad and combining exceptions with callbacks for error handling in networking code is not a good idea in the grand scheme of things.)

Regardless of whether my theory above is correct or not, we should get to the bottom of this. One starting point would actually be to refactor the mobile app to always use the engine for making network I/O.

hellais commented 2 years ago

We have added additional logging to the new release of the app. We are going to analyse the logs and iterate further once we have more information.

hellais commented 10 months ago

Has this been fixed? Is this no longer an issue?

jbonisteel commented 2 months ago

@aanorbel do you know if this is still an issue?