ooni / probe

OONI Probe network measurement tool for detecting internet censorship
https://ooni.org/install
BSD 3-Clause "New" or "Revised" License

Wrong ASN displayed in successful results of a re-run failed test when different network is selected #851

Closed: elioqoshi closed this issue 2 years ago

elioqoshi commented 5 years ago

Scenario (how to reproduce):

  1. I'm on mobile data and run the Performance test class. The NDT results come out, but the DASH test has failed. If I tap on the DASH test row, it asks me whether I'd like to re-run the test.
  2. For whatever reason, I switch to WiFi, disable my mobile data, and decide to re-run that DASH test. This time it succeeds and I get the results of the DASH test. However, the ASN information of both the NDT and DASH tests stays as the original mobile data ASN, which is not correct (the DASH test only succeeded on my WiFi, hence the WiFi ISP's ASN should be displayed for the DASH test results).

This is also the case in the raw data (the ASN information of the original failed test is included there, instead of the WiFi ASN under which the test actually succeeded). Even the network type is still "mobile" instead of "WiFi".

Edit: It seems that the actual JSON file is correct and gets updated with the new ASN information. My mistake; I must have confused something.

Proposal:

Since tests in the Test Summary screen share the same network (as they are originally run at the same time), re-running a failed test invalidates this logic: the re-run test is shown in the same summary, although it may have been run on a different network. Therefore, it would be best to create a new test summary for any failed test which has been re-run and succeeded, showing the actual, correct ASN information as well.

P.S.: I suspect similar behaviour also applies to other test classes, so this is most likely not exclusive to the Performance test class. This was tested with OONI Probe on Android 9 (OnePlus 6), but it doesn't seem to be an Android-exclusive bug.

bassosimone commented 5 years ago

This is fundamental. MK has an atomic concept of tests where it discovers the ASN, etc., every time a test is run, so I would expect the data to contain the correct ASN (can you check?). OTOH, the mobile application has the concept of a group of tests. Also, it was never designed with the case in mind that you could re-run a single test of a group of tests from a different network.

However, I am not sure about your suggested fix. Sadly, tests in the app are groups, not individual tests, so, in your example, what would you display as the result of NDT? A bunch of N/As?

I do not believe I am the right person to assign this issue to, by the way. Please make sure that the data collected by the test is correct. If that's the case, this is not an MK issue but rather an issue with the design of the app, about which I honestly don't know what to say.

elioqoshi commented 5 years ago

MK has an atomic concept of tests where it discovers the ASN, etc., every time a test is run, so I would expect the data to contain the correct ASN (can you check?)

It does not. Neither in the summary of that test group, nor in the raw data of that individual test (the ASN of the original failed test is shown).

Edit: It seems that the actual JSON file is correct and gets updated with the new ASN information. My mistake; I must have confused something.

Also, it was never designed with the case in mind that you could re-run a single test of a group of tests from a different network.

Yes, that's a conceptual problem indeed. We could then have two options:

  1. Re-run the single test (which failed before) from a test group and create a new test group which includes only the new, successful results. The individual test from the old test group will be deleted. This approach basically splits the re-run test off into a new test group (what I proposed in the original comment).
  2. Re-run the whole test group, even if one or more of its individual tests have not failed. The old test group will be deleted (or not?) and a new test group with the results from the re-run is added.

However, I am not sure about your suggested fix. Sadly, tests in the app are groups, not individual tests, so, in your example, what would you display as the result of NDT? A bunch of N/As?

The app is also designed to allow individual tests. Any user could go to the Test Overview screen and configure it to test only NDT, for example. This is a completely foreseen scenario without any surprises. In that case, the app shows this:

[screenshot]

(Related to this, it seems there is a visual bug which makes these labels appear empty; I opened an issue about it here: https://github.com/ooni/probe/issues/852. This does not apply to the Instant Messaging or Middleboxes tests, which work fine and let the user select which types of tests they want to run.)

I do not believe I am the right person to assign this issue to, by the way. Please make sure that the data collected by the test is correct. If that's the case, this is not an MK issue but rather an issue with the design of the app, about which I honestly don't know what to say.

The DASH test always fails on my mobile network for some reason, and I can reproduce this bug endlessly with the same results. I'm sure the data is correct. If it's not an MK issue, then that's also good to know and we can get closer to the problem. @lorenzoPrimi can you investigate this?

bassosimone commented 5 years ago

Neither in the summary of that test group, nor in the raw data of that individual test (the ASN of the original failed test is shown).

Please, paste here the logs that you see.

The app is also designed to allow individual tests.

Then it's a no-brainer. Leave the old test failed. Run the new test as a group-test with only that test inside the group. It may entail some toil, I hope not that much, but we should be fine (TM).

The DASH test always fails on my mobile network for some reason

You most likely have Vodafone. DASH does not cope well with middleboxes. (But, again, paste the logs so that my guesses can be substantiated with evidence, or contradicted.)

[Reassigning myself until it's clear that I can safely unassign myself or there's an MK issue.]

lorenzoPrimi commented 5 years ago

So far I see three possibilities:

  1. The re-run test creates a new entry if the network is different (can be challenging, but should be doable).
  2. Store the network info inside the measurement object (easy, but the issue of what to display in the header of the test suite results remains if there are two different ASNs for two measurements).
  3. The re-run test always creates a new result and measurement object (a bit ugly).
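For illustration, option 2 could look roughly like the following sketch. All names here are hypothetical, not the app's real schema; the point is that the ASN observed at run time is copied onto each measurement, so a re-run records the network it actually used.

```kotlin
// A rough sketch of option 2, with hypothetical names: the network info
// observed when the test runs is stored on the measurement itself.
data class Measurement(
    val id: Long,
    val resultId: Long,      // the test group this measurement belongs to
    val testName: String,    // e.g. "dash"
    val asn: String,         // e.g. "AS21183", captured for *this* run
    val networkType: String  // "wifi" or "mobile"
)
```

As noted above, this leaves open what the header of the test suite results should display when two measurements carry two different ASNs.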

elioqoshi commented 5 years ago

Here are the logs of the successful re-run DASH test:

Contacting bouncer: https://bouncer.ooni.io
Using discovered collector: https://c.collector.ooni.io:443
Your country: AL
Your ISP identifier: AS21183
Your ISP name: ABCOM Shpk
Discovering mlab server using mlabns
Discovered mlab test server: neubot.mlab.mlab3.tgd01.measurement-lab.org
Negotiating with: neubot.mlab.mlab3.tgd01.measurement-lab.org
Start dash test with: http://neubot.mlab.mlab3.tgd01.measurement-lab.org/
Connected to server (3WHS RTT = 0.013696 s); starting the test
Test complete; closing connection
Collecting results

Here is the log of the first NDT test, which never failed (part of the test group where the DASH test failed). I'm unable to get the log of the failed DASH test as it's not accessible (note: I reduced the speed to 2G as the test was using a lot of bandwidth on 4G):

Contacting bouncer: https://bouncer.ooni.io
Using discovered collector: https://c.collector.ooni.io:443
Your country: AL
Your ISP identifier: AS30722
Your ISP name: Vodafone Italia S.p.A.
Discovered mlab test server: ndt.iupui.mlab3.tgd01.measurement-lab.org
Server version: v4.0.0.1-Web100
Connected to ndt.iupui.mlab3.tgd01.measurement-lab.org:42834
Starting upload
Elapsed enough time
Ending upload (0)
Connection to ndt.iupui.mlab3.tgd01.measurement-lab.org:42834 closed
C2S speed calculated by server: 120 kbit/s

Then it's a no-brainer. Leave the old test failed. Run the new test as a group-test with only that test inside the group. It may entail some toil, I hope not that much, but we should be fine (TM).

Yep, I think that's the more elegant solution too.

You most likely have Vodafone.

That's indeed correct.

bassosimone commented 5 years ago

@elioqoshi:

Here are the logs

It seems to me the two ASNs differ. So MK is doing the right thing. Am I missing something?

You most likely have Vodafone.

That's indeed correct.

Can you also show me the logs of the failed DASH test for confirmation?

(Also, thanks!)

elioqoshi commented 5 years ago

So far I see three possibilities:

I think all three possibilities you mention, @lorenzoPrimi, are sensible.

I might even prefer:

  3. The re-run test always creates a new result and measurement object (a bit ugly).

Because if a failed test is part of a test group run on 26 February and I re-run that test on 28 February, why would that successful test appear in the original cell dated 26 February, if it was run on 28 February? It might be more truthful to create a new cell for every re-run test.

It seems to me the two ASNs differ. So MK is doing the right thing. Am I missing something?

@bassosimone sorry, I just edited my first comment and forgot to mention it. The actual raw data is fine (I could swear that when I first saw the JSON the networks were the same, but I might have confused something). The JSON output is indeed fine and, it seems, not something which should concern you.

Can you also show me the logs of the failed DASH test for confirmation?

Sorry, there is no way to. Only successful tests have an option to access logs. I opened an issue exactly for this limitation: https://github.com/ooni/probe/issues/850

bassosimone commented 5 years ago

Because if a failed test is part of a test group run on 26 February and I re-run that test on 28 February, why would that successful test appear in the original cell dated 26 February, if it was run on 28 February? It might be more truthful to create a new cell for every re-run test.

Totally.

@bassosimone sorry, I just edited my first comment and forgot to mention it. The actual raw data is fine

OK, then I'll remove myself and continue lurking.

Sorry, there is no way to.

This is a bit of a bummer. We should probably fix it.

hellais commented 5 years ago

I understand what the problem is. I think the network and ASN model we designed did not take into account how it would interact with the test re-run model.

While placing a re-run test inside of a new test group may work well for performance or other more verbose tests, I fear it will not work as nicely for re-running website tests.

Another option that we could go for would be to treat re-run tests differently: every time you re-run a test, a new network row is created, and the linking to the network object is done at the measurement level instead of the result level.

Effectively this would mean that you would see a given ASN in the result listing, but then a different ASN in the details view of the re-run test.
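For illustration, here is a sketch of this measurement-level linking, as a normalized variant of the earlier option-2 sketch. The entity and field names (Network, Measurement, Result, networkId) are hypothetical, not the app's actual schema:

```kotlin
// Hypothetical sketch: the network is referenced from each measurement
// instead of only from the whole result, so a re-run measurement can
// carry the ASN of the network it actually ran on.
data class Network(
    val id: Long,
    val asn: String,         // e.g. "AS21183"
    val ispName: String,     // e.g. "ABCOM Shpk"
    val networkType: String  // "wifi" or "mobile"
)

data class Measurement(
    val id: Long,
    val resultId: Long,      // still grouped under the original result
    val testName: String,    // e.g. "dash"
    val networkId: Long,     // link to the network row of *this* run
    val failed: Boolean
)

data class Result(
    val id: Long,
    val testGroupName: String, // e.g. "performance"
    val startTime: Long,
    val networkId: Long        // network of the original run, shown in the listing
)
```

The result listing would keep resolving Result.networkId, while the details view of a re-run measurement would resolve Measurement.networkId instead.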

elioqoshi commented 5 years ago

Another option that we could go for would be to treat re-run tests differently: every time you re-run a test, a new network row is created, and the linking to the network object is done at the measurement level instead of the result level. Effectively this would mean that you would see a given ASN in the result listing, but then a different ASN in the details view of the re-run test.

I'm not sure if I understand this right. With "result level" you mean the Test Summary result view, right? What you say could work, but I'd add the following change in the Test Summary view: if various networks are used in the same test group, show Networks: Various.

This way we do not show anything incorrect at the Test Summary view level.
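A minimal sketch of that label logic, reusing the hypothetical Measurement and Network types from the sketch above:

```kotlin
// Show the common ASN if all measurements in the group ran on the same
// network, otherwise fall back to "Various" (hypothetical helper).
fun networkLabel(measurements: List<Measurement>, networks: Map<Long, Network>): String {
    val asns = measurements.mapNotNull { networks[it.networkId]?.asn }.toSet()
    return when (asns.size) {
        0 -> "N/A"        // no network info recorded
        1 -> asns.first() // every measurement ran on the same network
        else -> "Various" // mixed networks after a re-run
    }
}
```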

hellais commented 5 years ago

Thinking again about this, I think that re-running a test inside of a different network is an edge case, and not something that should be treated as a first-class citizen.

By re-running a particular test inside of a different network than the original test, the user is effectively not re-running the same experiment, because the network has changed.

I suggest we either:

  1. Disallow re-running a test when the user is inside of a different network
  2. If they re-run it inside of a different network, it ends up inside of a new test result

I guess this would require doing the geoIP lookup separately from MK and using that to determine whether we should be creating a new result set or whether the result should end up inside of the same test group.
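A sketch of that decision, reusing the hypothetical Network and Result types from the earlier sketch and assuming two hypothetical helpers: geoIpLookup() is a lookup the app would perform itself (separately from MK), and createNewResult() would persist a fresh result set. Neither name comes from the actual codebase.

```kotlin
// Hypothetical stubs standing in for app-level plumbing.
fun geoIpLookup(): Network = TODO("resolve the current network/ASN via a geoIP lookup")
fun createNewResult(network: Network): Long = TODO("persist a new result row, return its id")

// Decide where a re-run measurement should live, based on whether the
// network changed since the original run.
fun resultIdForRerun(original: Result, originalNetwork: Network): Long {
    val current = geoIpLookup()
    return if (current.asn == originalNetwork.asn) {
        original.id              // same network: keep the re-run in the same result set
    } else {
        createNewResult(current) // different network: branch into a new result set
    }
}
```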

In general, I think we need to think a bit more about all the possible edge cases that occur in test re-running, because some odd things can happen (for example, if a user re-runs a test many days after the original, should the date in the result set be updated to the latest timestamp, or should it keep the original run date?).

elioqoshi commented 5 years ago
  2. If they re-run it inside of a different network, it ends up inside of a new test result

I'm a fan of this approach. Would the old failed test row in the summary be deleted if its re-run test has been successful?

(for example, if a user re-runs a test many days after the original, should the date in the result set be updated to the latest timestamp, or should it keep the original run date?)

We are talking about re-running a failed test many days after the original, right? I think that ANY re-run should branch off into a new test group row with its actual timestamp. Keeping the original timestamp on a re-run test is simply not factually correct after all, no?

hellais commented 5 years ago

Note that in my proposal this should happen only when re-running a test after the network has changed (which, as mentioned, is an edge case).

In all other cases it would end up inside of the same result set.

The reason for this is that the use case I believe we want to encourage users to follow is that of a single failing test, where the user wants to retry it on the same network they are on (the failure was probably a transient network failure).

If we put every re-run of a failed test inside of its own result set, there is the concern of cluttering the result listing with a bunch of single-entry results.

elioqoshi commented 5 years ago

That would not solve the timestamp problem, though. Don't you think it's problematic having a test re-run successfully today inside the test group row of a test from 3 weeks ago, with a timestamp of 3 weeks ago?

hellais commented 5 years ago

Don't you think it's problematic having a test re-run successfully today inside the test group row of a test from 3 weeks ago, with a timestamp of 3 weeks ago?

I don't think it's as bad as having all re-run tests end up in a new result set.

A result set has the timestamp of when the result set was first created, while the timestamps of the measurements themselves would be consistent with the time at which they were run.

If a user wants to run a particular test at such a later point in time, why would they do it via the re-run function of a failed test and not by initiating a new test?

I believe this is also an edge case, which is handled adequately by the current model.

lorenzoPrimi commented 5 years ago
  2. If they re-run it inside of a different network, it ends up inside of a new test result

So it's my solution number 1, right?

I'm a fan of this approach. Would the old failed test row in the summary be deleted if its re-run test has been successful?

Nothing gets deleted; the row is just marked rerun = 1, so it's invisible to users.
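A sketch of that soft-delete behaviour, assuming a hypothetical rerun column; the real schema and query layer may differ:

```kotlin
// Superseded rows are flagged rather than deleted, so the raw data survives.
data class MeasurementRow(
    val id: Long,
    val testName: String,
    val rerun: Int = 0  // set to 1 when a newer re-run supersedes this row
)

// The result listing simply filters out superseded rows.
fun visibleMeasurements(rows: List<MeasurementRow>): List<MeasurementRow> =
    rows.filter { it.rerun == 0 }
```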

elioqoshi commented 5 years ago

I don't think it's as bad as having all re-run tests end up in a new result set.

I suppose you are referring to the Web Connectivity tests, as these have the most results. For those, I agree. For the other three test classes, it makes sense to create a new result set. It would not be a wise decision to deal with these two cases differently, though.

A result set has the timestamp of when the result set was first created, while the timestamps of the measurements themselves would be consistent with the time at which they were run.

What we could maybe do is change the Time & Date label to something like First Run when this edge case is true? I understand where you are coming from, though.