ooni / probe

OONI Probe network measurement tool for detecting internet censorship
https://ooni.org/install
BSD 3-Clause "New" or "Revised" License
754 stars 142 forks source link

cli: reduce the complexity of database interactions #2024

Open hellais opened 2 years ago

hellais commented 2 years ago

I have been doing a bit of thinking of how the overall complexity of the CLI app can be reduced and discussed some of these ideas with @bassosimone.

tl;dr I propose we do the following incremental simplifications to probe-cli (not necessarily in this order and read more for details):

How I see it: one of the major pain points related to the probe-cli functionality is the interaction with the database and the need to keep track of state during a test run. (see: https://github.com/ooni/probe-cli/blob/master/cmd/ooniprobe/internal/nettests/nettests.go#L125).

Moreover the current database model is quite rigid in that it makes the assumption that an input is always going to be a URL, which has impacted our ability to adapt the run logic to other non-web_connectivity tests that also require an input.

I think that a lot of this stems from the fact that the database model was prematurely optimised by taking into account only the requirements of the websites test type, but we didn't consider how it would evolve over time.

We currently have 4 database tables used to store measurement result metadata: results, measurements, urls and networks: https://github.com/ooni/probe-cli/blob/master/cmd/ooniprobe/internal/database/migrations/1_create_msmt_results.sql.

The relationship between the tables is as follows:

results <-1 to many-> measurements
results <-1 to 1-> networks
measurements <-1 to 1-> urls

These dependencies between tables, creates some unnecessary complexity in that we are required to create and maintain references to some of them across the whole testing lifecycle. see: https://github.com/ooni/probe-cli/blob/master/cmd/ooniprobe/internal/nettests/nettests.go#L92 & https://github.com/ooni/probe-cli/blob/master/cmd/ooniprobe/internal/nettests/nettests.go#L190.

It also means that we need to do more complex cross table joins when retrieving the results.

My proposal is to simplify this by dropping the networks and urls tables, which when we need to list measurements means we have to do cross table joins.

The tradeoff is that of increasing the overall size of the DB, but it's going to simplify the access patterns to it, so it's probably going to be a win.

Another more drastic thing we could eventually do, is that of dropping the results table as well, and consolidate everything inside of a single measurements table. We can then just generate a result_id (which maybe can actually be generated by the engine itself) and then we can do queries by means of GROUP BY result_id to access measurements on a per result basis.

This would also get us closer to a model where we can easily swap out our relational database in favour of a more simple key value store.

Another thing that we could do to further simplify the database interactions is that we could get rid of the is_done state for a measurement. Currently we create a measurement entry in the database when the experiment starts and update the state of it 3 times: when it's uploaded, when it's finished and when the test keys are generated. This significantly complicates the state machine, for something which we don't really care so much about (why would a user care to know of an interrupted experiment?). In doing so we could probably delegate much more to the engine and limit the database interaction to a single write at the end of the test workflow.

hellais commented 2 years ago

With @aanorbel we also looked into how this can simplify the logic in the mobile apps and a similar approach can be taken there too.

One of the requirements is that the engine should provide, when the measurement event is emitted all the necessary information to create a Measurement DB entry. This include the input and the related metadata.

In doing so, we might also want to delegate all the input retrieval logic to the engine.

The place in the mobile app where this happens is here: https://github.com/ooni/probe-android/blob/master/app/src/main/java/org/openobservatory/ooniprobe/test/test/AbstractTest.java#L140