m-lab / murakami

Run automated internet measurement tests in a Docker container.
Apache License 2.0

Add new runner for ooniprobe-cli #103

Closed · robertodauria closed this pull request 2 years ago

robertodauria commented 2 years ago

This PR adds ooniprobe-cli to the Docker containers and a new runner which executes `ooniprobe run unattended`, collects the results, and wraps them into a Murakami JSON result.



bassosimone commented 2 years ago

(Ah, while I was at it, I noticed that tests are failing, but it's unclear to me whether this is related to the changes in this PR.)

critzo commented 2 years ago

Thank you for writing this pull request! :hugs:

The following list summarises the main blockers that I think we should figure out before proceeding with merging this code:

> 1. the logic for processing results needs to be improved to take into account the `fields.type`, which is the most robust way to perform this kind of parsing (the current code will insert at least one spurious JSON object into the measurement, caused by the final summary; it's not the end of the world, but I'd rather avoid inserting this spurious measurement nonetheless).

Since `unattended` produces several test results, should each be saved separately, @bassosimone? In the context of Murakami, it seems more appropriate to save each one separately, even though they are conducted at the same time. I also agree that it's better not to insert spurious JSON from the final summary into the result file. We want each JSONL result from a Murakami test runner to contain just the measurement and the Murakami metadata.
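
Roughly speaking, I'd expect each exported line to look something like this (field names other than `TestName` are placeholders here, not a schema proposal):

```json
{"TestName": "ooniprobe-im", "TestStartTime": "2021-11-11T17:33:24Z", "TestResult": {"id": "...", "test_keys": "..."}}
```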

> 2. I'm not sure whether accepting the onboarding without giving the user a chance to review the onboarding message is ethical, and I'd like to discuss this further with my team. In the meanwhile, it may be useful for me to know whether Murakami comes with predefined defaults and how a user is supposed to change such defaults. (Sorry if this is a noob question, please feel free to point me to the reference manual directly.)

One way to think about this choice is that, by running a Murakami instance and enabling the OONI tests, a user is consenting. They do that by setting the option in an environment variable or a configuration file. The onboarding is interactive, so avoiding it for automated Murakami tests is preferable.

> 3. I think it would be beneficial for me to read more about how Murakami automatically updates to have some mental model of when/how the ooniprobe shipped with Murakami will be updated. (This is not a blocker in itself for merging, but I think it would be useful for me, and my team more in general, to know what to expect from Murakami installations in terms of automatically updating.)

Currently I think we tag to a specific release. Making that `latest` would ensure new installations of Murakami include the most recent ooniprobe version. To update existing Murakami installs, we've logged an open issue; the fix basically involves building a new image and re-deploying it to any existing installs. I think we plan to address this in the documentation.

Thanks again! :slightly_smiling_face:

robertodauria commented 2 years ago

Thanks for reviewing!

> 1. the logic for processing results needs to be improved to take into account the `fields.type`, which is the most robust way to perform this kind of parsing (the current code will insert at least one spurious JSON object into the measurement, caused by the final summary; it's not the end of the world, but I'd rather avoid inserting this spurious measurement nonetheless).

That has been changed. It now checks for both `fields.type` and the existence of `id`, since the next step couldn't work otherwise.
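
For illustration, the filtering is along these lines (a simplified sketch; the exact `fields.type` value matched here is illustrative, not the runner's literal string):

```python
import json

def extract_measurements(lines):
    """Keep only measurement entries from ooniprobe's JSONL output."""
    measurements = []
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore any non-JSON output lines
        fields = entry.get("fields") or {}
        # The final run summary has a different fields.type and no "id",
        # so the two checks together exclude it.
        if fields.get("type") == "measurement_item" and "id" in fields:
            measurements.append(fields)
    return measurements
```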

> 2. I'm not sure whether accepting the onboarding without giving the user a chance to review the onboarding message is ethical, and I'd like to discuss this further with my team. In the meanwhile, it may be useful for me to know whether Murakami comes with predefined defaults and how a user is supposed to change such defaults. (Sorry if this is a noob question, please feel free to point me to the reference manual directly.)

Two changes have been made to address this:

1. By default, all runners are now disabled unless explicitly configured by the user.
2. The example config file, `murakami.toml.example`, now contains the same wording used by ooniprobe's onboarding to warn the user, plus a link to the documentation on OONI's website. The user needs to explicitly enable the runner in their configuration file (see the sketch below).
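
A hypothetical excerpt, in the spirit of `murakami.toml.example` (actual section and key names may differ):

```toml
# OONI Probe measurements may be visible to your network operator and
# could carry legal risk in some countries. Read
# https://ooni.org/about/risks/ before enabling this runner.
[tests.ooniprobe]
enabled = false  # disabled by default; set to true to opt in
```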
> 3. I think it would be beneficial for me to read more about how Murakami automatically updates to have some mental model of when/how the ooniprobe shipped with Murakami will be updated. (This is not a blocker in itself for merging, but I think it would be useful for me, and my team more in general, to know what to expect from Murakami installations in terms of automatically updating.)

Murakami doesn't auto-update in any way. Users will usually run it via a Docker container, and the only tag available at the moment is `latest`, so simply restarting Murakami gives them the most recent version (provided the local image is re-pulled, since Docker won't refresh a cached `latest` image on its own). We plan on adding tagged releases and Docker images in the future.
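
Concretely, refreshing an install would look something like this (the image name is illustrative, and config/volume flags are omitted):

```sh
docker pull measurementlab/murakami:latest   # force-refresh the cached image
docker stop murakami && docker rm murakami   # replace the running container
docker run -d --name murakami measurementlab/murakami:latest
```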

Also, Murakami now generates four files (one per test run via `ooniprobe run unattended`), which are exported separately, e.g.:

```
2021-11-11 18:44:16,563 local.py:38 INFO Copying data to /tmp/ooniprobe-0-2021-11-11T17:33:24.037067.jsonl
2021-11-11 18:44:16,563 local.py:38 INFO Copying data to /tmp/ooniprobe-1-2021-11-11T17:33:24.037067.jsonl
2021-11-11 18:44:16,563 local.py:38 INFO Copying data to /tmp/ooniprobe-2-2021-11-11T17:33:24.037067.jsonl
2021-11-11 18:44:16,563 local.py:38 INFO Copying data to /tmp/ooniprobe-3-2021-11-11T17:33:24.037067.jsonl
```

The `TestName` in each of these files is `ooniprobe-$TESTNAME` (`ooniprobe-im`, etc.).
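
For reference, a simplified sketch of the per-test export (file naming mirrors the log lines above; the function name is illustrative, not the runner's actual API):

```python
import json
from datetime import datetime, timezone

def export_per_test(test_groups, outdir="/tmp"):
    # One JSONL file per ooniprobe test group, indexed and timestamped.
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")
    for i, measurements in enumerate(test_groups):
        path = f"{outdir}/ooniprobe-{i}-{stamp}.jsonl"
        with open(path, "w") as f:
            for m in measurements:
                f.write(json.dumps(m) + "\n")
```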

bassosimone commented 2 years ago

@critzo wrote:

> Since `unattended` produces several test results, should each be saved separately, @bassosimone? In the context of Murakami, it seems more appropriate to save each one separately, even though they are conducted at the same time. I also agree that it's better not to insert spurious JSON from the final summary into the result file. We want each JSONL result from a Murakami test runner to contain just the measurement and the Murakami metadata.

I agree!

> One way to think about this choice is that, by running a Murakami instance and enabling the OONI tests, a user is consenting. They do that by setting the option in an environment variable or a configuration file. The onboarding is interactive, so avoiding it for automated Murakami tests is preferable.

Yes; I think having the test disabled, with wording explaining the risk, like @robertodauria did, is :fire:!

> Currently I think we tag to a specific release. Making that `latest` would ensure new installations of Murakami include the most recent ooniprobe version. To update existing Murakami installs, we've logged an open issue; the fix basically involves building a new image and re-deploying it to any existing installs. I think we plan to address this in the documentation.

Thanks for explaining!