ooni / backend

Everything related to OONI backend infrastructure: ooni/api, ooni/pipeline, ooni/sysadmin, collector, bouncers and test-helpers
BSD 3-Clause "New" or "Revised" License

Suggestions to improve the usability of ooni-api by machines, and humans #125

Open: hellais opened this issue 7 years ago

hellais commented 7 years ago

@TylerJFisher commented on Tue Dec 01 2015

Hello,

I have been working on a fork of ooni-api on and off, and have noticed a few usability issues with ooni-api, specifically regarding which data is exposed to end users via the API.

As it stands, there doesn't seem to be a clear or concise way to access data collected by oonib without making (aggressive) assumptions about how ooni-probe reports are structured, particularly when handling different report types.

A few of the issues that I have noticed are:

To address this, I would like to propose the following amendments to the design of the PostgreSQL DB schema behind ooni-api, to make it easier for third parties (e.g. journalists, artists, and developers interested in measuring and analyzing censorship) to work with the metrics we've collected:

By performing the above steps and normalizing reporting anomalies (e.g. ooni-probe bridgeT reports not being named bridge_reachability), it would be feasible to build a star/snowflake schema suitable for ad-hoc analytics on ooni-probe test results.
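To make the star/snowflake idea a little more concrete, here is a minimal sketch in Python that uses the standard-library sqlite3 module as a stand-in for PostgreSQL. The table names (dim_probe, dim_test, fact_measurement), the choice of dimensions, and the alias table are all hypothetical; only the bridgeT to bridge_reachability normalization comes from the text above.

```python
import sqlite3

# Hypothetical alias table: only "bridgeT" -> "bridge_reachability" comes from
# the issue; other naming anomalies would be added here as they are found.
TEST_NAME_ALIASES = {"bridget": "bridge_reachability"}

# Hypothetical star schema: one fact table, two small dimension tables.
SCHEMA = """
CREATE TABLE dim_probe (
    probe_id  INTEGER PRIMARY KEY,
    probe_asn TEXT NOT NULL,
    probe_cc  TEXT NOT NULL,
    UNIQUE (probe_asn, probe_cc)
);
CREATE TABLE dim_test (
    test_id      INTEGER PRIMARY KEY,
    test_name    TEXT NOT NULL,
    test_version TEXT NOT NULL,
    UNIQUE (test_name, test_version)
);
CREATE TABLE fact_measurement (
    measurement_id  INTEGER PRIMARY KEY,
    report_id       TEXT NOT NULL,
    probe_id        INTEGER NOT NULL REFERENCES dim_probe (probe_id),
    test_id         INTEGER NOT NULL REFERENCES dim_test (test_id),
    test_start_time REAL,
    test_runtime    REAL,
    success         INTEGER
);
"""


def normalize_test_name(name):
    """Map known aliases (case-insensitively) onto one canonical test name."""
    return TEST_NAME_ALIASES.get(name.lower(), name)


def dimension_key(conn, table, key_col, **attrs):
    """Return the surrogate key of a dimension row, inserting the row if new.

    Table and column names are trusted constants from this module, never user
    input, so interpolating them into the SQL string is acceptable here.
    """
    cols = ", ".join(attrs)
    marks = ", ".join("?" for _ in attrs)
    where = " AND ".join(f"{col} = ?" for col in attrs)
    params = tuple(attrs.values())
    conn.execute(f"INSERT OR IGNORE INTO {table} ({cols}) VALUES ({marks})", params)
    row = conn.execute(f"SELECT {key_col} FROM {table} WHERE {where}", params).fetchone()
    return row[0]


def load_entry(conn, entry):
    """Insert one parsed report entry into the fact table and its dimensions."""
    probe_id = dimension_key(conn, "dim_probe", "probe_id",
                             probe_asn=entry["probe_asn"], probe_cc=entry["probe_cc"])
    test_id = dimension_key(conn, "dim_test", "test_id",
                            test_name=normalize_test_name(entry["test_name"]),
                            test_version=entry["test_version"])
    conn.execute(
        "INSERT INTO fact_measurement (report_id, probe_id, test_id,"
        " test_start_time, test_runtime, success) VALUES (?, ?, ?, ?, ?, ?)",
        (entry["report_id"], probe_id, test_id, entry.get("test_start_time"),
         entry.get("test_runtime"), entry.get("success")),
    )


# Demo: the first entry mirrors the example report in this issue; the second
# is a made-up entry that uses the legacy "bridgeT" name.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
common = {"probe_asn": "AS29182", "probe_cc": "BE", "test_version": "0.1.2"}
load_entry(conn, {**common, "test_name": "bridge_reachability", "success": False,
                  "report_id": "2015-01-01kfmgdruqhdhmwjyvkgmpeukwyderxomywrinxydg",
                  "test_start_time": 1420094612.0, "test_runtime": 305.4590311050415})
load_entry(conn, {**common, "test_name": "bridgeT", "success": True,
                  "report_id": "hypothetical-report-id"})

# An ad-hoc analytic query becomes joins plus GROUP BY over the fact table.
rows = conn.execute(
    "SELECT t.test_name, p.probe_cc, COUNT(*), SUM(f.success)"
    " FROM fact_measurement AS f"
    " JOIN dim_test AS t USING (test_id)"
    " JOIN dim_probe AS p USING (probe_id)"
    " GROUP BY t.test_name, p.probe_cc"
).fetchall()
```

Because both entries normalize to the same test name and version, they share a single dim_test row, so the query groups them together. A snowflake variant would split the dimensions further, for example moving probe_cc into its own country table.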


Data currently exposed by ooni-api

ooni-api currently exposes the following properties for all reports (https://raw.githubusercontent.com/TheTorProject/ooni-spec/master/data-formats/df-000-base.md), along with a handful of optional fields that may or may not be documented in ooni-spec (e.g. all of the optional fields in bridge_reachability ooni-probe reports).
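One way to find out which optional fields actually occur, so they can be documented, is to diff each entry against a set of common fields and tally the leftovers per test. The sketch below assumes, purely for illustration, that the common set is the keys of the header record in the example report further down; the linked df-000-base.md is the authoritative list. The function names are hypothetical.

```python
from collections import Counter

# Assumption: keys of the example report's header record stand in for the
# common field set; ooni-spec's df-000-base.md is the authoritative list.
HEADER_FIELDS = frozenset({
    "backend_version", "input_hashes", "options", "probe_asn", "probe_cc",
    "probe_city", "probe_ip", "record_type", "report_filename", "report_id",
    "software_name", "software_version", "start_time", "test_name",
    "test_version",
})


def split_fields(entry):
    """Split one entry into (common fields, test-specific extra fields)."""
    common = {k: v for k, v in entry.items() if k in HEADER_FIELDS}
    extra = {k: v for k, v in entry.items() if k not in HEADER_FIELDS}
    return common, extra


def optional_field_inventory(entries):
    """Count how often each extra field appears, grouped by test_name."""
    inventory = {}
    for entry in entries:
        _, extra = split_fields(entry)
        inventory.setdefault(entry.get("test_name"), Counter()).update(extra.keys())
    return inventory


# Demo with abbreviated entries; field names come from the example report.
entries = [
    {"test_name": "bridge_reachability", "probe_cc": "BE", "success": False,
     "tor_version": "0.2.5.8-rc", "transport_name": "scramblesuit"},
    {"test_name": "bridge_reachability", "probe_cc": "BE", "success": True},
]
common, extra = split_fields(entries[0])
inventory = optional_field_inventory(entries)
```

Running this over a full set of reports would give, per test, a frequency table of the fields that the base spec does not cover.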


Where would these enhancements fit in?

Since PostgreSQL provides out-of-the-box support for aggregate queries over JSON fields, I think storing each report as both YAML and JSON would be beneficial. The additional storage space would be negligible.
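As a rough illustration of the kind of aggregate a JSON copy enables, the sketch below stores each entry as JSON text and groups by one field. In PostgreSQL this would be a GROUP BY over an expression like `doc->>'probe_cc'` on a json/jsonb column; here it is done in pure Python as a stand-in, and the column and function names are hypothetical. The IT row in the demo is made up.

```python
import json
from collections import defaultdict


def to_json_column(entry):
    """Serialize a parsed report entry as the text of a hypothetical JSON column."""
    return json.dumps(entry, sort_keys=True)


def success_rate_by(json_rows, field):
    """Group JSON documents by `field` and return the share with success == true.

    Pure-Python stand-in for a GROUP BY over doc->>'<field>' in PostgreSQL.
    """
    tallies = defaultdict(lambda: [0, 0])  # field value -> [successes, total]
    for text in json_rows:
        doc = json.loads(text)
        tally = tallies[doc.get(field)]
        tally[1] += 1
        if doc.get("success") is True:
            tally[0] += 1
    return {key: ok / total for key, (ok, total) in tallies.items()}


rows = [
    to_json_column({"probe_cc": "BE", "test_name": "bridge_reachability", "success": False}),
    to_json_column({"probe_cc": "BE", "test_name": "bridge_reachability", "success": True}),
    to_json_column({"probe_cc": "IT", "test_name": "bridge_reachability", "success": True}),
]
rates = success_rate_by(rows, "probe_cc")
```

Keeping the YAML alongside preserves the original report byte for byte, while the JSON copy is what the database can index and aggregate.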


Example YAML report organized into header/entry/footer sections


---
backend_version: 1.1.2
input_hashes: [1d4a3801e94158c0cf0f25605fced50a2f2ab9d538fc9f72c06cfbfa976d1f67]
options: [-f, bridges.txt, -t, '400']
probe_asn: AS29182
probe_cc: BE
probe_city: null
probe_ip: 127.0.0.1
record_type: header
report_filename: 20150101T060023Z-AS29182-bridge_reachability-v1-probe.yaml
report_id: 2015-01-01kfmgdruqhdhmwjyvkgmpeukwyderxomywrinxydg
software_name: ooniprobe
software_version: 1.2.2
start_time: 1420092023.0
test_name: bridge_reachability
test_version: 0.1.2
...

---
backend_version: 1.1.2
bridge_address: null
bridge_hashed_fingerprint: dcdb7afb15187192f4800308db3208055221b86b
distributor: unallocated
error: timeout-reached
input: dcdb7afb15187192f4800308db3208055221b86b
input_hashes: [1d4a3801e94158c0cf0f25605fced50a2f2ab9d538fc9f72c06cfbfa976d1f67]
obfsproxy_log: "2014-12-31 20:43:34,730 [WARNING] Obfsproxy (version: 0.2.12) starting\
  \ up.\n2014-12-31 20:43:34,730 [INFO] Entering client managed-mode.\n2014-12-31\
  \ 20:43:34,731 [ERROR] \n\n################################################\nDo\
  \ NOT rely on ScrambleSuit for strong security!\n################################################\n\
  \n2014-12-31 20:43:34,731 [INFO] Creating directory path `/tmp/tortmpK8lBuE/pt_state/scramblesuit/'.\n\
  2014-12-31 20:43:34,732 [INFO] OBFSSOCKSv5Factory starting on 42239\n2014-12-31\
  \ 20:43:34,732 [INFO] Starting factory <obfsproxy.network.socks.OBFSSOCKSv5Factory\
  \ instance at 0x7fed7e2607e8>\n2014-12-31 20:43:34,732 [INFO] Starting up the event\
  \ loop.\n2014-12-31 20:50:12,674 [INFO] Received SIGTERM, shutting down.\n2014-12-31\
  \ 20:50:12,675 [INFO] (TCP Port 42239 Closed)\n2014-12-31 20:50:12,675 [INFO] Stopping\
  \ factory <obfsproxy.network.socks.OBFSSOCKSv5Factory instance at 0x7fed7e2607e8>\n\
  2014-12-31 20:50:12,675 [INFO] Main loop terminated.\n"
obfsproxy_version: 0.2.12
options: [-f, bridges.txt, -t, '400']
probe_asn: AS29182
probe_cc: BE
probe_city: null
probe_ip: 127.0.0.1
record_type: entry
report_filename: 20150101T060023Z-AS29182-bridge_reachability-v1-probe.yaml
report_id: 2015-01-01kfmgdruqhdhmwjyvkgmpeukwyderxomywrinxydg
software_name: ooniprobe
software_version: 1.2.2
start_time: 1420092023.0
success: false
test_name: bridge_reachability
test_runtime: 305.4590311050415
test_start_time: 1420094612.0
test_version: 0.1.2
timeout: 400
tor_log: 'Dec 31 20:43:32.000 [notice] Tor 0.2.5.8-rc (git-eaa9ca1011e73a9d) opening
  new log file.

  Dec 31 20:43:32.000 [notice] Parsing GEOIP IPv4 file /usr/share/tor/geoip.

  Dec 31 20:43:32.000 [notice] Parsing GEOIP IPv6 file /usr/share/tor/geoip6.

  Dec 31 20:43:32.000 [warn] You are running Tor as root. You don''t need to, and
  you probably shouldn''t.

  Dec 31 20:43:33.000 [notice] Bootstrapped 0%: Starting

  Dec 31 20:43:33.000 [notice] Delaying directory fetches: No running bridges

  Dec 31 20:43:33.000 [notice] New control connection opened from 127.0.0.1.

  Dec 31 20:43:33.000 [notice] Tor 0.2.5.8-rc (git-eaa9ca1011e73a9d) opening log file.

  Dec 31 20:43:36.000 [notice] Bootstrapped 5%: Connecting to directory server

  Dec 31 20:43:36.000 [notice] Bootstrapped 10%: Finishing handshake with directory
  server

  Dec 31 20:48:35.000 [warn] Problem bootstrapping. Stuck at 10%: Finishing handshake
  with directory server. (DONE; DONE; count 1; recommendation warn)

  Dec 31 20:48:35.000 [warn] 1 connections have failed:

  Dec 31 20:48:35.000 [warn]  1 connections died in state handshaking (TLS) with SSL
  state unknown state in HANDSHAKE

  Dec 31 20:50:12.000 [notice] Catching signal TERM, exiting cleanly.

  '
tor_progress: 10
tor_progress_summary: Finishing handshake with directory server
tor_progress_tag: handshake_dir
tor_version: 0.2.5.8-rc
transport: ss
transport_name: scramblesuit
...

---
backend_version: 1.1.2
input_hashes: [1d4a3801e94158c0cf0f25605fced50a2f2ab9d538fc9f72c06cfbfa976d1f67]
options: [-f, bridges.txt, -t, '400']
probe_asn: AS29182
probe_cc: BE
probe_city: null
probe_ip: 127.0.0.1
record_type: footer
report_filename: 20150101T060023Z-AS29182-bridge_reachability-v1-probe.yaml
report_id: 2015-01-01kfmgdruqhdhmwjyvkgmpeukwyderxomywrinxydg
software_name: ooniprobe
software_version: 1.2.2
stage_1_process_time: 1.8597300052642822
start_time: 1420092023.0
test_name: bridge_reachability
test_version: 0.1.2
...
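The example above is a YAML stream of three documents distinguished only by record_type, and many fields are repeated in every record. The sketch below assumes the stream has already been parsed into dicts (for instance with PyYAML's `yaml.safe_load_all`; that step is left out to keep the code stdlib-only), splits it into header, entries, and footer, and lists the fields an entry merely copies from the header. The function names are hypothetical.

```python
def split_report(records):
    """Split parsed report records into (header, entries, footer) by record_type.

    Raises ValueError on an unknown record_type, or if the stream does not have
    exactly one header and at most one footer.
    """
    headers, entries, footers = [], [], []
    for record in records:
        kind = record.get("record_type")
        if kind == "header":
            headers.append(record)
        elif kind == "entry":
            entries.append(record)
        elif kind == "footer":
            footers.append(record)
        else:
            raise ValueError(f"unknown record_type: {kind!r}")
    if len(headers) != 1 or len(footers) > 1:
        raise ValueError("expected exactly one header and at most one footer")
    return headers[0], entries, (footers[0] if footers else None)


def shared_fields(header, entries):
    """Keys (except record_type) whose value is identical in the header and every entry."""
    return {
        key for key, value in header.items()
        if key != "record_type" and all(key in e and e[key] == value for e in entries)
    }


# Demo: abbreviated records taken from the example report above.
REPORT_ID = "2015-01-01kfmgdruqhdhmwjyvkgmpeukwyderxomywrinxydg"
records = [
    {"record_type": "header", "report_id": REPORT_ID, "probe_cc": "BE",
     "test_name": "bridge_reachability"},
    {"record_type": "entry", "report_id": REPORT_ID, "probe_cc": "BE",
     "test_name": "bridge_reachability", "success": False},
    {"record_type": "footer", "report_id": REPORT_ID,
     "stage_1_process_time": 1.8597300052642822},
]
header, entries, footer = split_report(records)
shared = shared_fields(header, entries)
```

Fields that come back from shared_fields are candidates for storing once per report rather than once per entry.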
hellais commented 4 years ago

Maybe there is some useful knowledge in here, @FedericoCeratto, or perhaps we can just close it and move on.

cc @bassosimone @FedericoCeratto