ooni / data

OONI Data CLI and Pipeline v5
https://docs.ooni.org/data

Optimize performance of table writers and refactor table model #74

Closed hellais closed 2 months ago

hellais commented 4 months ago

This is an important refactor of the table models.

It moves ProbeMeta and MeasurementMeta into nested, composed classes. This is easier to follow than the previous complicated class inheritance and, more importantly, it significantly improves performance because we no longer have to copy each MeasurementMeta in order to pass it around.
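To illustrate why composition avoids the copies, here is a minimal sketch using plain dataclasses. The field names and the `WebObservation` class are illustrative assumptions, not the actual models in this PR; the point is only that every row holds a reference to the same metadata objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeMeta:
    # Illustrative fields only; the real model has more.
    probe_asn: int
    probe_cc: str


@dataclass(frozen=True)
class MeasurementMeta:
    # Illustrative fields only; the real model has more.
    measurement_uid: str
    report_id: str


@dataclass
class WebObservation:
    # Hypothetical row model: metadata is held by reference (composition)
    # instead of being inherited and copied field by field into every row.
    probe_meta: ProbeMeta
    measurement_meta: MeasurementMeta
    hostname: str


probe_meta = ProbeMeta(probe_asn=64496, probe_cc="IT")
measurement_meta = MeasurementMeta(measurement_uid="uid-0001", report_id="report-0001")

observations = [
    WebObservation(probe_meta, measurement_meta, hostname)
    for hostname in ("example.org", "example.com")
]

# Both rows point at the very same metadata object: nothing was copied.
shared = observations[0].measurement_meta is observations[1].measurement_meta
```

Freezing the metadata classes is a design choice of this sketch: it makes sharing one instance across many rows safe, since no row can mutate it for the others.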

I also introduced better patterns for handling the TableModels. You decorate any class that should end up in the database with the table_model decorator, and wherever it is used, type safety is enforced by the TableModelProtocol.
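The decorator plus protocol pattern could look roughly like the following. This is a sketch under assumptions: the names `table_model` and `TableModelProtocol` come from the description above, but the decorator arguments, the `__table_name__` and `__table_index__` attributes, and the `describe` helper are hypothetical and may not match the real signatures.

```python
from dataclasses import dataclass
from typing import Protocol, Tuple, Type, TypeVar, runtime_checkable


@runtime_checkable
class TableModelProtocol(Protocol):
    # Attributes a database-backed model is assumed to expose (hypothetical).
    __table_name__: str
    __table_index__: Tuple[str, ...]


T = TypeVar("T")


def table_model(table_name: str, table_index: Tuple[str, ...]):
    """Class decorator that attaches table metadata to a model class."""

    def decorator(cls: Type[T]) -> Type[T]:
        setattr(cls, "__table_name__", table_name)
        setattr(cls, "__table_index__", table_index)
        return cls

    return decorator


@table_model(table_name="obs_example", table_index=("measurement_uid",))
@dataclass
class ExampleObservation:
    measurement_uid: str
    hostname: str


def describe(row: TableModelProtocol) -> str:
    # Runtime check: only objects carrying the table attributes are accepted.
    if not isinstance(row, TableModelProtocol):
        raise TypeError(f"{type(row).__name__} is not a table model")
    return f"{row.__table_name__} indexed by {', '.join(row.__table_index__)}"


row = ExampleObservation(measurement_uid="uid-0001", hostname="example.org")
description = describe(row)
```

Here the protocol is checked at runtime with `isinstance`; annotating function parameters with the protocol additionally lets a static type checker flag misuse where it can see the attributes.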

Thanks to this refactoring it's also possible to improve how we handle the buffering and serialization of writes, and to generate the CREATE TABLE queries from Python type hints.
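As a rough idea of how a CREATE TABLE query can be derived from type hints, consider the sketch below. It assumes a ClickHouse-style dialect; the type mapping, the `create_table_query` helper, and the example model are all hypothetical and are not the code in `create_tables.py`.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple, Union, get_args, get_origin, get_type_hints

# Hypothetical mapping from Python types to ClickHouse column types.
TYPE_MAP = {
    str: "String",
    int: "Int32",
    float: "Float64",
    bool: "Bool",
    datetime: "DateTime",
}


def column_type(hint) -> str:
    # Optional[X] is Union[X, None]; map it to Nullable(X).
    if get_origin(hint) is Union:
        inner = [arg for arg in get_args(hint) if arg is not type(None)]
        if len(inner) == 1:
            return f"Nullable({column_type(inner[0])})"
    return TYPE_MAP[hint]


def create_table_query(model, table_name: str, order_by: Tuple[str, ...]) -> str:
    # One column per annotated field, in declaration order.
    columns = ",\n".join(
        f"    {name} {column_type(hint)}"
        for name, hint in get_type_hints(model).items()
    )
    return (
        f"CREATE TABLE IF NOT EXISTS {table_name} (\n{columns}\n)\n"
        f"ENGINE = MergeTree\n"
        f"ORDER BY ({', '.join(order_by)})"
    )


@dataclass
class ExampleObservation:
    measurement_uid: str
    probe_asn: int
    dns_answer: Optional[str]


query = create_table_query(ExampleObservation, "obs_example", ("measurement_uid",))
```

Unsupported hints raise a `KeyError` in this sketch, which is a deliberate simplification: failing loudly is preferable to silently creating a column of the wrong type.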

Some of these features require a fairly recent version of Python (i.e. >=3.10); however, we have already decided that backward compatibility is not a priority for the pipeline.

We might, however, need some kind of compatibility layer if some of these functions need to be used by oonidata (though we might also drop support for older Python versions there at some point if it gets too complex to manage).

There are still several parts that need to be refactored. They are marked as TODO(art), and I suggest doing that later.

This also adds support for buffer tables, which give a significant performance boost in a parallelized workflow and avoid the issue outlined here: https://github.com/ooni/data/issues/68
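For readers unfamiliar with buffer tables, the sketch below builds the DDL for one, assuming the database is ClickHouse and that "buffer tables" refers to its Buffer table engine. The `buffer_` prefix, the helper name, and the threshold defaults are illustrative assumptions, not values taken from this PR.

```python
def create_buffer_table_query(
    target_table: str,
    database: str = "default",
    num_layers: int = 16,
    min_time: int = 10,
    max_time: int = 100,
    min_rows: int = 10_000,
    max_rows: int = 1_000_000,
    min_bytes: int = 10_000_000,
    max_bytes: int = 100_000_000,
) -> str:
    """Build DDL for a Buffer table that sits in front of target_table.

    Writers insert into the buffer table; the server flushes the buffered
    rows to the target table once the time, row, or byte thresholds are hit.
    """
    return (
        f"CREATE TABLE IF NOT EXISTS {database}.buffer_{target_table} "
        f"AS {database}.{target_table} "
        f"ENGINE = Buffer({database}, {target_table}, {num_layers}, "
        f"{min_time}, {max_time}, {min_rows}, {max_rows}, "
        f"{min_bytes}, {max_bytes})"
    )


query = create_buffer_table_query("obs_web")
```

With many parallel writers each inserting small batches, routing the inserts through such a table lets the server batch them up before they reach the underlying table.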

Moreover, we came up with a better pattern for waiting until the table buffers have been flushed before starting the dependent workflow. This can be implemented using Temporal primitives.
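The ordering constraint can be sketched as follows. Note that this uses plain asyncio rather than Temporal's API, purely to show the "wait for flush, then start the dependent step" shape; the `wait_for_flush` helper and the `pending_rows` callback are hypothetical stand-ins for whatever check the pipeline actually performs.

```python
import asyncio
from typing import Awaitable, Callable, List


async def wait_for_flush(
    pending_rows: Callable[[], Awaitable[int]],
    poll_interval: float = 0.01,
    timeout: float = 1.0,
) -> None:
    """Poll until no rows remain buffered; raise asyncio.TimeoutError otherwise."""

    async def poll() -> None:
        while await pending_rows() > 0:
            await asyncio.sleep(poll_interval)

    await asyncio.wait_for(poll(), timeout=timeout)


async def run_pipeline() -> List[str]:
    events: List[str] = []
    remaining = [3]  # simulated number of rows still sitting in the buffer

    async def pending_rows() -> int:
        # Stand-in for a real database query; drains one row per call.
        current = remaining[0]
        remaining[0] = max(0, current - 1)
        return current

    events.append("observations written")
    await wait_for_flush(pending_rows)
    events.append("buffers flushed")
    # Only now is it safe to start the step that reads the target table.
    events.append("analysis started")
    return events


events = asyncio.run(run_pipeline())
```

The timeout matters in practice: without it, a buffer that never drains would block the dependent workflow indefinitely instead of surfacing an error.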

We also enrich the columns with new processing time metadata for performance monitoring.

codecov[bot] commented 4 months ago

Codecov Report

Attention: Patch coverage is 94.72296% with 20 lines in your changes missing coverage. Please review.

Project coverage is 84.06%. Comparing base (74a9b76) to head (730b8f5).

| Files | Patch % | Lines |
|---|---|---|
| oonidata/src/oonidata/models/base.py | 80.76% | 5 Missing :warning: |
| oonipipeline/src/oonipipeline/db/create_tables.py | 93.93% | 4 Missing :warning: |
| ...pipeline/src/oonipipeline/analysis/web_analysis.py | 77.77% | 2 Missing :warning: |
| oonipipeline/src/oonipipeline/cli/commands.py | 85.71% | 2 Missing :warning: |
| ...onipipeline/analysis/website_experiment_results.py | 66.66% | 1 Missing :warning: |
| oonipipeline/src/oonipipeline/db/connections.py | 98.41% | 1 Missing :warning: |
| ...c/oonipipeline/temporal/activities/observations.py | 85.71% | 1 Missing :warning: |
| ...oonipipeline/transforms/measurement_transformer.py | 93.75% | 1 Missing :warning: |
| ...nipipeline/transforms/nettests/web_connectivity.py | 92.30% | 1 Missing :warning: |
| ...peline/src/oonipipeline/transforms/observations.py | 85.71% | 1 Missing :warning: |

... and 1 more
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main      #74      +/-   ##
==========================================
+ Coverage   83.62%   84.06%   +0.43%
==========================================
  Files          74       74
  Lines        5893     6086     +193
==========================================
+ Hits         4928     5116     +188
- Misses        965      970       +5
```

| [Flag](https://app.codecov.io/gh/ooni/data/pull/74/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ooni) | Coverage Δ | |
|---|---|---|
| [oonidata](https://app.codecov.io/gh/ooni/data/pull/74/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ooni) | `77.41% <90.00%> (-0.03%)` | :arrow_down: |
| [oonipipeline](https://app.codecov.io/gh/ooni/data/pull/74/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ooni) | `87.10% <95.44%> (+0.55%)` | :arrow_up: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ooni#carryforward-flags-in-the-pull-request-comment) to find out more.
