Closed hellais closed 5 months ago
Attention: Patch coverage is 94.72296% with 20 lines in your changes missing coverage. Please review.
Project coverage is 84.06%. Comparing base (74a9b76) to head (730b8f5).
This is an important refactor of the table models.
It moves the ProbeMeta and MeasurementMeta into nested composed classes. This is nicer because you don't get lost in the complicated class inheritance, but most importantly it significantly boosts performance, because we no longer have to make copies of each MeasurementMeta to pass it around.
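To illustrate the composition approach described above, here is a minimal sketch. The field names and the `WebObservation` class are hypothetical and chosen only for illustration; the actual models in this PR may look different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeMeta:
    # Hypothetical fields, for illustration only.
    probe_cc: str
    probe_asn: int


@dataclass(frozen=True)
class MeasurementMeta:
    # Hypothetical fields, for illustration only.
    measurement_uid: str
    test_name: str


@dataclass
class WebObservation:
    # Composition: the observation holds references to shared metadata
    # objects instead of inheriting (and re-copying) their fields.
    probe_meta: ProbeMeta
    measurement_meta: MeasurementMeta
    hostname: str


probe = ProbeMeta(probe_cc="IT", probe_asn=1234)
meas = MeasurementMeta(measurement_uid="uid-0001", test_name="web_connectivity")

observations = [
    WebObservation(probe_meta=probe, measurement_meta=meas, hostname=h)
    for h in ("example.org", "example.com")
]

# Every observation points at the same metadata instance: no per-row copy.
shared = all(o.measurement_meta is meas for o in observations)
print(shared)
```

With inheritance, each observation would carry its own copy of every metadata field; with composition, many observations can share one metadata object.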
I also introduced better patterns for handling the TableModels. Basically, you decorate a table that should end up in the database with the `table_model` decorator, and when it's used, type safety is enforced by the `TableModelProtocol`.
Thanks to this refactoring, it's also possible to improve how we handle the buffering and serialization of writes, as well as the creation of the `CREATE` table queries, by using Python type hints.
Some of these features require recentish versions of Python (i.e. >=3.10); however, we have already decided that backward compatibility is not a priority for the pipeline.
We might however need some kind of compatibility layer if some of these functions need to be used by oonidata (though we might also drop older python support there too at some point if it gets too complex to manage).
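As a rough sketch of the decorator, protocol, and type-hint-driven `CREATE` query generation described above: the decorator signature, the `__table_name__` and `__table_index__` attributes, the `make_create_query` helper, the type mapping, and the ClickHouse-style `MergeTree` engine are all assumptions made for illustration, not necessarily what this PR implements.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Protocol, get_type_hints

# Assumed mapping from Python types to ClickHouse-style column types.
TYPE_MAP = {str: "String", int: "Int64", float: "Float64", datetime: "DateTime"}


class TableModelProtocol(Protocol):
    # Attributes a static type checker expects on every table model.
    __table_name__: str
    __table_index__: tuple[str, ...]


def table_model(table_name: str, table_index: tuple[str, ...]):
    """Hypothetical decorator: turns a class into a dataclass table model."""

    def decorator(cls):
        cls = dataclass(cls)
        cls.__table_name__ = table_name
        cls.__table_index__ = table_index
        return cls

    return decorator


def make_create_query(model: type[TableModelProtocol]) -> str:
    # Derive the column definitions from the class's type hints.
    hints = get_type_hints(model)
    columns = ",\n".join(
        f"    {name} {TYPE_MAP[py_type]}" for name, py_type in hints.items()
    )
    order_by = ", ".join(model.__table_index__)
    return (
        f"CREATE TABLE IF NOT EXISTS {model.__table_name__} (\n{columns}\n)\n"
        f"ENGINE = MergeTree()\nORDER BY ({order_by})"
    )


@table_model(table_name="obs_web", table_index=("measurement_uid", "hostname"))
class WebObservation:
    measurement_uid: str
    hostname: str
    created_at: datetime
    response_time: float


query = make_create_query(WebObservation)
print(query)
```

Keeping the schema in the type hints means the model class is the single source of truth for both the Python object and the table definition.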
There are still several parts which need to be refactored. They are marked as TODO(art), and I suggest doing that later.
This also adds support for buffer tables, which give a significant performance boost in a parallelized workflow by avoiding the issue outlined here: https://github.com/ooni/data/issues/68
Moreover, we came up with a better pattern for waiting until the table buffers are flushed before starting the dependent workflow. This can be implemented using Temporal primitives.
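The wait-for-flush pattern can be sketched as follows. This uses plain `asyncio` as a stand-in and does not call the Temporal SDK; `wait_for_flush`, `FakeBuffer`, and `pending_rows` are hypothetical names, and a real implementation would query the database for the buffer state.

```python
import asyncio
from typing import Awaitable, Callable


async def wait_for_flush(
    pending_rows: Callable[[], Awaitable[int]],
    poll_interval: float = 0.01,
    timeout: float = 1.0,
) -> None:
    """Poll until the buffer reports no pending rows, or raise on timeout."""

    async def _poll() -> None:
        while await pending_rows() > 0:
            await asyncio.sleep(poll_interval)

    await asyncio.wait_for(_poll(), timeout=timeout)


class FakeBuffer:
    """Stand-in for a buffer table that drains a bit on every check."""

    def __init__(self, rows: int) -> None:
        self.rows = rows

    async def pending_rows(self) -> int:
        # Simulate a background flush moving rows to the destination table.
        self.rows = max(0, self.rows - 100)
        return self.rows


async def main() -> list[str]:
    events: list[str] = []
    buf = FakeBuffer(rows=250)
    await wait_for_flush(buf.pending_rows)
    events.append("buffer flushed")
    # Only start the dependent step once the flush has completed.
    events.append("dependent workflow started")
    return events


events = asyncio.run(main())
print(events)
```

The timeout keeps the dependent workflow from waiting forever if a buffer never drains.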
We also enrich columns with the new processing time metadata for performance monitoring.