ooni / data

OONI Data CLI and Pipeline v5
https://docs.ooni.org/data

Consider switching to async inserts or batch tables #68

Closed hellais closed 2 weeks ago

hellais commented 4 months ago

At the moment, if you run too many workers on a machine that is too fast, you can run into issues caused by performing too many inserts per second, even with the current approach of batching inserts inside the custom ClickhouseConnection we use in the pipeline: https://github.com/ooni/data/blob/main/oonipipeline/src/oonipipeline/db/connections.py#L34.
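
To illustrate why client-side batching alone may not be enough, here is a minimal sketch of a per-worker row buffer. It is not the actual ClickhouseConnection code; `BatchingWriter`, `flush_fn`, and `max_rows` are hypothetical names used only for illustration. Each worker holds its own buffer, so the number of insert statements hitting the server still grows with the number of workers.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Tuple
# Hypothetical callback that performs the real bulk INSERT for one table.
FlushFn = Callable[[str, List[Row]], None]


class BatchingWriter:
    """Illustrative client-side buffer: collects rows per table and
    hands them to flush_fn in bulk once max_rows is reached."""

    def __init__(self, flush_fn: FlushFn, max_rows: int = 10_000) -> None:
        self._flush_fn = flush_fn
        self._max_rows = max_rows
        self._buffers: Dict[str, List[Row]] = {}

    def write_rows(self, table: str, rows: Sequence[Row]) -> None:
        buf = self._buffers.setdefault(table, [])
        buf.extend(rows)
        # Flush only when the threshold is reached, so many small
        # writes become a single insert.
        if len(buf) >= self._max_rows:
            self.flush(table)

    def flush(self, table: str) -> None:
        rows = self._buffers.pop(table, [])
        if rows:
            self._flush_fn(table, rows)

    def flush_all(self) -> None:
        # Call on shutdown so no buffered rows are lost.
        for table in list(self._buffers):
            self.flush(table)
```

With N workers there are N such buffers flushing independently, which is where server-side buffering can help.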

We should consider switching to one of ClickHouse's native mechanisms: either the Buffer table engine or async inserts.
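
As a rough sketch of what the two options mentioned above could look like, the helpers below only build the SQL text; they do not talk to a server. The helper names, the `default.events` table, and the threshold values are assumptions for illustration (the thresholds mirror the example in the ClickHouse Buffer engine documentation), not what the pipeline actually uses.

```python
from typing import Sequence


def buffer_table_ddl(
    database: str,
    target_table: str,
    buffer_table: str,
    num_layers: int = 16,
    min_time: int = 10,
    max_time: int = 100,
    min_rows: int = 10_000,
    max_rows: int = 1_000_000,
    min_bytes: int = 10_000_000,
    max_bytes: int = 100_000_000,
) -> str:
    """Build a CREATE TABLE statement for a Buffer table that sits in
    front of target_table. Writers insert into the buffer table and
    ClickHouse flushes to the target once the thresholds are met."""
    return (
        f"CREATE TABLE {database}.{buffer_table} AS {database}.{target_table} "
        f"ENGINE = Buffer({database}, {target_table}, {num_layers}, "
        f"{min_time}, {max_time}, {min_rows}, {max_rows}, "
        f"{min_bytes}, {max_bytes})"
    )


def async_insert_sql(table: str, columns: Sequence[str], wait: bool = True) -> str:
    """Build an INSERT prefix that asks the server to batch inserts itself.
    wait_for_async_insert=1 makes the client wait until the data is flushed."""
    cols = ", ".join(columns)
    return (
        f"INSERT INTO {table} ({cols}) "
        f"SETTINGS async_insert=1, wait_for_async_insert={int(wait)} VALUES"
    )


print(buffer_table_ddl("default", "events", "events_buffer"))
print(async_insert_sql("events", ["id", "ts"]))
```

The trade-off is roughly this: a Buffer table keeps rows in memory and can lose them if the server stops abnormally, while async inserts with `wait_for_async_insert=1` only acknowledge after the flush, at the cost of higher insert latency.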

For daily processing this is not much of a concern, but it is more of an issue when backfilling.

hellais commented 2 weeks ago

This was done in: https://github.com/ooni/data/commit/c797c2698300826d5af406546a878aee93671979