Closed: hellais closed this issue 3 months ago
At the moment, if you run too many workers on a machine that is too fast, you can hit issues caused by performing too many inserts per second, even with the current approach of batching inserts inside the custom ClickhouseConnection we use in ooni/pipeline: https://github.com/ooni/data/blob/main/oonipipeline/src/oonipipeline/db/connections.py#L34
We should consider switching to one of ClickHouse's native mechanisms: either the Buffer table engine or async inserts.
For daily processing this is not much of a concern, but it is more of an issue for backfilling.
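To make the two options concrete, here is a minimal sketch of what each could look like. It only builds the SQL and settings and does not talk to a server. The table name `obs_web`, the `_buffer` suffix, and the helper names are hypothetical. The Buffer thresholds are example values that would need tuning. This is not what the pipeline currently does.

```python
# Sketch only: helper names, table names and thresholds are illustrative assumptions.

# Option 1: async inserts. These are ClickHouse settings; with them enabled the
# server collects many small inserts and flushes them together.
ASYNC_INSERT_SETTINGS = {
    "async_insert": 1,
    # 1 = the insert returns only after the buffered data has been flushed.
    "wait_for_async_insert": 1,
}


def async_insert_prefix(table: str) -> str:
    """Build the head of an INSERT statement with async inserts enabled.

    In ClickHouse the SETTINGS clause goes before VALUES / FORMAT.
    """
    settings = ", ".join(f"{k}={v}" for k, v in ASYNC_INSERT_SETTINGS.items())
    return f"INSERT INTO {table} SETTINGS {settings} VALUES"


# Option 2: a Buffer table placed in front of the destination table.
def buffer_table_ddl(
    database: str,
    table: str,
    *,
    num_layers: int = 1,
    min_time: int = 10,
    max_time: int = 100,
    min_rows: int = 10_000,
    max_rows: int = 1_000_000,
    min_bytes: int = 10_000_000,
    max_bytes: int = 100_000_000,
) -> str:
    """Build DDL for a Buffer table that flushes into `database.table`.

    Engine arguments follow the order
    Buffer(database, table, num_layers, min_time, max_time,
           min_rows, max_rows, min_bytes, max_bytes).
    """
    return (
        f"CREATE TABLE IF NOT EXISTS {database}.{table}_buffer "
        f"AS {database}.{table} "
        f"ENGINE = Buffer({database}, {table}, {num_layers}, "
        f"{min_time}, {max_time}, {min_rows}, {max_rows}, "
        f"{min_bytes}, {max_bytes})"
    )


if __name__ == "__main__":
    print(async_insert_prefix("obs_web"))
    print(buffer_table_ddl("default", "obs_web"))
```

With the Buffer option, workers would write to the `_buffer` table instead of the destination table. With async inserts the table layout stays the same and only the insert settings change, which is likely the smaller change for backfilling.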
This was done in: https://github.com/ooni/data/commit/c797c2698300826d5af406546a878aee93671979