thelastpickle / cassandra-reaper

Automated Repair Awesomeness for Apache Cassandra
http://cassandra-reaper.io/
Apache License 2.0

Add a feature to repair the cluster sequentially, table by table #295

Open cazorla19 opened 6 years ago

cazorla19 commented 6 years ago


There is an issue with the typical repair procedure. By default, Reaper runs a keyspace-wide operation that repairs all tables in the keyspace simultaneously. On load-heavy clusters this sometimes produces an anomaly where all nodes (even with SEQUENTIAL repair parallelism) become fully loaded and, as a result, take too long to respond.

Of course, the Reaper API and UI let you pick a specific table, but then you have to set up as many scheduled repairs as there are tables in the keyspace, which can mean dozens of schedules.

If it's technically possible, this could be implemented as an option in the configuration file, such as tableParallelism.
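
Until such an option exists, one scripted workaround is to create one schedule per table through Reaper's REST API. Below is a minimal sketch, assuming a `POST /repair_schedule` endpoint and the parameter names shown (these are based on Reaper's REST API docs and may differ between versions, so check your version's documentation); the cluster, keyspace, and table names are placeholders:

```python
# Workaround sketch: create one Reaper repair schedule per table instead of a
# single keyspace-wide schedule. Endpoint and parameter names are assumptions
# that may vary across Reaper versions.
import requests

REAPER_URL = "http://localhost:8080"   # placeholder Reaper address
CLUSTER = "my-cluster"                 # placeholder cluster name
KEYSPACE = "my_keyspace"               # placeholder keyspace
TABLES = ["users", "events", "sessions"]  # one schedule per table

for table in TABLES:
    params = {
        "clusterName": CLUSTER,
        "keyspace": KEYSPACE,
        "tables": table,                        # restrict this schedule to a single table
        "owner": "ops-team",
        "scheduleDaysBetween": 7,               # weekly schedule
        "scheduleTriggerTime": "2024-01-01T02:00:00",
        "repairParallelism": "SEQUENTIAL",
    }
    resp = requests.post(f"{REAPER_URL}/repair_schedule", params=params)
    resp.raise_for_status()
    print(f"Created weekly repair schedule for {KEYSPACE}.{table}")
```

This only automates the "dozens of scheduled repairs" step; it does not make Reaper itself repair tables sequentially within one run.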


adejanovski commented 2 years ago

This is a non-trivial change to the scheduler, and I'm not sure it's worth making it more complex, especially since the incremental repair improvements in Cassandra 4.0 should make the repair load much lighter for each run.

vivek67 commented 2 years ago

@adejanovski: We would like to have that feature as well ("scheduling repairs at the table level") and are willing to collaborate. As @cazorla19 mentioned earlier, we share the pain point of managing repair schedules on heavy clusters. PS: most of our heavy deployments run Cassandra 3.11.

adejanovski commented 2 years ago

@vivek67, I'm still very reluctant to implement this, which I'll try to elaborate on:

Repair sessions are done in two phases: validation (computing the Merkle trees to detect differences) and sync (streaming data between replicas). Internally, Cassandra already sequentializes validation tasks per table: by default, only one table is validation-compacted at a time (unless you raise the number of job threads, which can go up to 4). This is the heavy part of repair. Once all validation tasks are done, the sync tasks start, and they run concurrently. If the nodes fall behind at this stage, you'll see more pending compactions, which is the metric Reaper uses to apply backpressure (it stops scheduling segments until all replicas are back under the threshold).
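
For illustration, that backpressure rule boils down to something like the sketch below. This is not Reaper's actual implementation; `get_pending_compactions` is a hypothetical helper (in practice the value would come from Cassandra's `org.apache.cassandra.metrics:type=Compaction,name=PendingTasks` metric over JMX), and the threshold value is illustrative since Reaper's limit is configurable.

```python
# Illustrative sketch of the backpressure behaviour described above;
# NOT Reaper's actual code.
PENDING_COMPACTIONS_THRESHOLD = 20  # illustrative value; the real limit is configurable


def replicas_ready(replicas, get_pending_compactions):
    """Return True only if every replica is under the pending-compactions threshold."""
    return all(get_pending_compactions(node) < PENDING_COMPACTIONS_THRESHOLD
               for node in replicas)


def maybe_start_segment(segment, replicas, get_pending_compactions, start_repair):
    """Start repairing the segment only when no replica is overloaded.

    Otherwise skip it for now; retrying later is the backpressure effect:
    segments stop flowing until all replicas catch up on compactions.
    """
    if replicas_ready(replicas, get_pending_compactions):
        start_repair(segment)
        return True
    return False
```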

If the repairs are too heavy on your cluster, my recommendation would be to:

Fully sequentializing tables in Reaper itself would, IMO, generate a lot of overhead, on top of being a big change in the code.

What do you think about my suggested tunings?