Closed: saklasing closed this issue 1 year ago
Should have mentioned: Cassandra 3.11.7, Reaper 3.1.1.
This may be moot: I just tried doubling the segments to 512 and raising the timeout to 90 minutes and saw 1300-plus jobs. I then reset segments to 256, leaving the timeout at 90, and it now shows 398 jobs.
Spoke too soon: incremental was false; when set to true, it went back to 20 jobs.
Attempting the same parameter adjustments again, I still have the issue.
Hi @saklasing,
Incremental repair cannot be performed on subranges; Cassandra won't allow it because of anticompaction, which is the big difference from full repairs. To be efficient, it has to run on all ranges of a node at once, which is why you cannot tune the number of segments: the segment count will match the number of nodes. Here are two blog posts about incremental repair that should give you all the details you need: post 1 and post 2. You'll see that the first post tells you that you shouldn't be using incremental repair with 3.11, btw ;)
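For illustration, here is roughly what the two modes boil down to at the nodetool level (Reaper actually drives repairs over JMX; the token values, keyspace, and table names below are made up):

```
# Full repair: accepts a token subrange (-st/-et), so Reaper can split the
# work into many small segments. Token values here are illustrative.
nodetool repair -full -st -9223372036854775808 -et -9100000000000000000 my_keyspace big_table1

# Incremental repair (the default since 2.2): no subrange support because of
# anticompaction, so it covers all of a node's ranges at once -- one segment per node.
nodetool repair my_keyspace big_table1
```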
Still, if you want to perform incremental repair, it works very well when there is only a small amount of data left to repair. So the procedure would be the following:

- run a full repair of the keyspace first,
- mark all SSTables as repaired with sstablerepairedset, node by node (see the sketch below),
- then run incremental repairs frequently, so the amount of unrepaired data stays small.
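A minimal sketch of the sstablerepairedset step, assuming default data paths and a single table (paths, keyspace, and table names are illustrative; the node must be down while the tool runs):

```
# Stop the node cleanly before touching SSTable metadata.
nodetool drain && sudo service cassandra stop

# Collect the table's Data.db files and mark them all as repaired.
# Paths and names here are illustrative.
find /var/lib/cassandra/data/my_keyspace/big_table1-* -name "*-Data.db" > sstables.txt
sudo -u cassandra sstablerepairedset --really-set --is-repaired -f sstables.txt

sudo service cassandra start
```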
The version of Reaper you're running also allows you to create schedules that trigger incremental repairs based on the percentage of unrepaired data. You can use a value such as 10% and see how often it triggers.
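If you create the schedule through Reaper's REST API rather than the UI, a sketch might look like this (host, cluster, keyspace, and owner are placeholders; double-check the parameter names against the API docs for your Reaper version):

```
# Hedged sketch: an incremental-repair schedule that triggers once more than
# 10% of the data is unrepaired. All hosts and names are illustrative.
curl -X POST "http://reaper-host:8080/repair_schedule?clusterName=my_cluster&keyspace=my_keyspace&owner=dba&incrementalRepair=true&percentUnrepairedThreshold=10&scheduleDaysBetween=1"
```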
Many thanks. I did note late yesterday the advice not to use incremental repair with 3.11. Will follow through with your sstablerepairedset recommendation.
Hi @saklasing, any updates on this? Did you go through with incremental repair?
No, we chose to follow your advice not to do incremental repair. We ended up using Reaper to do full repairs on the problem tables and subsequently compacted them as well. Combining repairs/compactions with actual node rebuilds, we have lowered the average Cassandra node from 1.5 TB of data down to the 200-400 GB range, which shows how unhealthy the cluster was.
Thank you for the follow-up.
Thanks for the update. Closing the ticket.
We have a one-datacenter, 20-node Cassandra production cluster: 2 racks, 10 nodes each.
To date the cluster has performed well, but it has had major health issues over the last few years due to our inability to repair the two largest tables. Attempts at nodetool repair end up dropping one or more nodes, always due to space, usually forcing rebuilds. The smaller keyspace tables repair/compact successfully every day via cron jobs with no issues. In the past we kept the cluster alive by rebuilding all 20 nodes one at a time, annually.
Thus Reaper was chosen in the hope of doing much finer-grained repairs, and to date we have had great success with FULL repairs but very little success with incremental.
Our typical successful full repair always specifies the problem keyspace and only the two problem tables: one node, 256 segments per node, parallel repair, 4 threads, intensity 0.75, with 30-minute segment timeouts. Each full repair, when submitted, results in 396-415 jobs. Just to be clear, these run very well and succeed with no issues in under 24 hours per node.
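For reference, a rough sketch of those settings as a Reaper REST call (host, cluster, and table names are placeholders; we normally submit through the UI, and the segment timeout is configured via hangingRepairTimeoutMins in the Reaper server config rather than per request):

```
# Hedged sketch: a full repair run with the settings described above.
curl -X POST "http://reaper-host:8080/repair_run?clusterName=my_cluster&keyspace=my_keyspace&tables=big_table1,big_table2&owner=dba&segmentCountPerNode=256&repairParallelism=PARALLEL&intensity=0.75&repairThreadCount=4&incrementalRepair=false"
```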
Now we are trying incremental repairs with the identical configuration above, other than incremental being set to true. This results in only 20 jobs instead of the 396-415 jobs of a FULL repair. A one-node incremental repair this week ran for 3 days, when a full repair on the same node would run in 1 day. Reaper never finished the first of the 20 jobs before dropping another node due to space issues.
Can you advise how to influence Reaper into breaking the work into smaller pieces? Why would an incremental repair, identically configured to a full one, take much longer? And why would it break the work into only 20 jobs? Note that we have 20 nodes.
The main reason for running incrementals is the hope of getting Cassandra to update its percent-repaired statistics as a gauge of health. Currently the average across the 20 nodes is 24% repaired.
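For reference, the percent-repaired figure we watch is the per-table value nodetool reports (keyspace and table names below are illustrative):

```
# "Percent repaired" appears in nodetool's per-table stats in 3.11.
nodetool tablestats my_keyspace.big_table1 | grep -i "percent repaired"
```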