Closed: kzemek closed this issue 3 years ago
Hi @kzemek, which storage backend are you using? I'm assuming you don't have any segment running at all when clicking on the "View Segments" button on running repairs?
I'm using Cassandra backend, with settings:
REAPER_AUTO_SCHEDULING_ENABLED: "true"
REAPER_DATACENTER_AVAILABILITY: SIDECAR
REAPER_REPAIR_RUN_THREAD_COUNT: "4"
REAPER_REPAIR_THREAD_COUNT: "4"
REAPER_REPAIR_MANAGER_SCHEDULING_INTERVAL_SECONDS: "5"
REAPER_MAX_PARALLEL_REPAIRS: "4"
I changed the last two settings after reinstalling Reaper, i.e. I already had this issue with the default 30-second interval and 2 parallel repairs. Also, the repair is DATACENTER_AWARE.
I'm assuming you don't have any segment running at all when clicking on the "View Segments" button on running repairs?
I do have one segment running, started 25 minutes ago
I have paused the repair and then removed, cleared, and re-added one of the Cassandra nodes that took way longer than the others to repair segments. Now I indeed have no segments running, and my current progress is 23/3151.
Well, if a segment has been running for 25 minutes, then "something" was going on at least 🙂
How many nodes are there in the cluster and what's the replication factor of the repaired keyspace? Your thread count is fairly small; you should definitely raise it to at least 15. These threads are used by both the repair runners and the segment runners (so with 4, if you have a single repair running, you'll have at most 3 segments running at once).
What do you mean by "removed, cleared, and re-added one of the Cassandra nodes"? Did you remove it from the ring and bootstrap it from scratch, or did you replace it with itself in order to keep the token ownership unchanged?
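For reference, a sketch of the two approaches being distinguished here (host IDs and addresses are placeholders):

# Option 1: remove the node from the ring, then bootstrap it from scratch
# (token ownership changes, so Reaper's existing segments may no longer match ranges)
nodetool decommission            # run on the leaving node while it's still up, or:
nodetool removenode <host-id>    # run from another node if it's already down
# ...then start the wiped node normally and let it bootstrap.

# Option 2: replace the node with itself to keep token ownership unchanged
# (set on the wiped node before its first start, e.g. in cassandra-env.sh)
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=<old-node-ip>"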
Well, if a segment has been running for 25 minutes, then "something" was going on at least 🙂
Agreed, this might have been more on the Cassandra side - the other repairs were taking seconds, up to a minute (driven by the one node that was taking longer).
How many nodes are there in the cluster and what's the replication factor of the repaired keyspace?
There are 26 nodes right now across 11 datacenters, some of them with a replication factor of 3 and some with a replication factor of 1.
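(For reference, the per-datacenter replication settings of each keyspace can be double checked with a query like the one below; this assumes Cassandra 3.0+ where system_schema exists.)

# Show the replication strategy and per-DC replication factors of every keyspace
cqlsh -e "SELECT keyspace_name, replication FROM system_schema.keyspaces;"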
What do you mean by "removed, cleared, and re-added one of the Cassandra nodes"? Did you remove it from the ring and bootstrap it from scratch, or did you replace it with itself in order to keep the token ownership unchanged?
Removed from the ring and bootstrapped from scratch.
26 nodes right now across 11 datacenters
Whoa, that doesn't actually allow many segments to run at once, because some DCs will probably be fully busy with a single segment: on average there are only 2.3 nodes per DC (you probably have a different balance). A faulty node could indeed block repairs and make segments time out over and over.
One thing you can check if you're wondering which nodes are actually involved in a segment is to list the rows in the running_repairs table. It'll show you a list of nodes per repair and segment, knowing that one node can only run a single segment per repair at a time (the concurrency over different repairs will be controlled by the number of parallel repairs).
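A minimal example, assuming the Reaper backend keyspace is named reaper_db (substitute your actual backend keyspace):

# List the repair/segment locks currently held by nodes
cqlsh -e "SELECT * FROM reaper_db.running_repairs;"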
Removed from the ring and bootstrapped from scratch.
This changed token allocations and made the existing repairs no longer valid, as some segments may now spread over two different token ranges. I'd recommend aborting the running repairs and starting new ones.
You're right, there's really only room to repair one segment at a time. Hopefully the node I re-bootstrapped was the one causing trouble and it won't get stuck again. Is there anything I can reasonably do to speed up the whole repair process? A single segment takes "a few seconds" to repair, but Reaper still only squeezes in a few per minute; e.g. is an interval of 1 second reasonable?
An interval of a single second seems fairly low; it'll put more pressure on the Cassandra backend as segments will get listed very often. I'd recommend not going under 10 seconds.
One thing you could do is create several repairs for the same keyspace but on different sets of tables (if possible). That'll make them run concurrently and, assuming your cluster can handle it, speed things up. Make sure you raise the number of threads then ;) e.g. REAPER_REPAIR_RUN_THREADS: 15
You can also set the intensity to 1 and attempt to reduce the number of segments, although that can be challenging with a cluster this size, spread over so many datacenters (I'm having a hard time forecasting this, though).
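A hedged sketch of what the per-table-set repairs could look like through Reaper's REST API; the host, cluster, keyspace, and table names are placeholders, and the parameter names should be verified against the API docs for your Reaper version:

# Create two repairs on the same keyspace but disjoint table sets, with intensity 1
# (runs are created in NOT_STARTED state and still need to be started, e.g. from the UI)
curl -X POST "http://reaper-host:8080/repair_run?clusterName=my_cluster&keyspace=my_keyspace&owner=ops&tables=table_a,table_b&intensity=1.0"
curl -X POST "http://reaper-host:8080/repair_run?clusterName=my_cluster&keyspace=my_keyspace&owner=ops&tables=table_c,table_d&intensity=1.0"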
I'll close this ticket as Reaper seems to be working as expected here. Feel free to keep the conversation going if you have questions or reopen if you think there's really something wrong.
Hey @adejanovski , sorry that this has turned into a support ticket! Thank you for your help, much appreciated :)
Hi again! :) I'm running 2.2.5 in SIDECAR mode and my repair is progressing very slowly (it repaired 19/3151 segments in the first 5 minutes and there has been no progress in the 20 minutes since, as I'm writing this issue). The event logs say:
All nodes are busy or have too many pending compactions for the remaining candidate segments.
- but there's no compaction going on on any node and the nodes are not busy as far as I can tell: they're not repairing, have no write load, and only very light read load. This is a fresh install of Reaper, since I ran into this issue first thing after upgrading from 2.0.5 to 2.2.5. I recreated the backend keyspace, started a single instance to prepare the tables (on a previous attempt I believe I ran into a deadlock when all instances tried to prepare the tables at once, with every one waiting for a migration lock), and then started all the sidecars when done.
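(For reference, the conditions named in that message can be checked per node with standard nodetool commands, e.g.:)

# Per-node checks for the conditions in the message above
nodetool compactionstats   # pending and active compactions
nodetool tpstats           # thread pool activity and dropped messages
nodetool netstats          # ongoing streaming / repair sessions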