StevenLacerda opened 2 years ago
We're getting a filesystem error that is taking down a node. Here's what's happening:
1) Reaper starts, then has internode comms issues:
The node referenced in those errors was removed from the cluster about 3 weeks ago, but it was never removed from the seeds list.
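For context, that stale entry would live in the seed_provider section of cassandra.yaml on each node. A minimal sketch of what that looks like, assuming the default SimpleSeedProvider (the IPs are hypothetical, with the removed node still listed):

```yaml
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      # 10.0.0.3 was decommissioned ~3 weeks ago but was never removed here
      - seeds: "10.0.0.1,10.0.0.2,10.0.0.3"
```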
To me, it seems like we're creating the directory for the snapshot, the snapshot then fails, Cassandra then tries to recreate the snapshot in the same directory, and that causes the fs error because the directory isn't empty. Does that sound plausible?
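To make that hypothesis concrete: Cassandra snapshots work by hardlinking SSTables into a snapshots/&lt;tag&gt; directory, and java.nio will refuse to create a hardlink over a path that already exists. Here's a minimal, self-contained Java sketch of that failure mode (the file names and snapshot tag are hypothetical, and this is not Cassandra's actual snapshot code):

```java
import java.io.IOException;
import java.nio.file.*;

public class SnapshotRetrySketch {
    public static void main(String[] args) throws IOException {
        // Stand-ins for an SSTable and its table directory (layout is illustrative).
        Path table = Files.createTempDirectory("tbl");
        Path sstable = Files.createFile(table.resolve("nb-1-big-Data.db"));
        Path snapshotDir = table.resolve("snapshots").resolve("repair-1234");

        // First attempt: create the snapshot directory and hardlink the SSTable in.
        Files.createDirectories(snapshotDir);
        Path link = snapshotDir.resolve(sstable.getFileName());
        Files.createLink(link, sstable); // succeeds on a clean directory

        // Retry after a partial failure: the old link is still present, so the
        // same call now fails with a filesystem error instead of succeeding.
        try {
            Files.createLink(link, sstable);
        } catch (FileAlreadyExistsException e) {
            System.err.println("snapshot retry failed: " + e);
        }
    }
}
```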
A reply posted later in the thread:

Hi, I know it's a fairly old ticket and I apologize for taking so long to respond. I don't think this can be Reaper-related. Reaper doesn't specify which nodes should be involved in a repair; it just starts a repair session for a list of token ranges through one of the live nodes, which then acts as the coordinator. That coordinator is responsible for contacting the nodes that should be involved in the repair. I think the internode comms issue is a red herring here; the crux of the issue is the inability to create a snapshot, but that's a Cassandra problem, I'd say. Snapshots are created by Cassandra automatically when a sequential or DC-aware repair is used. Again, Reaper has no control over the name and location of the snapshot.
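Following up on the point about repair-triggered snapshots: if failed sequential/DC-aware repairs are leaving snapshots behind, they can be inspected and removed with nodetool. A sketch, assuming the leftover tag is named repair-1234 and the keyspace is my_keyspace (both hypothetical):

```sh
# List all snapshots on this node to spot leftovers from failed repairs
nodetool listsnapshots

# Remove a specific leftover snapshot tag for one keyspace
nodetool clearsnapshot -t repair-1234 my_keyspace
```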