thelastpickle / cassandra-reaper

Automated Repair Awesomeness for Apache Cassandra
http://cassandra-reaper.io/
Apache License 2.0

Multiple DC aware repairs can't be scheduled for the same column family #751

Open cin opened 5 years ago

cin commented 5 years ago


Reaper version 1.44, C* version 3.9.0

We have a C* setup with 2 DCs (one in us-south and one in us-east). We're running Reaper as described in the "Multiple Reaper instances with JMX accessible locally to each DC" section of the docs. The Reaper running in each DC can talk over JMX to all nodes in the ring. We'd like to schedule repairs to run on our tables on alternating nights in each DC. We're also trying to do DC-aware repairs so we're not hammering our entire cluster when running repairs.

However, I can't create multiple repair schedules for the same cluster, keyspace, and column family. After scheduling repairs in us-south, all looked good. Unfortunately, when setting up repairs for us-east through automation using the API, I got back a 204 and thought everything was good until I checked the UI: only the us-south schedules were there. I tried to manually add a repair through the UI and got back a 409 with the following message: "A repair schedule already exists for cluster "cassandra", keyspace "usage", and column families: [usagebymonth]".

Is this a bug, or should I not be trying to schedule repairs this way? I can think of a few ways to work around this, but they involve separating the Reaper UIs and keyspaces. This feels wrong. TIA.

cassandra-us-south-reaper-7bbdb8b9d6-thwkj reaper INFO   [2019-09-12 15:59:54,398] [dw-74 - POST /repair_schedule?clusterName=cassandra&keyspace=usage&owner=cassadmin&scheduleDaysBetween=7&incrementalRepair=false&scheduleTriggerTime=2019-09-12T04:00:00&tables=usagebymonth&datacenters=us-south&repairThreadCount=4&repairParallelism=datacenter_aware] i.c.r.RepairScheduleResource - first schedule activation will be: 2019-09-12T04:00:00Z

cassandra-us-east-reaper-5b78f4bb78-8dskx reaper INFO   [2019-09-12 16:24:43,204] [dw-79 - POST /repair_schedule?clusterName=cassandra&keyspace=usage&owner=cassadmin&scheduleDaysBetween=7&incrementalRepair=false&scheduleTriggerTime=2019-09-13T04:00:00&tables=usagebymonth&datacenters=us-east&repairThreadCount=4&repairParallelism=datacenter_aware] i.c.r.RepairScheduleResource - first schedule activation will be: 2019-09-13T04:00:00Z
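
For anyone automating this, the second request logged above can be reproduced with a small script that refuses to treat a 204 as success. This is only a sketch: the query parameters are copied from the log lines, but the base URL `http://localhost:8080` and the helper names are hypothetical, and the rule "only 201 means a schedule was created" is an assumption based on the 204 in this report not producing a schedule.

```python
import urllib.error
import urllib.request
from urllib.parse import urlencode


def build_schedule_url(base_url, **params):
    """Build the POST /repair_schedule URL from keyword query parameters."""
    return f"{base_url.rstrip('/')}/repair_schedule?{urlencode(params)}"


def schedule_created(status_code):
    # Assumption: only 201 Created means a new schedule now exists.
    # In this report a 204 came back and no schedule showed up in the UI.
    return status_code == 201


def post_schedule(url):
    """Send the POST and return the HTTP status code, including 4xx/5xx."""
    req = urllib.request.Request(url, method="POST")
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


# Hypothetical base URL; parameters mirror the us-east request in the log.
url = build_schedule_url(
    "http://localhost:8080",
    clusterName="cassandra",
    keyspace="usage",
    owner="cassadmin",
    scheduleDaysBetween=7,
    incrementalRepair="false",
    scheduleTriggerTime="2019-09-13T04:00:00",
    tables="usagebymonth",
    datacenters="us-east",
    repairThreadCount=4,
    repairParallelism="datacenter_aware",
)
```

Calling `schedule_created(post_schedule(url))` against a real instance would then flag both the 204 and the 409 as "not created" instead of letting the automation carry on silently.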

┆Issue is synchronized with this Jira Story by Unito ┆Issue Number: REAP-150

adejanovski commented 5 years ago

Hi @cin,

wow, that sucks. Our current way of defining uniqueness in repair definition is clearly no good. I'll try to review this quickly and come up with something more flexible.
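
To make the uniqueness problem concrete, here is a rough sketch of the two rules. It is not Reaper's actual code: the first function is only inferred from the 409 message quoted above (cluster, keyspace, and column families), and the second is one possible more flexible rule, assuming that an empty datacenter list means "all datacenters". All names are hypothetical.

```python
def tables_overlap(a, b):
    """True if the two schedules share at least one table."""
    return bool(set(a) & set(b))


def dcs_overlap(a, b):
    # Assumption: an empty list means "all datacenters", so it overlaps anything.
    if not a or not b:
        return True
    return bool(set(a) & set(b))


def conflicts_current(existing, new):
    """Rule inferred from the 409 message: cluster + keyspace + tables."""
    return (
        existing["cluster"] == new["cluster"]
        and existing["keyspace"] == new["keyspace"]
        and tables_overlap(existing["tables"], new["tables"])
    )


def conflicts_dc_aware(existing, new):
    """Hypothetical alternative: also require the datacenters to overlap."""
    return conflicts_current(existing, new) and dcs_overlap(
        existing["datacenters"], new["datacenters"]
    )


south = {"cluster": "cassandra", "keyspace": "usage",
         "tables": ["usagebymonth"], "datacenters": ["us-south"]}
east = dict(south, datacenters=["us-east"])
all_dcs = dict(south, datacenters=[])
```

Under the first rule the us-south and us-east schedules from this issue collide; under the second they do not, while a schedule covering all datacenters would still conflict with either.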