This happens when all the nodes in the cluster have changed without Reaper being updated with the new cluster definition. My recommendation would be to re-register the cluster (through the REST API, for example) whenever the topology changes, which will update the list of nodes in the cluster definition in reaper_db.
Thank you. Could you please share the command to re-register the cluster via the REST API, or to update the new node information in reaper_db? We are running Reaper in SIDECAR mode.
From the docs:
You can also use the spreaper add-cluster command, or even re-register the cluster through the UI; I think that should do it as well.
Plenty of options ;)
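For example (hostname, port, and cluster name below are placeholders, and the exact arguments can vary a bit between Reaper versions, so double-check spreaper --help and the REST API docs):

# Re-register through the REST API; a PUT on an existing cluster updates its node list:
curl -X PUT 'http://reaper-host:8080/cluster/my_cluster?seedHost=new-seed-node&jmxPort=7199'

# Or through the spreaper CLI, pointing it at one of the current nodes:
spreaper add-cluster new-seed-node 7199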
Thank you. It worked.
Hey @adejanovski, we are trying to re-register the cluster via the HTTP API, but when we perform the PUT request we get the same error:
curl --location --request PUT 'myHostName:8080/cluster/myCluster?seedHost=myHostName&jmxPort=9999'
And we get "There was an error processing your request. It has been logged (ID 9210ff6a668661a0).", with the following error in the logs:
PUT /cluster/myHostName?seedHost=myHostName&jmxPort=9999] i.d.j.e.LoggingExceptionMapper - Error handling a request: 9210ff6a668661a0 java.lang.IllegalArgumentException: Trying to add/update cluster using an existing name: collectors-data-store. No nodes overlap between 10.112.189.130,10.112.189.133,10.112.189.137 and 10.112.189.135,10.112.189.136,10.112.189.140
That's because all the nodes changed IPs, it seems. We have a mechanism to prevent Reaper from mixing up clusters that share the same name (it sadly happens...), so if you try to register a cluster that already exists but has no overlap in nodes, we consider it a different cluster and prevent its registration. You then need to delete the cluster and then recreate it.
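Something along these lines (placeholders again, and depending on your Reaper version you may first have to remove the cluster's repair runs and schedules before the delete is accepted):

# Drop the stale cluster definition:
curl -X DELETE 'http://reaper-host:8080/cluster/my_cluster'

# Re-register it using one of the new nodes as seed:
curl -X PUT 'http://reaper-host:8080/cluster/my_cluster?seedHost=new-seed-node&jmxPort=7199'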
We have a Cassandra cluster and run Reaper in SIDECAR mode. Recently we re-hydrated the cluster, removing the old nodes and adding new ones. When we start a repair on the cluster, we see the below error message in the Reaper logs and the repair does not progress.
Reaper-config:
###################################
# Cassandra Reaper Configuration Example.
# See a bit more complete example in:
# src/server/src/test/resources/cassandra-reaper.yaml

segmentCountPerNode: 64
repairParallelism: DATACENTER_AWARE
repairIntensity: 0.9
scheduleDaysBetween: 7
repairRunThreadCount: 15
hangingRepairTimeoutMins: 60
storageType: cassandra
enableCrossOrigin: true
incrementalRepair: false
blacklistTwcsTables: true
enableDynamicSeedList: true
repairManagerSchedulingIntervalSeconds: 10
activateQueryLogger: false
jmxConnectionTimeoutInSeconds: 10
useAddressTranslator: false
purgeRecordsAfterInDays: 30
numberOfRunsToKeepPerUnit: 10
enableConcurrentMigrations: false
# datacenterAvailability has four possible values: ALL | LOCAL | EACH | SIDECAR
# The correct value to use depends on whether JMX ports to C* nodes in remote datacenters are accessible.
# If Reaper has access to all node JMX ports, across all datacenters, then configure ALL.
# If JMX access is only available to nodes in the same datacenter Reaper is running in, then configure LOCAL.
# If there's a Reaper instance running in every datacenter, and it's important that nodes under duress
# are not involved in repairs, then configure EACH.
# If JMX access is restricted to localhost, then configure SIDECAR.
# The default is ALL.
datacenterAvailability: SIDECAR
jmxAuth:
  username: myUsername
  password: myPassword
logging:
  level: WARN
  loggers:
    com.datastax.driver.core.QueryLogger.NORMAL:
      level: WARN
      additive: false
  appenders:

server:
  type: default
  applicationConnectors:

cassandra:
  clusterName: "XXXXX"
  contactPoints: ["XXXXXX"]
  keyspace: reaper_db
  loadBalancingPolicy:
    type: tokenAware
    shuffleReplicas: true
    subPolicy:
      type: dcAwareRoundRobin
      localDC:
      usedHostsPerRemoteDC: 0
      allowRemoteDCsForLocalConsistencyLevel: false
  authProvider:
    type: plainText
    username: XXXX
    password: XXXXX
  ssl:

autoScheduling:
  enabled: false
  initialDelayPeriod: PT15S
  periodBetweenPolls: PT10M
  timeBeforeFirstSchedule: PT5M
  scheduleSpreadPeriod: PT6H
  excludedKeyspaces:
# Uncomment the following to enable dropwizard metrics
# Configure to the reporter of your choice
# Reaper also provides prometheus metrics on the admin port at /prometheusMetrics
# metrics:
#   frequency: 1 minute
#   reporters:
#     - type: log
#       logger: metrics

# Authentication is enabled by default
accessControl:
  sessionTimeout: PT10M
  shiro:
    iniConfigs: ["classpath:shiro.ini"]
################################################
Error:
java.lang.IllegalArgumentException: Trying to add/update cluster using an existing name: poc_cassandra_aws_cluster. No nodes overlap between 10.24.78.217,10.24.78.249,10.24.78.63,10.24.79.19,10.24.79.227,10.24.79.99 and 10.24.76.205,10.24.78.93,10.24.78.189,10.24.76.132,10.24.79.214,10.24.79.119
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:412)
    at io.cassandrareaper.storage.CassandraStorage.addClusterAssertions(CassandraStorage.java:640)
    at io.cassandrareaper.storage.CassandraStorage.addCluster(CassandraStorage.java:602)
    at io.cassandrareaper.storage.CassandraStorage.updateCluster(CassandraStorage.java:621)
    at io.cassandrareaper.service.RepairRunner.updateClusterNodeList(RepairRunner.java:306)
    at io.cassandrareaper.service.RepairRunner.run(RepairRunner.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:117)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:38)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
    at com.codahale.metrics.InstrumentedScheduledExecutorService$InstrumentedRunnable.run(InstrumentedScheduledExecutorService.java:241)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
The reaper.log file is flooded with the above message and grew to 15 GB. To recover, we stopped Reaper on all nodes, dropped the reaper_db keyspace, and restarted the reaper service. We see this issue specifically when the cluster topology changes, i.e. when nodes are added or removed.
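For reference, the reset amounted to the following (a sketch only: the replication settings are placeholders that must match your environment, and the keyspace has to be recreated by hand before restarting, since Reaper only recreates its tables at startup):

# With Reaper stopped on every node:
cqlsh -e "DROP KEYSPACE reaper_db;"
cqlsh -e "CREATE KEYSPACE reaper_db WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};"
# Then start the reaper service again on all nodes.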
Any solution for this?