thelastpickle / cassandra-reaper

Automated Repair Awesomeness for Apache Cassandra
http://cassandra-reaper.io/
Apache License 2.0

Auto scheduler encounters null pointer exception #298

Closed zorkian closed 2 years ago

zorkian commented 6 years ago

We're evaluating the Cassandra Reaper here and have run into an issue with the auto scheduler. We're using version 1.0.2, with Cassandra version 3.0.15.

This is the only output the Reaper prints, running in DEBUG mode:

DEBUG  [2017-12-12 02:10:12,520] [AutoSchedulingManagerTimer] i.c.s.AutoSchedulingManager - Checking cluster keyspaces to identify which ones require repair schedules...
DEBUG  [2017-12-12 02:10:12,523] [AutoSchedulingManagerTimer] i.c.j.JmxProxy - Connecting to [[SERVER_IP]]...
DEBUG  [2017-12-12 02:10:12,548] [AutoSchedulingManagerTimer] i.c.j.JmxProxy - JMX connection to [[SERVER_IP]] properly connected: service:jmx:rmi:///jndi/rmi://[[SERVER_IP]]:7199/jmxrmi
DEBUG  [2017-12-12 02:10:12,548] [AutoSchedulingManagerTimer] i.c.j.HostConnectionCounters - Host [[SERVER_IP]] has 2 successfull connections
DEBUG  [2017-12-12 02:10:12,550] [AutoSchedulingManagerTimer] i.c.j.JmxProxy - close JMX connection to '[[SERVER_IP]]': service:jmx:rmi:///jndi/rmi://[[SERVER_IP]]:7199/jmxrmi
ERROR  [2017-12-12 02:10:12,600] [AutoSchedulingManagerTimer] i.c.s.AutoSchedulingManager - Error while scheduling repairs for cluster io.cassandrareaper.core.Cluster@5fe46d35
java.lang.NullPointerException: null
        at io.cassandrareaper.service.ClusterRepairScheduler.keyspaceCandidateForRepair(ClusterRepairScheduler.java:94)
        at io.cassandrareaper.service.ClusterRepairScheduler.lambda$scheduleRepairs$1(ClusterRepairScheduler.java:64)
        at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:174)
        at java.util.Iterator.forEachRemaining(Iterator.java:116)
        at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
        at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
        at io.cassandrareaper.service.ClusterRepairScheduler.scheduleRepairs(ClusterRepairScheduler.java:65)
        at io.cassandrareaper.service.AutoSchedulingManager.run(AutoSchedulingManager.java:72)
        at java.util.TimerThread.mainLoop(Timer.java:555)
        at java.util.TimerThread.run(Timer.java:505)

This repeats every 10 minutes. Manually scheduled repair runs seem to work correctly; only the auto scheduler is failing.


adejanovski commented 6 years ago

Hi @zorkian ,

I've failed to reproduce the issue so far. Could you share your Reaper YAML file here so that I can check your settings?

Thanks

tfendt commented 6 years ago

I am getting the same issue.

Here is my yaml file:

# Cassandra Reaper Configuration Example.
# See a bit more complete example in:
# src/test/resources/cassandra-reaper.yaml
segmentCount: 200
repairParallelism: DATACENTER_AWARE
repairIntensity: 0.9
scheduleDaysBetween: 7
repairRunThreadCount: 15
hangingRepairTimeoutMins: 30
storageType: cassandra
enableCrossOrigin: true
incrementalRepair: false
enableDynamicSeedList: true
repairManagerSchedulingIntervalSeconds: 10
activateQueryLogger: false
jmxConnectionTimeoutInSeconds: 5

# datacenterAvailability has three possible values: ALL | LOCAL | EACH
# the correct value to use depends on whether jmx ports to C* nodes in remote datacenters are accessible
# If the reaper has access to all node jmx ports, across all datacenters, then configure to ALL.
# If jmx access is only available to nodes in the same datacenter as reaper is running in, then configure to LOCAL.
# If there's a reaper instance running in every datacenter, and it's important that nodes under duress are not involved in repairs,
#    then configure to EACH.
#
# The default is ALL
datacenterAvailability: ALL

jmxPorts:
  127.0.0.1: 7100
  127.0.0.2: 7200
  127.0.0.3: 7300
  127.0.0.4: 7400
  127.0.0.5: 7500
  127.0.0.6: 7600
  127.0.0.7: 7700
  127.0.0.8: 7800

#jmxAuth:
#  username: myUsername
#  password: myPassword

logging:
  level: INFO
  loggers:
    com.datastax.driver.core.QueryLogger.NORMAL:
      level: WARN
      additive: false
      appenders:
        - type: file
          currentLogFilename: query-logger.log
          archivedLogFilenamePattern: query-logger-%d.log.gz
          archivedFileCount: 2
    io.dropwizard: WARN
    org.eclipse.jetty: WARN
  appenders:
    - type: console
      logFormat: "%-6level [%d] [%t] %logger{5} - %msg %n"

server:
  type: default
  applicationConnectors:
    - type: http
      port: 8080
      bindHost: 0.0.0.0
  adminConnectors:
    - type: http
      port: 8081
      bindHost: 0.0.0.0
  requestLog:
    appenders: []

cassandra:
  clusterName: "CassandraProd"
  contactPoints: [""]
  keyspace: reaper_db
  loadBalancingPolicy:
    type: tokenAware
    shuffleReplicas: true
    subPolicy:
      type: dcAwareRoundRobin
      localDC: us-east
      usedHostsPerRemoteDC: 0
      allowRemoteDCsForLocalConsistencyLevel: false
  authProvider:
    type: plainText
    username: cassandra_reaper
    password: 

autoScheduling:
  enabled: true
  initialDelayPeriod: PT15S
  periodBetweenPolls: PT10M
  timeBeforeFirstSchedule: PT5M
  scheduleSpreadPeriod: PT6H
  excludedKeyspaces:

# Uncomment the following to enable dropwizard metrics
#  Configure to the reporter of your choice
#  Reaper also provides prometheus metrics on the admin port at /prometheusMetrics
#metrics:
#  frequency: 1 minute
#  reporters:
#    - type: log
#      logger: metrics

michaelsembwever commented 5 years ago

@zorkian and @tfendt, if you comment out the line…

#  excludedKeyspaces:

does it fix the problem?
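
For context on why that line is suspect: in YAML, a key with no value (such as the bare `excludedKeyspaces:` in the config above) is a null, and a config loader will typically hand the application a null list rather than an empty one. The thread does not confirm what line 94 of ClusterRepairScheduler actually does, so the following is only a minimal sketch of how such a null could produce this kind of NullPointerException and how a guard would avoid it. `AutoSchedulingSettings`, `isCandidateUnsafe`, and `isCandidateSafe` are hypothetical names for illustration, not Reaper's real classes or methods.

```java
import java.util.Collections;
import java.util.List;

public class ExcludedKeyspacesSketch {

    /** Hypothetical stand-in for the autoScheduling settings; not Reaper's actual class. */
    static final class AutoSchedulingSettings {
        private final List<String> excludedKeyspaces;

        AutoSchedulingSettings(List<String> excludedKeyspaces) {
            this.excludedKeyspaces = excludedKeyspaces;
        }

        List<String> getExcludedKeyspaces() {
            return excludedKeyspaces;
        }
    }

    /** Unguarded check: throws NullPointerException when the list is null. */
    static boolean isCandidateUnsafe(AutoSchedulingSettings settings, String keyspace) {
        return !settings.getExcludedKeyspaces().contains(keyspace);
    }

    /** Guarded check: treats a null list as "nothing excluded". */
    static boolean isCandidateSafe(AutoSchedulingSettings settings, String keyspace) {
        List<String> excluded = settings.getExcludedKeyspaces();
        if (excluded == null) {
            excluded = Collections.emptyList();
        }
        return !excluded.contains(keyspace);
    }

    public static void main(String[] args) {
        // Assumption: an empty "excludedKeyspaces:" key arrives as a null list.
        AutoSchedulingSettings settings = new AutoSchedulingSettings(null);

        try {
            isCandidateUnsafe(settings, "my_keyspace");
        } catch (NullPointerException e) {
            System.out.println("unguarded check threw NullPointerException");
        }

        System.out.println("guarded check: " + isCandidateSafe(settings, "my_keyspace"));
    }
}
```

If this is indeed the cause, commenting out the empty key on the configuration side should have the same effect as the guard, since the setting would then fall back to whatever default the application defines.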

adejanovski commented 2 years ago

Closing ticket due to inactivity.