k8ssandra / k8ssandra-operator

The Kubernetes operator for K8ssandra
https://k8ssandra.io/
Apache License 2.0

Reaper Error: Unable to connect to Cassandra cluster [clustername] #1411

Open asgeek opened 2 months ago

asgeek commented 2 months ago

What happened?

The following exception appears intermittently in the Reaper logs, even though the Cassandra cluster is healthy and Reaper otherwise functions as expected. The Reaper pod also appears to be restarted because of this error.

Did you expect to see something different?

Reaper should run stably, without this error appearing in its logs.

How to reproduce it (as minimally and precisely as possible):

Environment

apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: my-cluster
  namespace: k8ssandra-operator
spec:
  cassandra:
    serverVersion: "3.11.12"
    softPodAntiAffinity: true
    storageConfig:
      cassandraDataVolumeClaimSpec:
        storageClassName: ceph-rbd
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
    config:
      jvmOptions:
        heapSize: 512M
      cassandraYaml:
        read_request_timeout_in_ms: 10000
        range_request_timeout_in_ms: 15000
        write_request_timeout_in_ms: 10000
    networking:
      hostNetwork: false
    resources:
      requests:
        cpu: "2"
        memory: 8Gi
      limits:
        cpu: "2" #  Should be 16 in production: https://docs.datastax.com/en/planning/oss/oss-capacity-planning.html
        memory: 32Gi
    datacenters:
    - metadata:
        name: e1-dc
      k8sContext: e1
      size: 3
    - metadata:
        name: e2-dc
      k8sContext: e2
      size: 3
  reaper: 
    containerImage:
      tag: "3.5.0"
  stargate:
    containerImage:
      registry: docker-proxy.domain.com
      tag: v1.0.77
    heapSize: 384Mi
    resources:
      requests:
        cpu: "1"
        memory: 1Gi
      limits:
        cpu: "2"
        memory: 4Gi

ERROR  [2024-09-18 05:50:20,004] [dw-admin-758] s.c.d.c.CassandraHealthCheck - Unable to connect to Cassandra cluster [clustername]
java.util.concurrent.TimeoutException: Waited 2000 milliseconds for com.datastax.driver.core.DefaultResultSetFuture@6dd014d7[status=PENDING]
    at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:470)
    at systems.composable.dropwizard.cassandra.CassandraHealthCheck.check(CassandraHealthCheck.java:50)
    at com.codahale.metrics.health.HealthCheck.execute(HealthCheck.java:374)
    at com.codahale.metrics.health.HealthCheckRegistry.runHealthChecks(HealthCheckRegistry.java:184)
    at com.codahale.metrics.servlets.HealthCheckServlet.runHealthChecks(HealthCheckServlet.java:177)
    at com.codahale.metrics.servlets.HealthCheckServlet.doGet(HealthCheckServlet.java:146)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:497)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:584)
    at com.codahale.metrics.servlets.AdminServlet.service(AdminServlet.java:153)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:584)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
    at org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1656)
    at io.dropwizard.jersey.filter.AllowedMethodsFilter.handle(AllowedMethodsFilter.java:47)
    at io.dropwizard.jersey.filter.AllowedMethodsFilter.doFilter(AllowedMethodsFilter.java:41)
    at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
    at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:552)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at io.dropwizard.jetty.RoutingHandler.handle(RoutingHandler.java:52)
    at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:772)
    at org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:181)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
    at org.eclipse.jetty.server.Server.handle(Server.java:516)
    at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)
    at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
    at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
    at java.base/java.lang.Thread.run(Thread.java:829)
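
For readers of the trace: the top frames (`AbstractFuture.get` called from `CassandraHealthCheck.check`) suggest that the health check waits a bounded 2000 ms on a query future and reports a failure when that wait expires while the future is still `PENDING`. Below is a minimal sketch of that pattern in plain Java, assuming nothing about Reaper's internals. `CompletableFuture` stands in for the driver's result future, and `BoundedHealthCheck` and `check` are hypothetical names, not Reaper or dropwizard-cassandra code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only: not the actual CassandraHealthCheck implementation.
public class BoundedHealthCheck {

    /**
     * Waits at most timeoutMillis for the probe to complete.
     * Returns true if it completed in time, false if the wait timed out.
     */
    public static boolean check(CompletableFuture<?> probe, long timeoutMillis)
            throws InterruptedException, ExecutionException {
        try {
            probe.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // The probe is still pending: this is the situation the log line reports.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // A probe that has already completed, like a query answered promptly.
        CompletableFuture<String> fast = CompletableFuture.completedFuture("ok");
        // A probe that never completes, like a query stuck behind a slow node.
        CompletableFuture<String> stuck = new CompletableFuture<>();

        System.out.println("fast probe healthy: " + check(fast, 2000));
        System.out.println("stuck probe healthy: " + check(stuck, 200));
    }
}
```

In this sketch a single slow response is enough to produce an unhealthy result, even if the cluster itself is fine. That would be consistent with the intermittent errors described above, though it does not by itself confirm the cause of the pod restarts.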

Anything else we need to know?:

Issue is synchronized with this Jira Story by Unito. Issue Number: K8OP-256