k8ssandra / cass-operator

The DataStax Kubernetes Operator for Apache Cassandra
https://docs.datastax.com/en/cass-operator/doc/cass-operator/cassOperatorGettingStarted.html
Apache License 2.0

C* pods get stuck in init state when medusa.storageSecret is omitted and medusa.storage is not local #60

Closed. sync-by-unito[bot] closed this issue 3 years ago

sync-by-unito[bot] commented 3 years ago

Bug Report

Currently, pods fail to leave the init state when configured with medusa.enabled: true and medusa.storageSecret left unset. In my case, I am configuring Medusa for use with S3 and pulling IAM credentials from the EC2 worker node's instance metadata service.
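
For reference, the relevant fragment of the full values further down (bucket name comes from the Terraform output):

medusa:
  enabled: true
  storage: s3
  bucketName: prod-k8ssandra-s3-bucket
  storage_properties:
    region: us-east-1
  # storageSecret intentionally not set; credentials come from instance metadata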

As a workaround, I can create an empty secret and reference it; pods then initialize and backups function.
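
For example (secret name taken from the commented-out storageSecret in the values below; an empty secret is enough):

$ kubectl create secret generic prod-k8ssandra-medusa-key

and then in the values:

medusa:
  storageSecret: prod-k8ssandra-medusa-key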

Reproduction Steps

Steps to reproduce the behavior:

  1. Provision an EKS cluster with k8ssandra-terraform (note I'm running k8ssandra/k8ssandra-terraform#9)
  2. Install K8ssandra with the values specified below (see the install sketch after this list)
  3. Watch pods not come up
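
For step 2, the install looked roughly like this (repo alias, repo URL, and values file name are assumptions; the release name matches the helm ls output below):

$ helm repo add k8ssandra https://helm.k8ssandra.io/stable
$ helm install prod-k8ssandra k8ssandra/k8ssandra -f values.yaml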

Expected Behavior

Pods schedule successfully and backups run.

Environment

Helm charts version info

$ helm ls -A
NAME            NAMESPACE   REVISION    UPDATED                                 STATUS      CHART           APP VERSION
prod-k8ssandra  default     1           2021-04-30 23:29:25.248145738 -0400 EDT deployed    k8ssandra-1.1.0          

Helm charts user-supplied values

cassandra:
  # Version of Apache Cassandra to deploy
  version: "3.11.10"

  # Configuration for the /var/lib/cassandra mount point
  cassandraLibDirVolume:
    # AWS provides this storage class on EKS clusters out of the box. Note we
    # are using `gp2` here as it has `volumeBindingMode: WaitForFirstConsumer`
    # which is important during scheduling.
    storageClass: gp2

    # The recommended live data size is 1 - 1.5 TB. A 2 TB volume supports this
    # much data along with room for compactions. Consider increasing this value
    # as the number of provisioned IOPS is directly related to the volume size.
    size: 2048Gi

  heap:
    size: 8G
    newGenSize: 31G

  resources:
    requests:
      cpu: 4000m
      memory: 32Gi
    limits:
      cpu: 4000m
      memory: 32Gi

  # This key defines the logical topology of your cluster. The rack names and
  # labels should be updated to reflect the Availability Zones where your EKS
  # cluster is deployed.
  datacenters:
  - name: dc1
    size: 3
    racks:
    - name: us-east-1a
      affinityLabels:
        topology.kubernetes.io/zone: us-east-1a
    - name: us-east-1b
      affinityLabels:
        topology.kubernetes.io/zone: us-east-1b
    - name: us-east-1c
      affinityLabels:
        topology.kubernetes.io/zone: us-east-1c

stargate:
  enabled: true
  replicas: 3
  heapMB: 1024
  cpuReqMillicores: 1000
  cpuLimMillicores: 1000

medusa:
  enabled: true
  storage: s3

  # Reference the Terraform output for the correct bucket name to use here.
  bucketName: prod-k8ssandra-s3-bucket

  # The secret here must align with the value used in the previous section.
  # storageSecret: prod-k8ssandra-medusa-key

  storage_properties:
    region: us-east-1

Kubernetes version information:

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:31:21Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19+", GitVersion:"v1.19.8-eks-96780e", GitCommit:"96780e1b30acbf0a52c38b6030d7853e575bcdf3", GitTreeState:"clean", BuildDate:"2021-03-10T21:32:29Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.21) and server (1.19) exceeds the supported minor version skew of +/-1

Kubernetes cluster kind:

EKS

Additional Context

$ kubectl get pods
...
prod-k8ssandra-dc1-us-east-1a-sts-0                   0/3     Init:0/4   0          2m22s
prod-k8ssandra-dc1-us-east-1b-sts-0                   0/3     Init:0/4   0          2m22s
prod-k8ssandra-dc1-us-east-1c-sts-0                   0/3     Init:0/4   0          2m23s
$ kubectl describe pod prod-k8ssandra-dc1-us-east-1a-sts-0
...
Events:
  Type     Reason                  Age                 From                     Message
  ----     ------                  ----                ----                     -------
  Normal   Scheduled               2m1s                default-scheduler        Successfully assigned default/prod-k8ssandra-dc1-us-east-1a-sts-0 to ip-10-0-1-61.ec2.internal
  Normal   SuccessfulAttachVolume  119s                attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-22bd724a-8100-43ee-9600-9cd89658f244"
  Warning  FailedMount             58s (x8 over 2m1s)  kubelet                  MountVolume.SetUp failed for volume "medusa-bucket-key" : secret "medusa-bucket-key" not found


sync-by-unito[bot] commented 3 years ago

➤ John Sanda commented:

Can you provide more details on how you are pulling credentials?

Can you share logs from the medusa-restore container?

Are you referencing a secret with no data?
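
For reference, the medusa-restore logs can be pulled with something like the following (pod name taken from the output above):

$ kubectl logs prod-k8ssandra-dc1-us-east-1a-sts-0 -c medusa-restore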