scaleway / scaleway-cloud-controller-manager

Kubernetes Cloud Controller Manager for Scaleway
Apache License 2.0

Infinite restart of pod behind LB on par-2 #158

Closed · samber closed this issue 9 months ago

samber commented 9 months ago

Hi there,

I'm facing a bug while deploying Strimzi (the Kubernetes operator for Kafka) on Kapsule.

You will find my config below.

I'm trying to expose my Kafka cluster on the internet for temporary remote access, with TLS + SCRAM auth. In cluster.yaml, I create 4 load balancers: one per broker plus one for bootstrap. The LB IPs are registered in my Cloudflare account via external-dns, and the TLS certificate is issued by Let's Encrypt + cert-manager (DNS-01 challenge).

Issue

When I switch the LB annotation service.beta.kubernetes.io/scw-loadbalancer-zone to fr-par-2, the pod crash-loops. Having a broker like Kafka restart indefinitely after a simple network config change is quite annoying...

I played with the health-check settings, but nothing changed. The only difference between par-1 and par-2 seems to be the IPv6 address attached to the LB.
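
For reference, the health-check tuning I tried looked roughly like this; this is only a sketch, with annotation names taken from this repo's load balancer annotations doc and illustrative values:

# cluster.yaml (excerpt): health-check annotations, set under the listener's
# configuration.bootstrap.annotations and configuration.brokers[].annotations
service.beta.kubernetes.io/scw-loadbalancer-health-check-type: "tcp"
service.beta.kubernetes.io/scw-loadbalancer-health-check-delay: "10s"
service.beta.kubernetes.io/scw-loadbalancer-health-check-timeout: "5s"
service.beta.kubernetes.io/scw-loadbalancer-health-check-max-retries: "3"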

Another issue: on each pod restart, a new load balancer is created. After 15 minutes I had tens of new, useless LBs, even though no new Service had been declared in Kubernetes.
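
Cleaning those up by hand looks roughly like this (assuming the scw CLI v2; check scw lb lb --help for the exact arguments):

# list load balancers per zone to spot the orphans
scw lb lb list zone=fr-par-1
scw lb lb list zone=fr-par-2
# then delete each orphaned one by its ID
scw lb lb delete <lb-id> zone=fr-par-2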

Two out of three data nodes will be scheduled on par-2, so being able to move the LBs there would be nice.

Ticket: 00547345

Setup

helm repo add strimzi https://strimzi.io/charts/
helm install strimzi-cluster-operator \
    --version 0.39.0 \
    -f values.yaml \
    oci://quay.io/strimzi-helm/strimzi-kafka-operator 

kubectl create secret generic kafka-external-credentials \
    --from-literal=KAFKA_PASSWORD=xxxxx
kubectl create secret generic cloudflare-credentials \
    --from-literal=CF_API_TOKEN=xxxxxx

kubectl apply \
    -f priority.yaml \
    -f external-dns.yaml \
    -f certificate.yaml \
    -f kafka-kraft.yaml \
    -f kafka-broker.yaml \
    -f cluster.yaml \
    -f users.yaml

Yamls

Tolerations and pod affinity have been removed to simplify troubleshooting.

# values.yaml
priorityClassName: critical-priority

featureGates: "+KafkaNodePools,+UseKRaft"

fullReconciliationIntervalMs: 60000

podDisruptionBudget:
  enabled: true
  maxUnavailable: 1
  minAvailable: null
# priority.yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical-priority
value: 1000000

---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 100000
# external-dns.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: external-dns
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: external-dns
rules:
  - apiGroups: [""]
    resources: ["services", "endpoints", "pods"]
    verbs: ["get", "watch", "list"]
  - apiGroups: ["extensions", "networking.k8s.io"]
    resources: ["ingresses"]
    verbs: ["get", "watch", "list"]
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: external-dns-viewer
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: external-dns
subjects:
  - kind: ServiceAccount
    name: external-dns
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: external-dns
spec:
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: external-dns
  template:
    metadata:
      labels:
        app: external-dns
    spec:
      serviceAccountName: external-dns
      priorityClassName: critical-priority
      containers:
        - name: external-dns
          image: registry.k8s.io/external-dns/external-dns:v0.14.0
          args:
            - --source=service # ingress is also possible
            - --provider=cloudflare
            - --domain-filter=acme.org # (optional) limit to only acme.org domains; change to match the zone created above.
            - --zone-id-filter=xxxxx # (optional) limit to a specific zone.
          env:
            - name: CF_API_TOKEN
              valueFrom:
                secretKeyRef:
                  name: cloudflare-credentials
                  key: CF_API_TOKEN
# certificate.yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: cluster-fr-par-public-certificate
spec:
  secretName: cluster-fr-par-public-certificate
  additionalOutputFormats:
    - type: CombinedPEM
    - type: DER
  issuerRef:
    name: letsencrypt
    kind: ClusterIssuer
    group: cert-manager.io
  subject:
    organizations:
      - Screeb
  duration: 2160h # 90d
  renewBefore: 720h # renew 30d before the certificate expires
  dnsNames:
    - "*.kafka.acme.org"
# kafka-broker.yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: kafka-broker-pool-a
  labels:
    app: kafka-broker
    strimzi.io/cluster: cluster-fr-par
  annotations:
    strimzi.io/next-node-ids: "[10-99]"
spec:
  replicas: 3
  roles:
    - broker
  storage:
    type: jbod
    volumes:
      - id: 0
        type: persistent-claim
        size: 200Gi
        deleteClaim: false
  resources:
    requests:
      memory: 4Gi
      cpu: "1.5"
  jvmOptions:
    -Xms: 3g
    -Xmx: 3g
  template:
    pod:
      priorityClassName: high-priority
# kafka-kraft.yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: kafka-controller
  labels:
    app: kafka-controller
    strimzi.io/cluster: cluster-fr-par
  annotations:
    strimzi.io/next-node-ids: "[0-9]"
spec:
  replicas: 3
  roles:
    - controller
  storage:
    type: jbod
    volumes:
      - id: 0
        type: persistent-claim
        size: 50Gi
        deleteClaim: false
  resources:
    requests:
      memory: 2Gi
      cpu: "1.5"
  jvmOptions:
    -Xms: 1000m
    -Xmx: 1000m
  template:
    pod:
      priorityClassName: critical-priority
# users.yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: user-scram
  labels:
    strimzi.io/cluster: cluster-fr-par
spec:
  authentication:
    type: scram-sha-512
    password:
      valueFrom:
        secretKeyRef:
          name: kafka-external-credentials
          key: KAFKA_PASSWORD
# cluster.yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: cluster-fr-par
  annotations:
    strimzi.io/node-pools: enabled
    strimzi.io/kraft: enabled
spec:
  kafka:
    version: 3.6.1

    # The replicas field is required by the Kafka CRD schema while the KafkaNodePools feature gate is in alpha phase.
    # But it will be ignored when Kafka Node Pools are used
    replicas: 3

    listeners:
      - name: external
        port: 9095
        type: loadbalancer
        tls: true
        authentication:
          type: scram-sha-512
        configuration:
          bootstrap:
            annotations:
              external-dns.alpha.kubernetes.io/hostname: bootstrap.kafka.acme.org.
              external-dns.alpha.kubernetes.io/ttl: "60"
              service.beta.kubernetes.io/scw-loadbalancer-zone: "fr-par-1"
              service.beta.kubernetes.io/scw-loadbalancer-type: "lb-s"
          brokers:
            - broker: 10
              annotations:
                external-dns.alpha.kubernetes.io/hostname: b0.kafka.acme.org.
                external-dns.alpha.kubernetes.io/ttl: "60"
                service.beta.kubernetes.io/scw-loadbalancer-zone: "fr-par-1"
                service.beta.kubernetes.io/scw-loadbalancer-type: "lb-s"
              advertisedHost: b0.kafka.acme.org
            - broker: 11
              annotations:
                external-dns.alpha.kubernetes.io/hostname: b1.kafka.acme.org.
                external-dns.alpha.kubernetes.io/ttl: "60"
                service.beta.kubernetes.io/scw-loadbalancer-zone: "fr-par-1"
                service.beta.kubernetes.io/scw-loadbalancer-type: "lb-s"
              advertisedHost: b1.kafka.acme.org
            - broker: 12
              annotations:
                external-dns.alpha.kubernetes.io/hostname: b2.kafka.acme.org.
                external-dns.alpha.kubernetes.io/ttl: "60"
                service.beta.kubernetes.io/scw-loadbalancer-zone: "fr-par-2"
                service.beta.kubernetes.io/scw-loadbalancer-type: "lb-s"
              advertisedHost: b2.kafka.acme.org
          brokerCertChainAndKey:
            secretName: cluster-fr-par-public-certificate
            certificate: tls.crt
            key: tls.key

    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      default.replication.factor: 3
      min.insync.replicas: 2
      inter.broker.protocol.version: "3.6"

    # The storage field is required by the Kafka CRD schema while the KafkaNodePools feature gate is in alpha phase.
    # But it will be ignored when Kafka Node Pools are used
    storage:
      type: jbod
      volumes:
        - id: 0
          type: persistent-claim
          size: 200Gi
          deleteClaim: false

  # The ZooKeeper section is required by the Kafka CRD schema while the UseKRaft feature gate is in alpha phase.
  # But it will be ignored when running in KRaft mode
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 10Gi
      deleteClaim: false

  entityOperator:
    template:
      pod:
        priorityClassName: critical-priority

    tlsSidecar:
      resources:
        requests:
          cpu: 100m
          memory: 64Mi
        limits:
          cpu: 300m
          memory: 128Mi
Nox-404 commented 9 months ago

Hello, can you provide logs from the CCM (available through Cockpit) as well as any other relevant logs (from the pod, for instance)?

Also, can you share the cluster_id so we can take a look directly?

You can reach us on Slack if you prefer to share the details there.

samber commented 9 months ago

Quick follow-up: Strimzi seems to erase the service.beta.kubernetes.io/scw-loadbalancer-id annotation. The Scaleway controller then loses track of the LB and looks for one by name in par-1.

Because the load balancer is not found there, it recreates it, this time using the provided service.beta.kubernetes.io/scw-loadbalancer-zone: "fr-par-2" annotation.

...and repeat.
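
A quick way to watch the fight (assuming Strimzi's usual strimzi.io/cluster label on the Services it creates):

# dump the Scaleway LB annotations on the Strimzi-managed Services
kubectl get svc -l strimzi.io/cluster=cluster-fr-par -o yaml | grep scw-loadbalancer
# run it a few times: the scw-loadbalancer-id annotation written by the CCM keeps
# disappearing whenever Strimzi reconciles the Services back to the spec in cluster.yaml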

Nox-404 commented 9 months ago

The 0.28.5 release will fall back to looking up the LB by name in the AZ specified by the zone annotation.

This should allow the CCM to find its LB, but the two controllers will probably still fight over adding/removing this annotation.

The best way to fix the issue is to provide the LB IDs in the annotations in the Kafka resource.
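
A minimal sketch of what that could look like in cluster.yaml, with hypothetical LB IDs (the exact value format written by the CCM, plain ID or zone/ID, depends on the CCM version, so copy whatever the CCM sets on the Service):

# cluster.yaml (excerpt): pin the existing LBs so Strimzi re-applies the ID on every reconciliation
          bootstrap:
            annotations:
              service.beta.kubernetes.io/scw-loadbalancer-zone: "fr-par-1"
              service.beta.kubernetes.io/scw-loadbalancer-id: "fr-par-1/11111111-2222-3333-4444-555555555555" # hypothetical ID
          brokers:
            - broker: 12
              annotations:
                service.beta.kubernetes.io/scw-loadbalancer-zone: "fr-par-2"
                service.beta.kubernetes.io/scw-loadbalancer-id: "fr-par-2/66666666-7777-8888-9999-000000000000" # hypothetical ID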