redpanda-data / helm-charts

Redpanda Helm Chart
http://redpanda.com
Apache License 2.0

🫐🐛 Operator is unable to mount custom-provided self-signed certs #1139

Open c4milo opened 3 months ago

c4milo commented 3 months ago

What happened?

The custom-provided self-signed certificates were not mounted, even though they were correctly configured in redpanda.yaml. I made the change on a running cluster, switching the CA from letsencrypt to self-signed. Provisioning an entirely new cluster works as expected.

[Screenshot 2024-04-03 at 12:49:35 PM]

What did you expect to happen?

I expected the operator to mount the self-signed certificates into the Redpanda containers.

How can we reproduce it (as minimally and precisely as possible)? Please include the values file.

```console
COMPUTED VALUES:
affinity: {}
auditLogging:
  clientMaxBufferSize: 16777216
  enabled: false
  enabledEventTypes: null
  excludedPrincipals: null
  excludedTopics: null
  listener: internal
  partitions: 12
  queueDrainIntervalMs: 500
  queueMaxBufferSizePerShard: 1048576
  replicationFactor: null
auth:
  sasl:
    enabled: false
    mechanism: SCRAM-SHA-512
    secretRef: redpanda-superusers
    users: []
clusterDomain: cluster.local
commonLabels: {}
config:
  cluster:
    default_topic_replications: 3
    minimum_topic_replications: 3
  node:
    crash_loop_limit: 5
  pandaproxy_client: {}
  rpk: {}
  schema_registry_client: {}
  tunable:
    compacted_log_segment_size: 67108864
    group_topic_partitions: 16
    kafka_batch_max_bytes: 1048576
    kafka_connection_rate_limit: 1000
    log_segment_size: 134217728
    log_segment_size_max: 268435456
    log_segment_size_min: 16777216
    max_compacted_log_segment_size: 536870912
    topic_partitions_per_shard: 1000
connectors:
  deployment:
    create: false
  enabled: false
  test:
    create: false
console:
  config: {}
  configmap:
    create: false
  deployment:
    create: false
  enabled: false
  secret:
    create: false
enterprise:
  license: ""
  licenseSecretRef:
    key: license
    name: redpanda-license
external:
  addresses:
  - $PREFIX_TEMPLATE
  domain: camilo.panda.dev
  enabled: true
  externalDns:
    enabled: true
  prefixTemplate: rp${POD_ORDINAL}-$(echo -n $HOST_IP_ADDRESS | sha256sum | head -c 7)
  service:
    enabled: true
  type: NodePort
fullnameOverride: ""
image:
  pullPolicy: IfNotPresent
  repository: docker.redpanda.com/redpandadata/redpanda
  tag: v23.3.7
imagePullSecrets: []
license_key: ""
license_secret_ref: {}
listeners:
  admin:
    external:
      admin-api:
        advertisedPorts:
        - 30644
        authenticationMethod: sasl
        enabled: false
        port: 30644
        tls:
          cert: letsencrypt
          enabled: true
          requireClientAuth: false
      default:
        advertisedPorts:
        - 31644
        port: 9645
        tls:
          cert: external
    port: 9644
    tls:
      cert: selfsigned
      enabled: false
      requireClientAuth: false
  http:
    authenticationMethod: http_basic
    enabled: true
    external:
      default:
        advertisedPorts:
        - 30082
        authenticationMethod: null
        port: 8083
        tls:
          cert: external
          requireClientAuth: false
      http-proxy:
        advertisedPorts:
        - 31082
        authenticationMethod: http_basic
        enabled: true
        port: 31082
        tls:
          cert: letsencrypt
          enabled: true
          requireClientAuth: false
    kafkaEndpoint: default
    port: 8082
    prefixTemplate: http-proxy$POD_ORDINAL
    tls:
      cert: selfsigned
      enabled: true
      requireClientAuth: false
  kafka:
    authenticationMethod: sasl
    external:
      default:
        advertisedPorts:
        - 31092
        authenticationMethod: null
        port: 9094
        tls:
          cert: external
      kafka-api:
        advertisedPorts:
        - 32092
        authenticationMethod: sasl
        enabled: true
        port: 32092
        tls:
          cert: letsencrypt
          requireClientAuth: false
    port: 9092
    prefixTemplate: kafka-api$POD_ORDINAL
    tls:
      cert: selfsigned
      requireClientAuth: false
  rpc:
    port: 33145
    tls:
      cert: selfsigned
      requireClientAuth: false
  schemaRegistry:
    authenticationMethod: http_basic
    enabled: true
    external:
      default:
        advertisedPorts:
        - 30081
        authenticationMethod: null
        port: 8084
        tls:
          cert: external
          requireClientAuth: false
      schema-registry:
        advertisedPorts:
        - 31081
        authenticationMethod: http_basic
        enabled: true
        port: 31081
        tls:
          cert: letsencrypt
          requireClientAuth: false
    kafkaEndpoint: default
    port: 8081
    tls:
      cert: selfsigned
      requireClientAuth: false
logging:
  logLevel: debug
  usageStats:
    clusterId: 9m4e2mr0ui3e8a215n4g
    enabled: true
monitoring:
  enabled: false
  labels: {}
  scrapeInterval: 30s
  tlsConfig: {}
nameOverride: ""
nodeSelector: {}
post_install_job:
  affinity: {}
  enabled: true
post_upgrade_job:
  affinity: {}
  enabled: true
rackAwareness:
  enabled: true
  nodeAnnotation: topology.kubernetes.io/zone
rbac:
  annotations: {}
  enabled: false
resources:
  cpu:
    cores: "3"
  memory:
    container:
      max: 8Gi
      min: 8Gi
serviceAccount:
  annotations:
    azure.workload.identity/client-id: c90db393-857d-41d0-ac0d-0e61271fcaa6
  create: true
  name: id-rpcloud-9m4e2mr0ui3e8a215n4
statefulset:
  additionalRedpandaCmdFlags:
  - --abort-on-seastar-bad-alloc
  - --dump-memory-diagnostics-on-alloc-failure-kind=all
  annotations: {}
  budget:
    maxUnavailable: 1
  extraVolumeMounts: ""
  extraVolumes: ""
  initContainerImage:
    repository: busybox
    tag: latest
  initContainers:
    configurator:
      extraVolumeMounts: ""
      resources: {}
    extraInitContainers: ""
    fsValidator:
      enabled: true
      expectedFS: xfs
      extraVolumeMounts: ""
      resources: {}
    setDataDirOwnership:
      enabled: true
      extraVolumeMounts: ""
      resources: {}
    setTieredStorageCacheDirOwnership:
      extraVolumeMounts: ""
      resources: {}
    tuning:
      extraVolumeMounts: ""
      resources: {}
  livenessProbe:
    failureThreshold: 3
    initialDelaySeconds: 10
    periodSeconds: 10
  nodeSelector:
    cloud.redpanda.com/role: redpanda
  podAffinity: {}
  podAntiAffinity:
    custom: {}
    topologyKey: kubernetes.io/hostname
    type: hard
    weight: 100
  priorityClassName: ""
  readinessProbe:
    failureThreshold: 3
    initialDelaySeconds: 1
    periodSeconds: 10
    successThreshold: 1
  replicas: 3
  securityContext:
    fsGroup: 101
    fsGroupChangePolicy: OnRootMismatch
    runAsUser: 101
  sideCars:
    configWatcher:
      enabled: true
      extraVolumeMounts: ""
      resources: {}
      securityContext: {}
    controllers:
      createRBAC: true
      enabled: false
      healthProbeAddress: :8085
      image:
        repository: docker.redpanda.com/redpandadata/redpanda-operator
        tag: v2.1.10-23.2.18
      metricsAddress: :9082
      resources: {}
      run:
      - all
      securityContext: {}
  startupProbe:
    failureThreshold: 120
    initialDelaySeconds: 1
    periodSeconds: 10
  terminationGracePeriodSeconds: 90
  tolerations:
  - effect: NoSchedule
    key: cloud.redpanda.com/role
    operator: Equal
    value: redpanda
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
  updateStrategy:
    type: RollingUpdate
storage:
  hostPath: ""
  persistentVolume:
    annotations: {}
    enabled: true
    labels: {}
    nameOverwrite: ""
    size: 4096Gi
    storageClass: local-path
  tiered:
    config:
      cloud_storage_access_key: ""
      cloud_storage_api_endpoint: ""
      cloud_storage_azure_container: null
      cloud_storage_azure_shared_key: null
      cloud_storage_azure_storage_account: null
      cloud_storage_bucket: ""
      cloud_storage_cache_size: 5368709120
      cloud_storage_credentials_source: config_file
      cloud_storage_enable_remote_read: true
      cloud_storage_enable_remote_write: true
      cloud_storage_enabled: false
      cloud_storage_region: ""
      cloud_storage_secret_key: ""
    credentialsSecretRef:
      accessKey:
        configurationKey: cloud_storage_access_key
      secretKey:
        configurationKey: cloud_storage_secret_key
    hostPath: ""
    mountType: persistentVolume
    persistentVolume:
      annotations: {}
      labels: {}
      storageClass: local-path
tests:
  enabled: true
tls:
  certs:
    default:
      caEnabled: true
    external:
      caEnabled: true
    letsencrypt:
      caEnabled: false
      duration: 43800h0m0s
      issuerRef:
        kind: ClusterIssuer
        name: letsencrypt-dns
    selfsigned:
      caEnabled: true
      duration: 43800h0m0s
      issuerRef:
        kind: ClusterIssuer
        name: redpanda.local
  enabled: true
tolerations: []
tuning:
  tune_aio_events: false
```

Anything else we need to know?

Name:         redpanda-broker
Namespace:    redpanda
Labels:       app.kubernetes.io/component=redpanda
              app.kubernetes.io/instance=redpanda-broker
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=redpanda
              helm.sh/chart=redpanda-5.7.37
              helm.toolkit.fluxcd.io/name=redpanda-broker
              helm.toolkit.fluxcd.io/namespace=redpanda
Annotations:  meta.helm.sh/release-name: redpanda-broker
              meta.helm.sh/release-namespace: redpanda

Data
====
bootstrap.yaml:
----
kafka_enable_authorization: false
enable_sasl: false
enable_rack_awareness: true

default_topic_replications: 3
minimum_topic_replications: 3

compacted_log_segment_size: 67108864
group_topic_partitions: 16
kafka_batch_max_bytes: 1048576
kafka_connection_rate_limit: 1000
log_segment_size: 134217728
log_segment_size_max: 268435456
log_segment_size_min: 16777216
max_compacted_log_segment_size: 536870912
topic_partitions_per_shard: 1000
storage_min_free_bytes: 5368709120

audit_enabled: false

redpanda.yaml:
----
config_file: /etc/redpanda/redpanda.yaml
cluster_id: 9m4e2mr0ui3e8a215n4g
redpanda:
  empty_seed_starts_cluster: false
  kafka_enable_authorization: false
  enable_sasl: false
  default_topic_replications: 3
  minimum_topic_replications: 3
  compacted_log_segment_size: 67108864
  group_topic_partitions: 16
  kafka_batch_max_bytes: 1048576
  kafka_connection_rate_limit: 1000
  log_segment_size: 134217728
  log_segment_size_max: 268435456
  log_segment_size_min: 16777216
  max_compacted_log_segment_size: 536870912
  topic_partitions_per_shard: 1000
  storage_min_free_bytes: 5368709120

  crash_loop_limit: "5"
  audit_enabled: false

  admin:
    - name: internal
      address: 0.0.0.0
      port: 9644
    - name: default
      address: 0.0.0.0
      port: 9645
  admin_api_tls:
  kafka_api:
    - name: internal
      address: 0.0.0.0
      port: 9092
      authentication_method: sasl
    - name: default
      address: 0.0.0.0
      port: 9094
    - name: kafka-api
      address: 0.0.0.0
      port: 32092
      authentication_method: sasl
  kafka_api_tls:
    - name: internal
      enabled: true
      cert_file: /etc/tls/certs/selfsigned/tls.crt
      key_file: /etc/tls/certs/selfsigned/tls.key
      require_client_auth: false
      truststore_file: /etc/tls/certs/selfsigned/ca.crt
    - name: default
      enabled: true
      cert_file: /etc/tls/certs/external/tls.crt
      key_file: /etc/tls/certs/external/tls.key
      require_client_auth: false
      truststore_file: /etc/tls/certs/external/ca.crt
    - name: kafka-api
      enabled: true
      cert_file: /etc/tls/certs/letsencrypt/tls.crt
      key_file: /etc/tls/certs/letsencrypt/tls.key
      require_client_auth: false

      truststore_file: /etc/ssl/certs/ca-certificates.crt
  rpc_server:
    address: 0.0.0.0
    port: 33145
  rpc_server_tls:
    enabled: true
    cert_file: /etc/tls/certs/selfsigned/tls.crt
    key_file: /etc/tls/certs/selfsigned/tls.key
    require_client_auth: false
    truststore_file: /etc/tls/certs/selfsigned/ca.crt
  seed_servers: 
    - host:
        address: redpanda-broker-0.redpanda-broker.redpanda.svc.cluster.local.
        port: 33145
    - host:
        address: redpanda-broker-1.redpanda-broker.redpanda.svc.cluster.local.
        port: 33145
    - host:
        address: redpanda-broker-2.redpanda-broker.redpanda.svc.cluster.local.
        port: 33145

schema_registry_client:
  brokers:
  - address: redpanda-broker-0.redpanda-broker.redpanda.svc.cluster.local.
    port: 9092
  - address: redpanda-broker-1.redpanda-broker.redpanda.svc.cluster.local.
    port: 9092
  - address: redpanda-broker-2.redpanda-broker.redpanda.svc.cluster.local.
    port: 9092
  broker_tls:
    enabled: true
    require_client_auth: false
    cert_file: /etc/tls/certs/selfsigned/tls.crt
    key_file: /etc/tls/certs/selfsigned/tls.key
    truststore_file: /etc/tls/certs/selfsigned/ca.crt
schema_registry:
  schema_registry_api:
    - name: internal
      address: 0.0.0.0
      port: 8081
      authentication_method: http_basic
    - name: default
      address: 0.0.0.0
      port: 8084
    - name: schema-registry
      address: 0.0.0.0
      port: 31081
      authentication_method: http_basic
  schema_registry_api_tls:
    - name: internal
      enabled: true
      cert_file: /etc/tls/certs/selfsigned/tls.crt
      key_file: /etc/tls/certs/selfsigned/tls.key
      require_client_auth: false
      truststore_file: /etc/tls/certs/selfsigned/ca.crt
    - name: default
      enabled: true
      cert_file: /etc/tls/certs/external/tls.crt
      key_file: /etc/tls/certs/external/tls.key
      require_client_auth: false
      truststore_file: /etc/tls/certs/external/ca.crt
    - name: schema-registry
      enabled: true
      cert_file: /etc/tls/certs/letsencrypt/tls.crt
      key_file: /etc/tls/certs/letsencrypt/tls.key
      require_client_auth: false
      truststore_file: /etc/ssl/certs/ca-certificates.crt

pandaproxy_client:
  brokers:
  - address: redpanda-broker-0.redpanda-broker.redpanda.svc.cluster.local.
    port: 9092
  - address: redpanda-broker-1.redpanda-broker.redpanda.svc.cluster.local.
    port: 9092
  - address: redpanda-broker-2.redpanda-broker.redpanda.svc.cluster.local.
    port: 9092
  broker_tls:
    enabled: true
    require_client_auth: false
    cert_file: /etc/tls/certs/selfsigned/tls.crt
    key_file: /etc/tls/certs/selfsigned/tls.key
    truststore_file: /etc/tls/certs/selfsigned/ca.crt
pandaproxy:
  pandaproxy_api:
    - name: internal
      address: 0.0.0.0
      port: 8082
      authentication_method: http_basic
    - name: default
      address: 0.0.0.0
      port: 8083
    - name: http-proxy
      address: 0.0.0.0
      port: 31082
      authentication_method: http_basic
  pandaproxy_api_tls:
    - name: internal
      enabled: true
      cert_file: /etc/tls/certs/selfsigned/tls.crt
      key_file: /etc/tls/certs/selfsigned/tls.key
      require_client_auth: false
      truststore_file: /etc/tls/certs/selfsigned/ca.crt
    - name: default
      enabled: true
      cert_file: /etc/tls/certs/external/tls.crt
      key_file: /etc/tls/certs/external/tls.key
      require_client_auth: false
      truststore_file: /etc/tls/certs/external/ca.crt
    - name: http-proxy
      enabled: true
      cert_file: /etc/tls/certs/letsencrypt/tls.crt
      key_file: /etc/tls/certs/letsencrypt/tls.key
      require_client_auth: false
      truststore_file: /etc/ssl/certs/ca-certificates.crt

rpk:
  # redpanda server configuration
  overprovisioned: false
  enable_memory_locking: false
  additional_start_flags:
    - "--smp=3"
    - "--memory=6553M"
    - "--reserve-memory=216M"
    - "--default-log-level=debug"
    - --abort-on-seastar-bad-alloc
    - --dump-memory-diagnostics-on-alloc-failure-kind=all
  # rpk tune entries
  tune_aio_events: false

  # kafka connection configuration
  kafka_api:
    brokers: 
      - redpanda-broker-0.redpanda-broker.redpanda.svc.cluster.local.:9092
      - redpanda-broker-1.redpanda-broker.redpanda.svc.cluster.local.:9092
      - redpanda-broker-2.redpanda-broker.redpanda.svc.cluster.local.:9092
    tls:
      truststore_file: /etc/tls/certs/selfsigned/ca.crt
  admin_api:
    addresses: 
      - redpanda-broker-0.redpanda-broker.redpanda.svc.cluster.local.:9644
      - redpanda-broker-1.redpanda-broker.redpanda.svc.cluster.local.:9644
      - redpanda-broker-2.redpanda-broker.redpanda.svc.cluster.local.:9644
    tls:

BinaryData
====

Events:  <none>

Name:         redpanda-broker-rpk
Namespace:    redpanda
Labels:       app.kubernetes.io/component=redpanda
              app.kubernetes.io/instance=redpanda-broker
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=redpanda
              helm.sh/chart=redpanda-5.7.37
              helm.toolkit.fluxcd.io/name=redpanda-broker
              helm.toolkit.fluxcd.io/namespace=redpanda
Annotations:  meta.helm.sh/release-name: redpanda-broker
              meta.helm.sh/release-namespace: redpanda

Data
====
profile:
----
name: default
kafka_api:
  brokers: 
      - $PREFIX_TEMPLATE.camilo.panda.dev:31092
      - $PREFIX_TEMPLATE.camilo.panda.dev:31092
      - $PREFIX_TEMPLATE.camilo.panda.dev:31092
  tls:
admin_api:
  addresses: 
      - $PREFIX_TEMPLATE.camilo.panda.dev:30644
      - $PREFIX_TEMPLATE.camilo.panda.dev:30644
      - $PREFIX_TEMPLATE.camilo.panda.dev:30644
  tls:

BinaryData
====

Events:  <none>

Name:         redpanda-operator-config
Namespace:    redpanda
Labels:       app.kubernetes.io/instance=redpanda-operator
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=operator
              app.kubernetes.io/version=v2.1.15-23.3.7
              helm.sh/chart=operator-0.4.20
Annotations:  meta.helm.sh/release-name: redpanda-operator
              meta.helm.sh/release-namespace: redpanda

Data
====
controller_manager_config.yaml:
----
map[apiVersion:controller-runtime.sigs.k8s.io/v1alpha1 health:map[healthProbeBindAddress::8081] kind:ControllerManagerConfig leaderElection:map[leaderElect:true resourceName:aa9fc693.vectorized.io] metrics:map[bindAddress:127.0.0.1:8080] webhook:map[port:9443]]

BinaryData
====

Events:  <none>

Which are the affected charts?

Redpanda, Operator

Chart Version(s)

```console
❯ helm -n redpanda list
NAME               NAMESPACE  REVISION  UPDATED                                STATUS    CHART            APP VERSION
redpanda           redpanda   3         2024-04-03 12:57:53.993206 -0400 EDT   deployed  redpanda-0.1.1   0.1.0
redpanda-operator  redpanda   1         2024-04-01 18:14:29.732128 -0400 EDT   deployed  operator-0.4.20  v2.1.15-23.3.7
```

Cloud provider

Azure/AKS

JIRA Link: K8S-137

c4milo commented 3 months ago

I'm triple checking this one.

c4milo commented 3 months ago

I deleted the Redpanda CR and redeployed it from scratch, and it now mounts the volume correctly. I think the problem occurs when these settings are changed on a running cluster.

chrisseto commented 3 months ago

@c4milo, are you able to reproduce this issue? Nothing in the chart itself appears able to produce the behavior you're describing.

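The closest I've been able to reproduce is this: depending on how you change the values file, it's possible to upgrade into a broken state. The StatefulSet rolling update then stalls while trying to update the highest-ordinal replica, and because a RollingUpdate replaces pods in reverse ordinal order, the lower-ordinal pods keep running the old pod template. That could make it look as if ordinal 1 had outdated volume mounts.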

Did you see any Redpanda error logs indicating that it started with the updated broker config but without the correct volume mounts?

c4milo commented 3 months ago

I can try again. Here is basically what I did:

  1. Deployed the cluster, passing a self-signed ClusterIssuer to the operator.
  2. Changed the issuer name to simulate switching issuers.
  3. Ran `helm upgrade` through Terraform.
  4. The pods kept trying to mount the certificates from the old issuer.
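
Step 2 can be sketched as a values change. This is a minimal illustration only, assuming the self-signed issuer is referenced through `tls.certs.selfsigned.issuerRef`, as in the computed values in this report; the new issuer name `redpanda-selfsigned-v2` is hypothetical and does not come from the original reproduction.

```yaml
# Hypothetical values change illustrating step 2 (issuer rename).
# Before the change, issuerRef.name was redpanda.local.
tls:
  enabled: true
  certs:
    selfsigned:
      caEnabled: true
      duration: 43800h0m0s
      issuerRef:
        kind: ClusterIssuer
        # Hypothetical new issuer name; this ClusterIssuer must already exist
        name: redpanda-selfsigned-v2
```

Applying a change like this with `helm upgrade` (here via Terraform, as in step 3) is the transition under which the stale mounts were reported.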
c4milo commented 2 months ago

Update: I have since worked around this by passing the certificates and keys directly through secret references.
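
The workaround can be sketched as follows. This assumes the chart's `tls.certs.<name>.secretRef.name` field and a pre-created Secret containing `tls.crt`, `tls.key`, and `ca.crt`, which are the file names the broker config above reads from `/etc/tls/certs/selfsigned/`. The Secret name `redpanda-selfsigned-custom` is hypothetical, and the field names should be checked against the chart version in use.

```yaml
# Hypothetical Secret holding the custom self-signed certificate material.
# Values are placeholders; replace them with base64-encoded PEM data.
apiVersion: v1
kind: Secret
metadata:
  name: redpanda-selfsigned-custom   # hypothetical name
  namespace: redpanda
type: kubernetes.io/tls
data:
  tls.crt: <base64-encoded certificate>
  tls.key: <base64-encoded private key>
  ca.crt: <base64-encoded CA certificate>
---
# Hypothetical chart values fragment: reference the existing Secret
# instead of having cert-manager issue the certificate via issuerRef.
tls:
  enabled: true
  certs:
    selfsigned:
      caEnabled: true
      # Assumed chart field; the name must match the Secret above
      secretRef:
        name: redpanda-selfsigned-custom
```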