redpanda-data / helm-charts

Redpanda Helm Chart
http://redpanda.com
Apache License 2.0

Issues with Console and Connectors when mTLS is enabled for the Admin API #835

Open JakeSCahill opened 8 months ago

JakeSCahill commented 8 months ago

What happened?

When enabling mTLS for the Admin API, Console and Connectors fail to start. Console reports that it's missing TLS certs:

 {"level":"info","ts":"2023-10-27T15:11:53.281Z","msg":"testing admin client connectivity","urls":["https://redpanda.redpanda.svc.cluster.local.:9644"]}
Retrying GET for error: Get "https://redpanda.redpanda.svc.cluster.local.:9644/v1/brokers": remote error: tls: certificate required
Retrying GET for error: Get "https://redpanda.redpanda.svc.cluster.local.:9644/v1/brokers": remote error: tls: certificate required
{"level":"fatal","ts":"2023-10-27T15:11:56.352Z","msg":"failed to create Redpanda service","error":"failed to test admin client connectivity: Get \"https://redpanda.redpanda.svc.cluster.local.:9644/v1/brokers\": remote error: tls: certificate required"}
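To confirm that a failing pod is hitting this mTLS rejection rather than some other connectivity problem, the logs can be filtered for the TLS alert. This is a minimal sketch: `rejected_urls` is a hypothetical helper name, and the deployment name in the usage comment is an assumption about this install.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read log lines on stdin and print each distinct URL
# that was rejected with the TLS alert "certificate required".
rejected_urls() {
  grep 'tls: certificate required' \
    | grep -oE 'https://[^"\\ ]+' \
    | sort -u
}

# Usage (needs cluster access; the deployment name is an assumption):
#   kubectl logs deploy/redpanda-console -n redpanda | rejected_urls
```

If the only URLs printed are on port 9644, the rejection is limited to the Admin API listener, which matches the behavior described here.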

If I try to disable mTLS after enabling it, the post-upgrade job fails with `Error: UPGRADE FAILED: post-upgrade hooks failed: job failed: BackoffLimitExceeded`.

Post-upgrade logs:

Request error, trying another node: Get "https://redpanda-0.redpanda.redpanda.svc.cluster.local.:9644/v1/cluster_config/schema": remote error: tls: certificate required
Request error, trying another node: Get "https://redpanda-1.redpanda.redpanda.svc.cluster.local.:9644/v1/cluster_config/schema": remote error: tls: certificate required
unable to query config schema: Get "https://redpanda-2.redpanda.redpanda.svc.cluster.local.:9644/v1/cluster_config/schema": dial tcp 10.244.2.3:9644: connect: connection refused

If I re-enable mTLS, Console starts running, but there are issues with Admin API connections.

https://github.com/redpanda-data/helm-charts/assets/45230295/62e3ea6f-cdb4-4eb0-93ca-8a12f9f0ddd7

What did you expect to happen?

Redpanda Console and Connectors should work even if mTLS is enabled.

How can we reproduce it (as minimally and precisely as possible)? Please include values file.

Running in a kind cluster. I had a few overrides as I was testing a few things.

To enable mTLS with Connectors enabled:

```
export DOMAIN=customredpandadomain.local && \
helm repo add redpanda https://charts.redpanda.com/
helm repo update
helm upgrade --install redpanda redpanda/redpanda \
  --namespace redpanda \
  --create-namespace \
  --set external.domain=${DOMAIN} \
  --set statefulset.initContainers.setDataDirOwnership.enabled=true --set connectors.enabled=true --set connectors.deployment.terminationGracePeriodSeconds=300 --set connectors.nameOverride="test-name-2" --set nameOverride="rp-test" --set listeners.admin.tls.requireClientAuth=true --set auth.sasl.enabled=true --set auth.sasl.secretRef=redpanda-superusers
```

To try to disable mTLS:

```
export DOMAIN=customredpandadomain.local && \
helm repo add redpanda https://charts.redpanda.com/
helm repo update
helm upgrade --install redpanda redpanda/redpanda \
  --namespace redpanda \
  --create-namespace \
  --set external.domain=${DOMAIN} \
  --set statefulset.initContainers.setDataDirOwnership.enabled=true --set connectors.enabled=true --set connectors.deployment.terminationGracePeriodSeconds=300 --set connectors.nameOverride="test-name-2" --set nameOverride="rp-test" --set auth.sasl.enabled=true --set auth.sasl.secretRef=redpanda-superusers
```

```console
$ helm get values -n --all
COMPUTED VALUES:
affinity: {}
auth:
  sasl:
    enabled: true
    mechanism: SCRAM-SHA-512
    secretRef: redpanda-superusers
    users: []
clusterDomain: cluster.local
commonLabels: {}
config:
  cluster:
    default_topic_replications: 3
  node:
    crash_loop_limit: 5
  pandaproxy_client: {}
  rpk: {}
  schema_registry_client: {}
  tunable:
    compacted_log_segment_size: 67108864
    group_topic_partitions: 16
    kafka_batch_max_bytes: 1048576
    kafka_connection_rate_limit: 1000
    log_segment_size: 134217728
    log_segment_size_max: 268435456
    log_segment_size_min: 16777216
    max_compacted_log_segment_size: 536870912
    topic_partitions_per_shard: 1000
connectors:
  auth:
    sasl:
      enabled: false
      mechanism: scram-sha-512
      secretRef: ""
      userName: ""
  commonLabels: {}
  connectors:
    additionalConfiguration: ""
    bootstrapServers: ""
    brokerTLS:
      ca:
        secretNameOverwrite: ""
        secretRef: ""
      cert:
        secretNameOverwrite: ""
        secretRef: ""
      enabled: false
      key:
        secretNameOverwrite: ""
        secretRef: ""
    groupID: connectors-cluster
    producerBatchSize: 131072
    producerLingerMS: 1
    restPort: 8083
    schemaRegistryURL: ""
    secretManager:
      connectorsPrefix: ""
      consolePrefix: ""
      enabled: false
      region: ""
    storage:
      remote:
        read:
          config: false
          offset: false
          status: false
        write:
          config: false
          offset: false
          status: false
      replicationFactor:
        config: -1
        offset: -1
        status: -1
      topic:
        config: _internal_connectors_configs
        offset: _internal_connectors_offsets
        status: _internal_connectors_status
  container:
    javaGCLogEnabled: "false"
    resources:
      javaMaxHeapSize: 2G
      limits:
        cpu: 1
        memory: 2350Mi
      request:
        cpu: 1
        memory: 2350Mi
    securityContext:
      allowPrivilegeEscalation: false
  deployment:
    annotations: {}
    budget:
      maxUnavailable: 1
    create: false
    extraEnv: []
    livenessProbe:
      failureThreshold: 3
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    nodeAffinity: {}
    nodeSelector: {}
    podAffinity: {}
    podAntiAffinity:
      custom: {}
      topologyKey: kubernetes.io/hostname
      type: hard
      weight: 100
    priorityClassName: ""
    progressDeadlineSeconds: 600
    readinessProbe:
      failureThreshold: 2
      initialDelaySeconds: 60
      periodSeconds: 10
      successThreshold: 3
      timeoutSeconds: 5
    restartPolicy: Always
    revisionHistoryLimit: 10
    schedulerName: ""
    securityContext:
      fsGroup: 101
      fsGroupChangePolicy: OnRootMismatch
      runAsUser: 101
    strategy:
      type: RollingUpdate
    terminationGracePeriodSeconds: 300
    tolerations: []
    topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway
    updateStrategy:
      type: RollingUpdate
  enabled: true
  fullnameOverride: ""
  global: {}
  image:
    pullPolicy: IfNotPresent
    repository: docker.redpanda.com/redpandadata/connectors
    tag: ""
  imagePullSecrets: []
  logging:
    level: warn
  monitoring:
    annotations: {}
    enabled: false
    labels: {}
    namespaceSelector:
      any: true
    scrapeInterval: 30s
  nameOverride: test-name-2
  service:
    annotations: {}
    name: ""
    ports:
    - name: prometheus
      port: 9404
  serviceAccount:
    annotations: {}
    create: false
    name: ""
  storage:
    volume:
    - emptyDir:
        medium: Memory
        sizeLimit: 5Mi
      name: rp-connect-tmp
    volumeMounts:
    - mountPath: /tmp
      name: rp-connect-tmp
  test:
    create: false
  tolerations: []
console:
  affinity: {}
  annotations: {}
  autoscaling:
    enabled: false
    maxReplicas: 100
    minReplicas: 1
    targetCPUUtilizationPercentage: 80
  config: {}
  configmap:
    create: false
  console:
    config: {}
  deployment:
    create: false
  enabled: true
  enterprise:
    licenseSecretRef:
      key: ""
      name: ""
  extraContainers: []
  extraEnv: []
  extraEnvFrom: []
  extraVolumeMounts: []
  extraVolumes: []
  fullnameOverride: ""
  global: {}
  image:
    pullPolicy: IfNotPresent
    registry: docker.redpanda.com
    repository: redpandadata/console
    tag: ""
  imagePullSecrets: []
  ingress:
    annotations: {}
    className: ""
    enabled: false
    hosts:
    - host: chart-example.local
      paths:
      - path: /
        pathType: ImplementationSpecific
    tls: []
  initContainers:
    extraInitContainers: ""
  livenessProbe:
    failureThreshold: 3
    initialDelaySeconds: 0
    periodSeconds: 10
    successThreshold: 1
    timeoutSeconds: 1
  nameOverride: ""
  nodeSelector: {}
  podAnnotations: {}
  podLabels: {}
  podSecurityContext:
    fsGroup: 99
    runAsUser: 99
  priorityClassName: ""
  readinessProbe:
    failureThreshold: 3
    initialDelaySeconds: 10
    periodSeconds: 10
    successThreshold: 1
    timeoutSeconds: 1
  replicaCount: 1
  resources: {}
  secret:
    create: false
    enterprise: {}
    kafka: {}
    login:
      github: {}
      google: {}
      jwtSecret: ""
      oidc: {}
      okta: {}
    redpanda:
      adminApi: {}
  secretMounts: []
  securityContext:
    runAsNonRoot: true
  service:
    annotations: {}
    port: 8080
    type: ClusterIP
  serviceAccount:
    annotations: {}
    create: true
    name: ""
  tolerations: []
  topologySpreadConstraints: {}
enterprise:
  license: ""
  licenseSecretRef: {}
external:
  domain: customredpandadomain.local
  enabled: true
  service:
    enabled: true
  type: NodePort
fullnameOverride: ""
image:
  pullPolicy: IfNotPresent
  repository: docker.redpanda.com/redpandadata/redpanda
  tag: ""
imagePullSecrets: []
license_key: ""
license_secret_ref: {}
listeners:
  admin:
    external:
      default:
        advertisedPorts:
        - 31644
        port: 9645
        tls:
          cert: external
    port: 9644
    tls:
      cert: default
      requireClientAuth: true
  http:
    authenticationMethod: null
    enabled: true
    external:
      default:
        advertisedPorts:
        - 30082
        authenticationMethod: null
        port: 8083
        tls:
          cert: external
          requireClientAuth: false
    kafkaEndpoint: default
    port: 8082
    tls:
      cert: default
      requireClientAuth: false
  kafka:
    authenticationMethod: null
    external:
      default:
        advertisedPorts:
        - 31092
        authenticationMethod: null
        port: 9094
        tls:
          cert: external
    port: 9093
    tls:
      cert: default
      requireClientAuth: false
  rpc:
    port: 33145
    tls:
      cert: default
      requireClientAuth: false
  schemaRegistry:
    authenticationMethod: null
    enabled: true
    external:
      default:
        advertisedPorts:
        - 30081
        authenticationMethod: null
        port: 8084
        tls:
          cert: external
          requireClientAuth: false
    kafkaEndpoint: default
    port: 8081
    tls:
      cert: default
      requireClientAuth: false
logging:
  logLevel: info
  usageStats:
    enabled: true
monitoring:
  enabled: false
  labels: {}
  scrapeInterval: 30s
  tlsConfig: {}
nameOverride: rp-test
nodeSelector: {}
post_install_job:
  affinity: {}
  enabled: true
post_upgrade_job:
  affinity: {}
  enabled: true
rackAwareness:
  enabled: false
  nodeAnnotation: topology.kubernetes.io/zone
rbac:
  annotations: {}
  enabled: false
resources:
  cpu:
    cores: 1
  memory:
    container:
      max: 2.5Gi
serviceAccount:
  annotations: {}
  create: false
  name: ""
statefulset:
  additionalRedpandaCmdFlags: []
  annotations: {}
  budget:
    maxUnavailable: 1
  extraVolumeMounts: ""
  extraVolumes: ""
  initContainerImage:
    repository: busybox
    tag: latest
  initContainers:
    configurator:
      extraVolumeMounts: ""
      resources: {}
    extraInitContainers: ""
    setDataDirOwnership:
      enabled: true
      extraVolumeMounts: ""
      resources: {}
    setTieredStorageCacheDirOwnership:
      extraVolumeMounts: ""
      resources: {}
    tuning:
      extraVolumeMounts: ""
      resources: {}
  livenessProbe:
    failureThreshold: 3
    initialDelaySeconds: 10
    periodSeconds: 10
  nodeSelector: {}
  podAffinity: {}
  podAntiAffinity:
    custom: {}
    topologyKey: kubernetes.io/hostname
    type: hard
    weight: 100
  priorityClassName: ""
  readinessProbe:
    failureThreshold: 3
    initialDelaySeconds: 1
    periodSeconds: 10
    successThreshold: 1
  replicas: 3
  securityContext:
    fsGroup: 101
    fsGroupChangePolicy: OnRootMismatch
    runAsUser: 101
  sideCars:
    configWatcher:
      enabled: true
      extraVolumeMounts: ""
      resources: {}
      securityContext: {}
    controllers:
      createRBAC: true
      enabled: false
      healthProbeAddress: :8085
      image:
        repository: docker.redpanda.com/redpandadata/redpanda-operator
        tag: v23.2.8
      metricsAddress: :9082
      resources: {}
      run:
      - all
      securityContext: {}
  startupProbe:
    failureThreshold: 120
    initialDelaySeconds: 1
    periodSeconds: 10
  terminationGracePeriodSeconds: 90
  tolerations: []
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
  updateStrategy:
    type: RollingUpdate
storage:
  hostPath: ""
  persistentVolume:
    annotations: {}
    enabled: true
    labels: {}
    size: 20Gi
    storageClass: ""
  tiered:
    config:
      cloud_storage_access_key: ""
      cloud_storage_api_endpoint: ""
      cloud_storage_azure_container: null
      cloud_storage_azure_shared_key: null
      cloud_storage_azure_storage_account: null
      cloud_storage_bucket: ""
      cloud_storage_cache_size: 5368709120
      cloud_storage_credentials_source: config_file
      cloud_storage_enable_remote_read: true
      cloud_storage_enable_remote_write: true
      cloud_storage_enabled: false
      cloud_storage_region: ""
      cloud_storage_secret_key: ""
    hostPath: ""
    mountType: emptyDir
    persistentVolume:
      annotations: {}
      labels: {}
      storageClass: ""
tls:
  certs:
    default:
      caEnabled: true
    external:
      caEnabled: true
  enabled: true
tolerations: []
tuning:
  tune_aio_events: true
```

Anything else we need to know?

No response

Which are the affected charts?

No response

Chart Version(s)

```console
$ helm -n list
redpanda-5.6.34 v23.2.13
```

Cloud provider

kind

JIRA Link: K8S-71

joejulian commented 8 months ago

When changing the configuration, the schema server in redpanda is supposed to restart. It's not, which is causing this issue. (see redpanda issue # )

#846 will provide a workaround for this and should resolve this issue.

JakeSCahill commented 6 months ago

Just tested again with redpanda-5.6.63 v23.2.18:

export DOMAIN=customredpandadomain.local && \
helm repo add redpanda https://charts.redpanda.com/
helm repo update
helm upgrade --install redpanda redpanda/redpanda \
  --namespace redpanda \
  --create-namespace \
  --set external.domain=${DOMAIN} \
  --set statefulset.initContainers.setDataDirOwnership.enabled=true --set connectors.enabled=true --set listeners.admin.tls.requireClientAuth=true --set auth.sasl.enabled=true  --set auth.sasl.secretRef=redpanda-superusers

Console refuses to start up:

kubectl logs redpanda-console-5dd6bdd548-mc5h7 -n redpanda
{"level":"info","ts":"2023-12-18T14:05:07.538Z","msg":"started Redpanda Console","version":"v2.3.8","built_at":"1701900386"}
{"level":"info","ts":"2023-12-18T14:05:07.539Z","msg":"connecting to Kafka seed brokers, trying to fetch cluster metadata"}
{"level":"info","ts":"2023-12-18T14:05:07.549Z","msg":"successfully connected to kafka cluster","advertised_broker_count":3,"topic_count":5,"controller_id":0,"kafka_version":"unknown custom version at least v0.11.0"}
{"level":"info","ts":"2023-12-18T14:05:07.549Z","msg":"creating schema registry client and testing connectivity"}
{"level":"info","ts":"2023-12-18T14:05:07.557Z","msg":"successfully tested schema registry connectivity"}
{"level":"info","ts":"2023-12-18T14:05:07.557Z","msg":"testing admin client connectivity","urls":["https://redpanda.redpanda.svc.cluster.local.:9644"]}
Retrying GET for error: Get "https://redpanda.redpanda.svc.cluster.local.:9644/v1/brokers": remote error: tls: certificate required
Retrying GET for error: Get "https://redpanda.redpanda.svc.cluster.local.:9644/v1/brokers": remote error: tls: certificate required
{"level":"fatal","ts":"2023-12-18T14:05:10.630Z","msg":"failed to create Redpanda service","error":"failed to test admin client connectivity: Get \"https://redpanda.redpanda.svc.cluster.local.:9644/v1/brokers\": remote error: tls: certificate required"}

Secrets available:

kubectl get secret -n redpanda                                          
NAME                                 TYPE                 DATA   AGE
redpanda-client                      kubernetes.io/tls    3      6m29s
redpanda-config-watcher              Opaque               1      6m34s
redpanda-configurator                Opaque               1      6m34s
redpanda-default-cert                kubernetes.io/tls    3      6m29s
redpanda-default-root-certificate    kubernetes.io/tls    3      6m31s
redpanda-external-cert               kubernetes.io/tls    3      6m29s
redpanda-external-root-certificate   kubernetes.io/tls    3      6m31s
redpanda-sts-lifecycle               Opaque               3      6m34s
redpanda-superusers                  Opaque               1      7m33s
sh.helm.release.v1.redpanda.v1       helm.sh/release.v1   1      6m34s
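Since a `redpanda-client` secret exists, one way to check whether the Admin API accepts a request that does present a client certificate is to pull the PEM files out of that secret and pass them to `curl`. This is a sketch under assumptions, not a verified fix: it assumes the secret stores its data under the keys `ca.crt`, `tls.crt`, and `tls.key`, that its certificate is signed by a CA the Admin API listener trusts, and that the script runs somewhere the in-cluster service address resolves. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only. Assumed: the secret holds ca.crt, tls.crt and tls.key, and the
# Admin API address is reachable from where this runs.

# Copy the three PEM files out of a TLS secret into a local directory.
fetch_client_cert() {
  local namespace=$1 secret=$2 dir=$3 key
  for key in ca.crt tls.crt tls.key; do
    # Dots in the key name must be escaped in the jsonpath expression.
    kubectl get secret "$secret" -n "$namespace" \
      -o "jsonpath={.data.${key//./\\.}}" | base64 -d > "$dir/$key"
  done
}

# GET an Admin API URL while presenting the client certificate.
admin_api_get() {
  local dir=$1 url=$2
  curl --silent --show-error --fail \
    --cacert "$dir/ca.crt" --cert "$dir/tls.crt" --key "$dir/tls.key" \
    "$url"
}

# Usage (needs cluster access):
#   dir=$(mktemp -d)
#   fetch_client_cert redpanda redpanda-client "$dir"
#   admin_api_get "$dir" https://redpanda.redpanda.svc.cluster.local.:9644/v1/brokers
```

If this request succeeds while Console still logs `tls: certificate required`, that would suggest the listener itself is fine and Console is simply not being configured with the client certificate.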