redpanda-data / helm-charts

Redpanda Helm Chart
http://redpanda.com
Apache License 2.0

🔹🐛 Operator is unable to check for cluster readiness over an Admin API listener with TLS enabled. #1127

Closed: c4milo closed this 2 months ago

c4milo commented 6 months ago

What happened?

In the config-watcher container:

Waiting for cluster to be ready
unable to request cluster health: Get "http://redpanda-1.redpanda.redpanda.svc.cluster.local.:9644/v1/cluster/health_overview"
Error: (1) occurred at line 12

What did you expect to happen?

`rpk cluster health --watch --exit-when-healthy -X admin.tls.enabled=true -X admin.tls.insecure_skip_verify=true`

It would be fabulous if I could also pass `admin.tls.insecure_skip_verify` through the CR.
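A minimal Python sketch of the readiness check the command above is expected to perform, under stated assumptions: it targets the `/v1/cluster/health_overview` path from the log, switches to `https` when Admin API TLS is enabled, and can disable certificate verification to mirror the intent of `admin.tls.insecure_skip_verify=true`. The helper names (`health_url`, `admin_ssl_context`, `is_healthy`, `check_once`) are hypothetical, and the boolean `is_healthy` field in the JSON response is an assumption about the endpoint's payload, not something stated in this thread.

```python
import json
import ssl
import urllib.request


def health_url(host: str, port: int, tls_enabled: bool) -> str:
    # Hypothetical helper: pick the scheme from the listener's TLS setting.
    scheme = "https" if tls_enabled else "http"
    return f"{scheme}://{host}:{port}/v1/cluster/health_overview"


def admin_ssl_context(insecure_skip_verify: bool) -> ssl.SSLContext:
    # The default context verifies both the certificate chain and the hostname.
    ctx = ssl.create_default_context()
    if insecure_skip_verify:
        # Mirrors the intent of admin.tls.insecure_skip_verify=true.
        # Hostname checking must be turned off before verify_mode can be CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


def is_healthy(body: bytes) -> bool:
    # Assumption: the response is JSON with a boolean "is_healthy" field.
    return bool(json.loads(body).get("is_healthy", False))


def check_once(host: str, port: int, tls_enabled: bool,
               insecure_skip_verify: bool, timeout: float = 5.0) -> bool:
    # One probe; a watcher would call this in a loop until it returns True.
    url = health_url(host, port, tls_enabled)
    ctx = admin_ssl_context(insecure_skip_verify) if tls_enabled else None
    with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
        return is_healthy(resp.read())
```

For example, `check_once("redpanda-1.redpanda.redpanda.svc.cluster.local.", 9644, True, True)` would request the same path as the failing log line, but over HTTPS.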

How can we reproduce it (as minimally and precisely as possible)? Please include values file.

```console
COMPUTED VALUES:
adminAPIListeners:
- address: 0.0.0.0
  advertised_address: 127.0.0.1
  advertised_port: 9644
  auth_method: none
  enabled: true
  name: admin-api.internal
  port: 9644
  tls_cert: letsencrypt
  tls_enabled: true
  tls_require_client_auth: false
  tls_truststore: ""
- address: 0.0.0.0
  advertised_address: 127.0.0.1
  advertised_port: 30644
  auth_method: sasl
  enabled: false
  name: admin-api
  port: 30644
  tls_cert: letsencrypt
  tls_enabled: true
  tls_require_client_auth: false
  tls_truststore: ""
authSASLEnabled: false
authSASLMechanism: SCRAM-SHA-512
authSASLSecretRef: redpanda/redpanda-superusers
baseDNSName: camilo.panda.dev
brokerMemorySizeMiB: 8192
brokerVCPUCount: 3
clusterConfig:
  cloud_storage_azure_container: 9m4e2mr0ui3e8a215n4g
  cloud_storage_azure_storage_account: testcamilo9
  cloud_storage_credentials_source: azure_aks_oidc_federation
  cloud_storage_enable_remote_read: "true"
  cloud_storage_enable_remote_write: "true"
  cloud_storage_enabled: "false"
  default_topic_replications: "3"
  minimum_topic_replications: "3"
cmdline:
- --abort-on-seastar-bad-alloc
- --dump-memory-diagnostics-on-alloc-failure-kind=all
containerImage:
  repository: docker.redpanda.com/redpandadata/redpanda
  tag: v23.3.7
httpProxyListeners:
- address: 0.0.0.0
  advertised_address: 127.0.0.1
  advertised_port: 8082
  auth_method: http_basic
  enabled: true
  name: http-proxy.internal
  port: 8082
  tls_cert: letsencrypt
  tls_enabled: true
  tls_require_client_auth: false
  tls_truststore: ""
- address: 0.0.0.0
  advertised_address: 127.0.0.1
  advertised_port: 31082
  auth_method: http_basic
  enabled: true
  name: http-proxy
  port: 31082
  tls_cert: letsencrypt
  tls_enabled: true
  tls_require_client_auth: false
  tls_truststore: ""
internalClusterDomain: cluster.local
kafkaAPIListeners:
- address: 0.0.0.0
  advertised_address: 127.0.0.1
  advertised_port: 9092
  auth_method: sasl
  enabled: true
  name: kafka-api.internal
  port: 9092
  tls_cert: letsencrypt
  tls_require_client_auth: false
  tls_truststore: ""
- address: 0.0.0.0
  advertised_address: 127.0.0.1
  advertised_port: 32092
  auth_method: sasl
  enabled: true
  name: kafka-api
  port: 32092
  tls_cert: letsencrypt
  tls_require_client_auth: false
  tls_truststore: ""
licenseSecretRef:
  key: license
  name: redpanda-9m4e2mr0ui3e8a215n4g-license
logLevel: debug
nodeConfig: {}
nodeCount: 3
nodeSelector:
  cloud.redpanda.com/role: redpanda
operatorEnabled: true
operatorForceHelmUpdate: 1712029669
podLabels:
  azure.workload.identity/use: "true"
rackAwareness:
  annotation: topology.kubernetes.io/zone
  enabled: true
redpandaClusterID: 9m4e2mr0ui3e8a215n4g
rpcListeners:
- address: 0.0.0.0
  advertised_address: 127.0.0.1
  advertised_port: 33145
  auth_method: none
  enabled: true
  name: rpc.internal
  port: 33145
  tls_cert: letsencrypt
  tls_require_client_auth: false
  tls_truststore: ""
schemaRegistryListeners:
- address: 0.0.0.0
  advertised_address: 127.0.0.1
  advertised_port: 8081
  auth_method: http_basic
  enabled: true
  name: schema-registry.internal
  port: 8081
  tls_cert: letsencrypt
  tls_require_client_auth: false
  tls_truststore: ""
- address: 0.0.0.0
  advertised_address: 127.0.0.1
  advertised_port: 31081
  auth_method: http_basic
  enabled: true
  name: schema-registry
  port: 31081
  tls_cert: letsencrypt
  tls_require_client_auth: false
  tls_truststore: ""
serviceAccount:
  annotations:
    azure.workload.identity/client-id: c90db393-857d-41d0-ac0d-0e61271fcaa6
  create: true
  labels:
    app.kubernetes.io/managed-by: redpanda
  name: id-rpcloud-9m4e2mr0ui3e8a215n4
storageClass: local-path
storageSizeGiB: 4096
tlsCertificates:
- ca_enabled: false
  duration: 43800h
  issuer_kind: ClusterIssuer
  issuer_ref: letsencrypt-dns-prod
  name: letsencrypt
tlsEnabled: true
tolerations:
- effect: NoSchedule
  key: cloud.redpanda.com/role
  operator: Equal
  value: redpanda
tunableConfig: {}
```

Anything else we need to know?

Name:         redpanda
Namespace:    redpanda
Labels:       app.kubernetes.io/component=redpanda
              app.kubernetes.io/instance=redpanda
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=redpanda
              helm.sh/chart=redpanda-5.7.36
              helm.toolkit.fluxcd.io/name=redpanda
              helm.toolkit.fluxcd.io/namespace=redpanda
Annotations:  meta.helm.sh/release-name: redpanda
              meta.helm.sh/release-namespace: redpanda

Data
====
bootstrap.yaml:
----
kafka_enable_authorization: false
enable_sasl: false
enable_rack_awareness: true
cloud_storage_azure_container: 9m4e2mr0ui3e8a215n4g
cloud_storage_azure_storage_account: testcamilo9
cloud_storage_credentials_source: azure_aks_oidc_federation
cloud_storage_enable_remote_read: "true"
cloud_storage_enable_remote_write: "true"
cloud_storage_enabled: "false"

default_topic_replications: 3
minimum_topic_replications: "3"

compacted_log_segment_size: 67108864
group_topic_partitions: 16
kafka_batch_max_bytes: 1048576
kafka_connection_rate_limit: 1000
log_segment_size: 134217728
log_segment_size_max: 268435456
log_segment_size_min: 16777216
max_compacted_log_segment_size: 536870912
topic_partitions_per_shard: 1000
storage_min_free_bytes: 5368709120

audit_enabled: false

redpanda.yaml:
----
config_file: /etc/redpanda/redpanda.yaml
cluster_id: 9m4e2mr0ui3e8a215n4g
redpanda:
  empty_seed_starts_cluster: false
  kafka_enable_authorization: false
  enable_sasl: false
  cloud_storage_azure_container: 9m4e2mr0ui3e8a215n4g
  cloud_storage_azure_storage_account: testcamilo9
  cloud_storage_credentials_source: azure_aks_oidc_federation
  cloud_storage_enable_remote_read: "true"
  cloud_storage_enable_remote_write: "true"
  cloud_storage_enabled: "false"
  default_topic_replications: "3"
  minimum_topic_replications: "3"
  compacted_log_segment_size: 67108864
  group_topic_partitions: 16
  kafka_batch_max_bytes: 1048576
  kafka_connection_rate_limit: 1000
  log_segment_size: 134217728
  log_segment_size_max: 268435456
  log_segment_size_min: 16777216
  max_compacted_log_segment_size: 536870912
  topic_partitions_per_shard: 1000
  storage_min_free_bytes: 5368709120

  crash_loop_limit: "5"
  audit_enabled: false

  admin:
    - name: internal
      address: 0.0.0.0
      port: 9644
    - name: default
      address: 0.0.0.0
      port: 9645
  admin_api_tls:
    - name: internal
      enabled: true
      cert_file: /etc/tls/certs/letsencrypt/tls.crt
      key_file: /etc/tls/certs/letsencrypt/tls.key
      require_client_auth: false

      truststore_file: /etc/ssl/certs/ca-certificates.crt
    - name: default
      enabled: true
      cert_file: /etc/tls/certs/external/tls.crt
      key_file: /etc/tls/certs/external/tls.key
      require_client_auth: false
      truststore_file: /etc/tls/certs/external/ca.crt
  kafka_api:
    - name: internal
      address: 0.0.0.0
      port: 9092
      authentication_method: sasl
    - name: default
      address: 0.0.0.0
      port: 9094
    - name: kafka-api
      address: 0.0.0.0
      port: 32092
      authentication_method: sasl
  kafka_api_tls:
    - name: internal
      enabled: true
      cert_file: /etc/tls/certs/letsencrypt/tls.crt
      key_file: /etc/tls/certs/letsencrypt/tls.key
      require_client_auth: false

      truststore_file: /etc/ssl/certs/ca-certificates.crt
    - name: default
      enabled: true
      cert_file: /etc/tls/certs/external/tls.crt
      key_file: /etc/tls/certs/external/tls.key
      require_client_auth: false
      truststore_file: /etc/tls/certs/external/ca.crt
    - name: kafka-api
      enabled: true
      cert_file: /etc/tls/certs/letsencrypt/tls.crt
      key_file: /etc/tls/certs/letsencrypt/tls.key
      require_client_auth: false

      truststore_file: /etc/ssl/certs/ca-certificates.crt
  rpc_server:
    address: 0.0.0.0
    port: 33145
  rpc_server_tls:
    enabled: true
    cert_file: /etc/tls/certs/letsencrypt/tls.crt
    key_file: /etc/tls/certs/letsencrypt/tls.key
    require_client_auth: false
    truststore_file: /etc/ssl/certs/ca-certificates.crt
  seed_servers: 
    - host:
        address: redpanda-0.redpanda.redpanda.svc.cluster.local.
        port: 33145
    - host:
        address: redpanda-1.redpanda.redpanda.svc.cluster.local.
        port: 33145
    - host:
        address: redpanda-2.redpanda.redpanda.svc.cluster.local.
        port: 33145

schema_registry_client:
  brokers:
  - address: redpanda-0.redpanda.redpanda.svc.cluster.local.
    port: 9092
  - address: redpanda-1.redpanda.redpanda.svc.cluster.local.
    port: 9092
  - address: redpanda-2.redpanda.redpanda.svc.cluster.local.
    port: 9092
  broker_tls:
    enabled: true
    require_client_auth: false
    cert_file: /etc/tls/certs/letsencrypt/tls.crt
    key_file: /etc/tls/certs/letsencrypt/tls.key
    truststore_file: /etc/ssl/certs/ca-certificates.crt
schema_registry:
  schema_registry_api:
    - name: internal
      address: 0.0.0.0
      port: 8081
      authentication_method: http_basic
    - name: default
      address: 0.0.0.0
      port: 8084
    - name: schema-registry
      address: 0.0.0.0
      port: 31081
      authentication_method: http_basic
  schema_registry_api_tls:
    - name: internal
      enabled: true
      cert_file: /etc/tls/certs/letsencrypt/tls.crt
      key_file: /etc/tls/certs/letsencrypt/tls.key
      require_client_auth: false
      truststore_file: /etc/ssl/certs/ca-certificates.crt
    - name: default
      enabled: true
      cert_file: /etc/tls/certs/external/tls.crt
      key_file: /etc/tls/certs/external/tls.key
      require_client_auth: false
      truststore_file: /etc/tls/certs/external/ca.crt
    - name: schema-registry
      enabled: true
      cert_file: /etc/tls/certs/letsencrypt/tls.crt
      key_file: /etc/tls/certs/letsencrypt/tls.key
      require_client_auth: false
      truststore_file: /etc/ssl/certs/ca-certificates.crt

pandaproxy_client:
  brokers:
  - address: redpanda-0.redpanda.redpanda.svc.cluster.local.
    port: 9092
  - address: redpanda-1.redpanda.redpanda.svc.cluster.local.
    port: 9092
  - address: redpanda-2.redpanda.redpanda.svc.cluster.local.
    port: 9092
  broker_tls:
    enabled: true
    require_client_auth: false
    cert_file: /etc/tls/certs/letsencrypt/tls.crt
    key_file: /etc/tls/certs/letsencrypt/tls.key
    truststore_file: /etc/ssl/certs/ca-certificates.crt
pandaproxy:
  pandaproxy_api:
    - name: internal
      address: 0.0.0.0
      port: 8082
      authentication_method: http_basic
    - name: default
      address: 0.0.0.0
      port: 8083
    - name: http-proxy
      address: 0.0.0.0
      port: 31082
      authentication_method: http_basic
  pandaproxy_api_tls:
    - name: internal
      enabled: true
      cert_file: /etc/tls/certs/letsencrypt/tls.crt
      key_file: /etc/tls/certs/letsencrypt/tls.key
      require_client_auth: false
      truststore_file: /etc/ssl/certs/ca-certificates.crt
    - name: default
      enabled: true
      cert_file: /etc/tls/certs/external/tls.crt
      key_file: /etc/tls/certs/external/tls.key
      require_client_auth: false
      truststore_file: /etc/tls/certs/external/ca.crt
    - name: http-proxy
      enabled: true
      cert_file: /etc/tls/certs/letsencrypt/tls.crt
      key_file: /etc/tls/certs/letsencrypt/tls.key
      require_client_auth: false
      truststore_file: /etc/ssl/certs/ca-certificates.crt

rpk:
  # redpanda server configuration
  overprovisioned: false
  enable_memory_locking: false
  additional_start_flags:
    - "--smp=3"
    - "--memory=5734M"
    - "--reserve-memory=214M"
    - "--default-log-level=debug"
    - --abort-on-seastar-bad-alloc
    - --dump-memory-diagnostics-on-alloc-failure-kind=all
  # rpk tune entries
  tune_aio_events: true

  # kafka connection configuration
  kafka_api:
    brokers: 
      - redpanda-0.redpanda.redpanda.svc.cluster.local.:9092
      - redpanda-1.redpanda.redpanda.svc.cluster.local.:9092
      - redpanda-2.redpanda.redpanda.svc.cluster.local.:9092
    tls:
  admin_api:
    addresses: 
      - redpanda-0.redpanda.redpanda.svc.cluster.local.:9644
      - redpanda-1.redpanda.redpanda.svc.cluster.local.:9644
      - redpanda-2.redpanda.redpanda.svc.cluster.local.:9644
    tls:

BinaryData
====

Events:  <none>
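In the ConfigMap above, the `tls:` keys under `rpk.kafka_api` and `rpk.admin_api` have no value, and YAML parses a key with no value as null. The sketch below assumes, without having checked rpk's source, that a client treats a null `tls` block as "TLS disabled"; under that assumption it builds `http://` URLs, which matches the URL in the config-watcher log. The config is written as the Python dict a YAML parser would produce (no YAML library is used, to keep it self-contained), and `admin_scheme` and `admin_urls` are hypothetical helpers.

```python
def admin_scheme(rpk_cfg: dict) -> str:
    # Hypothetical rule: a missing or null "tls" block means plain HTTP;
    # any mapping (even an empty one) means HTTPS.
    tls = rpk_cfg.get("admin_api", {}).get("tls")
    return "http" if tls is None else "https"


def admin_urls(rpk_cfg: dict, path: str = "/v1/cluster/health_overview") -> list[str]:
    # One health URL per configured Admin API address.
    scheme = admin_scheme(rpk_cfg)
    return [f"{scheme}://{addr}{path}" for addr in rpk_cfg["admin_api"]["addresses"]]


# Parsed form of the rpk.admin_api section above: "tls:" with no value becomes None.
rendered = {
    "admin_api": {
        "addresses": [
            "redpanda-0.redpanda.redpanda.svc.cluster.local.:9644",
            "redpanda-1.redpanda.redpanda.svc.cluster.local.:9644",
            "redpanda-2.redpanda.redpanda.svc.cluster.local.:9644",
        ],
        "tls": None,
    }
}

for url in admin_urls(rendered):
    print(url)
```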

Which are the affected charts?

Redpanda, Operator

Chart Version(s)

```console
❯ helm -n redpanda list
NAME               NAMESPACE  REVISION  UPDATED                               STATUS    CHART            APP VERSION
redpanda           redpanda   6         2024-04-02 01:17:31.253662 -0400 EDT  deployed  redpanda-0.1.1   0.1.0
redpanda-operator  redpanda   1         2024-04-01 18:14:29.732128 -0400 EDT  deployed  operator-0.4.20  v2.1.15-23.3.7
```

Cloud provider

Azure / AKS

JIRA Link: K8S-131

c4milo commented 6 months ago

I also opened https://github.com/redpanda-data/redpanda/issues/17540

alejandroEsc commented 6 months ago

@c4milo I think I understand this problem now. Do you have a simple sample to test this out? Otherwise I'll assume that just enabling the config watcher and TLS should be enough, right?

alejandroEsc commented 6 months ago

I am not sure what I am missing here: the config-watcher already gets cluster health correctly when rpk is set up with TLS enabled; that was done some time ago. What surprises me is this:

redpanda redpanda 6 2024-04-02 01:17:31.253662 -0400 EDT deployed redpanda-0.1.1 0.1.0

Looking at the repo, we are currently at:

redpanda/redpanda   5.7.37          v23.3.10        Redpanda is the real-time engine for modern apps.

And my local installation:

redpanda        redpanda        1           2024-04-05 07:33:28.463959 -0400 EDT    deployed    redpanda-5.7.37         v23.3.10

Clearly something is off. Can we verify this using the latest charts, please? And if we cannot, is the expectation that we backport something?

c4milo commented 6 months ago

Does the config map for rpk look like this in your setup?

[Screenshot, 2024-04-05 at 5:11:59 PM]

c4milo commented 6 months ago

Why is it trying to use the external domain instead of the internal one?

c4milo commented 6 months ago

@alejandroEsc, please let me know if you want to pair on this one.

alejandroEsc commented 6 months ago

> @alejandroEsc, please let me know if you want to pair on this one.

Yeah, let me know. I am hoping the latest changes have resolved this?

c4milo commented 6 months ago

This issue is probably a symptom of the internal certs using the public DNS domain; fixing that should also fix this.
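To illustrate why internal certificates issued only for the public DNS domain would break verified TLS on the internal listener, here is a simplified Python sketch of certificate hostname matching (a single, whole left-most `*` label, in the spirit of RFC 6125). It is not the matching code used by rpk or by Go's TLS stack, and the SAN `*.camilo.panda.dev` is a hypothetical example built from `baseDNSName` in the values above.

```python
def san_matches(pattern: str, hostname: str) -> bool:
    # Compare case-insensitively and ignore a trailing root dot,
    # as in "redpanda-1.redpanda.redpanda.svc.cluster.local.".
    p_labels = pattern.lower().rstrip(".").split(".")
    h_labels = hostname.lower().rstrip(".").split(".")
    if len(p_labels) != len(h_labels):
        return False
    first, *rest = p_labels
    # A wildcard may stand for exactly one whole left-most label, nothing more.
    if first != "*" and first != h_labels[0]:
        return False
    return rest == h_labels[1:]


def hostname_verified(sans: list[str], hostname: str) -> bool:
    # The hostname passes if any Subject Alternative Name matches it.
    return any(san_matches(san, hostname) for san in sans)
```

With only a public-domain SAN, the internal name `redpanda-1.redpanda.redpanda.svc.cluster.local.` fails this check, which is the kind of failure that skipping verification (`insecure_skip_verify`) would paper over.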

Camilo Aguilar

Software Engineer


c4milo commented 6 months ago

I haven't tested it again, but I think this issue may have been fixed by https://github.com/redpanda-data/helm-charts/issues/1155 as well.

RafalKorepta commented 2 months ago

We fixed this issue by configuring rpk correctly. The fix is in the ConfigMap that contains redpanda.yaml.
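As a hedged illustration of what configuring rpk correctly could mean for this ConfigMap, the sketch below renders the `rpk.admin_api` section with a populated `tls` block whenever Admin API TLS is enabled, instead of the empty `tls:` seen in the original report. It is not the chart's actual template: the function name `render_rpk_admin_api` is hypothetical, and the `truststore_file` key and default path are assumptions borrowed from the `admin_api_tls` entries in the same redpanda.yaml.

```python
def render_rpk_admin_api(addresses: list[str], tls_enabled: bool,
                         truststore_file: str = "/etc/ssl/certs/ca-certificates.crt") -> str:
    # Hypothetical renderer for the rpk.admin_api block of redpanda.yaml,
    # indented to sit under the top-level "rpk:" key.
    lines = ["  admin_api:", "    addresses:"]
    lines += [f"      - {addr}" for addr in addresses]
    if tls_enabled:
        # Emit a populated mapping so the block is not parsed as null.
        lines += ["    tls:", f"      truststore_file: {truststore_file}"]
    return "\n".join(lines) + "\n"


print(render_rpk_admin_api(
    ["redpanda-0.redpanda.redpanda.svc.cluster.local.:9644"], tls_enabled=True))
```

Omitting `tls` entirely when TLS is off, rather than emitting an empty key, keeps the "disabled" case explicit.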