redpanda-data / helm-charts

Redpanda Helm Chart

πŸ”ΉπŸ› Operator default listeners seem to be clashing with user provided ports #1124

Open c4milo opened 3 months ago

c4milo commented 3 months ago

What happened?

I used ports 30081 and 30082 in "external" listeners and the operator complained that the port was already used:

β”‚ manager {"level":"debug","ts":"2024-04-01T21:32:22.924Z","logger":"events","msg":"Helm upgrade failed for release red β”‚β”‚ panda/redpanda with chart redpanda@5.7.36: failed to create resource: Service \"redpanda-external\" is invalid: spec. β”‚β”‚ ports[4].nodePort: Invalid value: 30082: provided port is already allocated\n\nLast Helm logs:\n\n2024-04-01T21:32:22 β”‚β”‚ .501865617Z: Created a new PodDisruptionBudget called \"redpanda\" in redpanda\n\n2024-04-01T21:32:22.522146873Z: Cre β”‚β”‚ ated a new ServiceAccount called \"id-rpcloud-9m4e2mr0ui3e8a215n4\" in redpanda\n\n2024-04-01T21:32:22.540660454Z: Cr β”‚β”‚ eated a new Secret called \"redpanda-sts-lifecycle\" in redpanda\n\n2024-04-01T21:32:22.557186641Z: Created a new Sec β”‚β”‚ ret called \"redpanda-config-watcher\" in redpanda\n\n2024-04-01T21:32:22.576038741Z: Created a new Secret called \"r β”‚β”‚ edpanda-configurator\" in redpanda\n\n2024-04-01T21:32:22.592036011Z: Created a new Secret called \"redpanda-fs-valid β”‚β”‚ ator\" in redpanda\n\n2024-04-01T21:32:22.610714711Z: Created a new ConfigMap called \"redpanda\" in redpanda\n\n2024 β”‚β”‚ -04-01T21:32:22.626412621Z: Created a new ConfigMap called \"redpanda-rpk\" in redpanda\n\n2024-04-01T21:32:22.648893 β”‚β”‚ 55Z: Created a new Service called \"redpanda\" in redpanda\n\n2024-04-01T21:32:22.778871216Z: warning: Upgrade \"redp β”‚β”‚ anda\" failed: failed to create resource: Service \"redpanda-external\" is invalid: spec.ports[4].nodePort: Invalid v β”‚
β”‚ alue: 30082: provided port is already allocated","type":"Warning","object":{"kind":"HelmRelease","namespace":"redpand β”‚
β”‚ a","name":"redpanda","uid":"9b7006ec-60b7-496b-b21c-0ee3064f8e6d","apiVersion":"helm.toolkit.fluxcd.io/v2beta2","reso β”‚
β”‚ urceVersion":"11709469"},"reason":"UpgradeFailed"}                                                                    β”‚
β”‚

What did you expect to happen?

If I can provide external ports, I expect the operator to honor them. Any hidden magic is highly undesired.

How can we reproduce it (as minimally and precisely as possible)? Please include your values file.

```console
$ helm get values -n --all
COMPUTED VALUES:
affinity: {}
auditLogging:
  clientMaxBufferSize: 16777216
  enabled: false
  enabledEventTypes: null
  excludedPrincipals: null
  excludedTopics: null
  listener: internal
  partitions: 12
  queueDrainIntervalMs: 500
  queueMaxBufferSizePerShard: 1048576
  replicationFactor: null
auth:
  sasl:
    enabled: false
    mechanism: SCRAM-SHA-512
    secretRef: redpanda/redpanda-superusers
    users: []
clusterDomain: cluster.local
commonLabels: {}
config:
  cluster:
    cloud_storage_azure_container: 9m4e2mr0ui3e8a215n4g
    cloud_storage_azure_storage_account: testcamilo9
    cloud_storage_credentials_source: azure_aks_oidc_federation
    cloud_storage_enable_remote_read: "true"
    cloud_storage_enable_remote_write: "true"
    cloud_storage_enabled: "false"
    default_topic_replications: "3"
    minimum_topic_replications: "3"
  node:
    crash_loop_limit: 5
  pandaproxy_client: {}
  rpk: {}
  schema_registry_client: {}
  tunable:
    compacted_log_segment_size: 67108864
    group_topic_partitions: 16
    kafka_batch_max_bytes: 1048576
    kafka_connection_rate_limit: 1000
    log_segment_size: 134217728
    log_segment_size_max: 268435456
    log_segment_size_min: 16777216
    max_compacted_log_segment_size: 536870912
    topic_partitions_per_shard: 1000
connectors:
  deployment:
    create: false
  enabled: false
  test:
    create: false
console:
  config: {}
  configmap:
    create: false
  deployment:
    create: false
  enabled: false
  secret:
    create: false
enterprise:
  license: ""
  licenseSecretRef:
    key: license
    name: redpanda-9m4e2mr0ui3e8a215n4g-license
external:
  addresses:
  - $PREFIX_TEMPLATE
  domain: camilo.panda.dev
  enabled: true
  externalDns:
    enabled: true
  prefixTemplate: rp${POD_ORDINAL}-$(echo -n $HOST_IP_ADDRESS | sha256sum | head -c 7)
  service:
    enabled: true
  type: NodePort
fullnameOverride: ""
image:
  pullPolicy: IfNotPresent
  repository: docker.redpanda.com/redpandadata/redpanda
  tag: v23.3.7
imagePullSecrets: []
license_key: ""
license_secret_ref: {}
listeners:
  admin:
    external:
      admin-api:
        advertisedPorts:
        - 30644
        authenticationMethod: sasl
        enabled: false
        port: 30644
        tls:
          cert: letsencrypt
          enabled: true
          requireClientAuth: false
      default:
        advertisedPorts:
        - 31644
        port: 9645
        tls:
          cert: external
    port: 9644
    tls:
      cert: letsencrypt
      enabled: true
      requireClientAuth: false
  http:
    authenticationMethod: http_basic
    enabled: true
    external:
      default:
        advertisedPorts:
        - 30082
        authenticationMethod: null
        port: 8083
        tls:
          cert: external
          requireClientAuth: false
      http-proxy:
        advertisedPorts:
        - 30082
        authenticationMethod: http_basic
        enabled: true
        port: 30082
        tls:
          cert: letsencrypt
          enabled: true
          requireClientAuth: false
    kafkaEndpoint: default
    port: 8082
    prefixTemplate: http-proxy$POD_ORDINAL
    tls:
      cert: letsencrypt
      enabled: true
      requireClientAuth: false
  kafka:
    authenticationMethod: sasl
    external:
      default:
        advertisedPorts:
        - 31092
        authenticationMethod: null
        port: 9094
        tls:
          cert: external
      kafka-api:
        advertisedPorts:
        - 30092
        authenticationMethod: sasl
        enabled: true
        port: 30092
        tls:
          cert: letsencrypt
          requireClientAuth: false
    port: 9092
    prefixTemplate: kafka-api$POD_ORDINAL
    tls:
      cert: letsencrypt
      requireClientAuth: false
  rpc:
    port: 33145
    tls:
      cert: letsencrypt
      requireClientAuth: false
  schemaRegistry:
    authenticationMethod: http_basic
    enabled: true
    external:
      default:
        advertisedPorts:
        - 30081
        authenticationMethod: null
        port: 8084
        tls:
          cert: external
          requireClientAuth: false
      schema-registry:
        advertisedPorts:
        - 30081
        authenticationMethod: http_basic
        enabled: true
        port: 30081
        tls:
          cert: letsencrypt
          requireClientAuth: false
    kafkaEndpoint: default
    port: 8081
    tls:
      cert: letsencrypt
      requireClientAuth: false
logging:
  logLevel: debug
  usageStats:
    clusterId: 9m4e2mr0ui3e8a215n4g
    enabled: true
monitoring:
  enabled: false
  labels: {}
  scrapeInterval: 30s
  tlsConfig: {}
nameOverride: ""
nodeSelector: {}
post_install_job:
  affinity: {}
  enabled: true
post_upgrade_job:
  affinity: {}
  enabled: true
rackAwareness:
  enabled: true
  nodeAnnotation: topology.kubernetes.io/zone
rbac:
  annotations: {}
  enabled: false
resources:
  cpu:
    cores: "8"
  memory:
    container:
      max: 2Gi
      min: 2Gi
serviceAccount:
  annotations:
    azure.workload.identity/client-id: c90db393-857d-41d0-ac0d-0e61271fcaa6
  create: true
  name: id-rpcloud-9m4e2mr0ui3e8a215n4
statefulset:
  additionalRedpandaCmdFlags:
  - --abort-on-seastar-bad-alloc
  - --dump-memory-diagnostics-on-alloc-failure-kind=all
  annotations: {}
  budget:
    maxUnavailable: 1
  extraVolumeMounts: ""
  extraVolumes: ""
  initContainerImage:
    repository: busybox
    tag: latest
  initContainers:
    configurator:
      extraVolumeMounts: ""
      resources: {}
    extraInitContainers: ""
    fsValidator:
      enabled: true
      expectedFS: xfs
      extraVolumeMounts: ""
      resources: {}
    setDataDirOwnership:
      enabled: true
      extraVolumeMounts: ""
      resources: {}
    setTieredStorageCacheDirOwnership:
      extraVolumeMounts: ""
      resources: {}
    tuning:
      extraVolumeMounts: ""
      resources: {}
  livenessProbe:
    failureThreshold: 3
    initialDelaySeconds: 10
    periodSeconds: 10
  nodeSelector:
    cloud.redpanda.com/role: redpanda
  podAffinity: {}
  podAntiAffinity:
    custom: {}
    topologyKey: kubernetes.io/hostname
    type: hard
    weight: 100
  priorityClassName: ""
  readinessProbe:
    failureThreshold: 3
    initialDelaySeconds: 1
    periodSeconds: 10
    successThreshold: 1
  replicas: 3
  securityContext:
    fsGroup: 101
    fsGroupChangePolicy: OnRootMismatch
    runAsUser: 101
  sideCars:
    configWatcher:
      enabled: true
      extraVolumeMounts: ""
      resources: {}
      securityContext: {}
    controllers:
      createRBAC: true
      enabled: false
      healthProbeAddress: :8085
      image:
        repository: docker.redpanda.com/redpandadata/redpanda-operator
        tag: v2.1.10-23.2.18
      metricsAddress: :9082
      resources: {}
      run:
      - all
      securityContext: {}
  startupProbe:
    failureThreshold: 120
    initialDelaySeconds: 1
    periodSeconds: 10
  terminationGracePeriodSeconds: 90
  tolerations:
  - effect: NoSchedule
    key: cloud.redpanda.com/role
    operator: Equal
    value: redpanda
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
  updateStrategy:
    type: RollingUpdate
storage:
  hostPath: ""
  persistentVolume:
    annotations: {}
    enabled: true
    labels: {}
    nameOverwrite: ""
    size: 4096Gi
    storageClass: local-path
  tiered:
    config:
      cloud_storage_access_key: ""
      cloud_storage_api_endpoint: ""
      cloud_storage_azure_container: null
      cloud_storage_azure_shared_key: null
      cloud_storage_azure_storage_account: null
      cloud_storage_bucket: ""
      cloud_storage_cache_size: 5368709120
      cloud_storage_credentials_source: config_file
      cloud_storage_enable_remote_read: true
      cloud_storage_enable_remote_write: true
      cloud_storage_enabled: false
      cloud_storage_region: ""
      cloud_storage_secret_key: ""
    credentialsSecretRef:
      accessKey:
        configurationKey: cloud_storage_access_key
      secretKey:
        configurationKey: cloud_storage_secret_key
    hostPath: ""
    mountType: persistentVolume
    persistentVolume:
      annotations: {}
      labels: {}
      storageClass: local-path
tests:
  enabled: true
tls:
  certs:
    default:
      caEnabled: true
    external:
      caEnabled: true
    letsencrypt:
      caEnabled: false
      duration: 43800h0m0s
      issuerRef:
        kind: ClusterIssuer
        name: letsencrypt-dns-prod
  enabled: true
tolerations: []
tuning:
  tune_aio_events: true
```

Anything else we need to know?

No response

Which are the affected charts?

Operator

Chart Version(s)

```console
$ helm -n list
NAME               NAMESPACE  REVISION  UPDATED                              STATUS    CHART            APP VERSION
redpanda-operator  redpanda   2         2024-04-01 16:29:38.92053 -0400 EDT  deployed  operator-0.4.20  v2.1.15-23.3.7
```

Cloud provider

Azure

JIRA Link: K8S-129

RafalKorepta commented 3 months ago

It's not the operator's nor the Helm chart's responsibility to handle node port conflicts.

Please attach the `kubectl get svc -A -o yaml` output to this issue. I wonder whether any Redpanda Helm chart release is still left over in your cluster.

The Redpanda resource spec would also help to solve this issue.
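
As a quick, purely illustrative way to see which node ports are already held by existing Services (and by which release), something like the command below can be used; it is an editorial sketch, not a command requested verbatim in the thread:

```console
$ kubectl get svc -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{" "}{.spec.ports[*].nodePort}{"\n"}{end}'
```

Any Service that already holds 30081 or 30082 would show up in that list.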

chrisseto commented 3 months ago

I've re-wrapped the error messages from Camilo:

```json
{
  "level": "debug",
  "ts": "2024-04-01T21:32:22.924Z",
  "logger": "events",
  "msg": "Helm upgrade failed for release redpanda/redpanda with chart redpanda@5.7.36: failed to create resource: Service \"redpanda-external\" is invalid: spec.ports[4].nodePort: Invalid value: 30082: provided port is already allocated\n\nLast Helm logs:\n\n2024-04-01T21:32:22.501865617Z: Created a new PodDisruptionBudget called \"redpanda\" in redpanda\n\n2024-04-01T21:32:22.522146873Z: Created a new ServiceAccount called \"id-rpcloud-9m4e2mr0ui3e8a215n4\" in redpanda\n\n2024-04-01T21:32:22.540660454Z: Created a new Secret called \"redpanda-sts-lifecycle\" in redpanda\n\n2024-04-01T21:32:22.557186641Z: Created a new Secret called \"redpanda-config-watcher\" in redpanda\n\n2024-04-01T21:32:22.576038741Z: Created a new Secret called \"redpanda-configurator\" in redpanda\n\n2024-04-01T21:32:22.592036011Z: Created a new Secret called \"redpanda-fs-validator\" in redpanda\n\n2024-04-01T21:32:22.610714711Z: Created a new ConfigMap called \"redpanda\" in redpanda\n\n2024-04-01T21:32:22.626412621Z: Created a new ConfigMap called \"redpanda-rpk\" in redpanda\n\n2024-04-01T21:32:22.64889355Z: Created a new Service called \"redpanda\" in redpanda\n\n2024-04-01T21:32:22.778871216Z: warning: Upgrade \"redpanda\" failed: failed to create resource: Service \"redpanda-external\" is invalid: spec.ports[4].nodePort: Invalid value: 30082: provided port is already allocated",
  "type": "Warning",
  "object": {
    "kind": "HelmRelease",
    "namespace": "redpanda",
    "name": "redpanda",
    "uid": "9b7006ec-60b7-496b-b21c-0ee3064f8e6d",
    "apiVersion": "helm.toolkit.fluxcd.io/v2beta2",
    "resourceVersion": "11709469"
  },
  "reason": "UpgradeFailed"
}
```

```
Helm upgrade failed for release redpanda/redpanda with chart redpanda@5.7.36: failed to create resource: Service "redpanda-external" is invalid: spec.ports[4].nodePort: Invalid value: 30082: provided port is already allocated

Last Helm logs:

2024-04-01T21:32:22.501865617Z: Created a new PodDisruptionBudget called "redpanda" in redpanda

2024-04-01T21:32:22.522146873Z: Created a new ServiceAccount called "id-rpcloud-9m4e2mr0ui3e8a215n4" in redpanda

2024-04-01T21:32:22.540660454Z: Created a new Secret called "redpanda-sts-lifecycle" in redpanda

2024-04-01T21:32:22.557186641Z: Created a new Secret called "redpanda-config-watcher" in redpanda

2024-04-01T21:32:22.576038741Z: Created a new Secret called "redpanda-configurator" in redpanda

2024-04-01T21:32:22.592036011Z: Created a new Secret called "redpanda-fs-validator" in redpanda

2024-04-01T21:32:22.610714711Z: Created a new ConfigMap called "redpanda" in redpanda

2024-04-01T21:32:22.626412621Z: Created a new ConfigMap called "redpanda-rpk" in redpanda

2024-04-01T21:32:22.64889355Z: Created a new Service called "redpanda" in redpanda

2024-04-01T21:32:22.778871216Z: warning: Upgrade "redpanda" failed: failed to create resource: Service "redpanda-external" is invalid: spec.ports[4].nodePort: Invalid value: 30082: provided port is already allocated
```

alejandroEsc commented 3 months ago

Interesting, I am seeing the same behavior.

```console
$ helm upgrade --install redpanda charts/redpanda -n redpanda --create-namespace --values 1124.yaml
Release "redpanda" does not exist. Installing it now.
Error: 1 error occurred:
    * Service "redpanda-external" is invalid: spec.ports[4].nodePort: Invalid value: 30082: provided port is already allocated
```

Templating this shows the following:

```yaml
# Source: redpanda/templates/services.nodeport.yaml
apiVersion: v1
kind: Service
metadata:
  name: redpanda-external
  namespace: "redpanda"
  labels:
    app.kubernetes.io/component: redpanda
    app.kubernetes.io/instance: redpanda
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: redpanda
    helm.sh/chart: redpanda-5.7.37
spec:
  type: NodePort
  publishNotReadyAddresses: true
  externalTrafficPolicy: Local
  sessionAffinity: None
  ports:
    - name: admin-default
      protocol: TCP
      port: 9645
      nodePort: 31644
    - name: kafka-default
      protocol: TCP
      port: 9094
      nodePort: 31092
    - name: kafka-kafka-api
      protocol: TCP
      port: 30092
      nodePort: 30092
    - name: http-default
      protocol: TCP
      port: 8083
      nodePort: 30082
    - name: http-http-proxy
      protocol: TCP
      port: 30082
      nodePort: 30082
    - name: schema-default
      protocol: TCP
      port: 8084
      nodePort: 30081
    - name: schema-schema-registry
      protocol: TCP
      port: 30081
      nodePort: 30081
  selector:
    app.kubernetes.io/name: redpanda
    app.kubernetes.io/instance: "redpanda"
    app.kubernetes.io/component: redpanda-statefulset
```

I think the problem is that there are two entries with the same nodePort.

When I apply the above file on its own to a clean installation:

```console
$ k apply -f a.yaml
The Service "redpanda-external" is invalid: spec.ports[4].nodePort: Invalid value: 30082: provided port is already allocated
```

So I think this is the problem.
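
For reference, the duplication can be reproduced without the chart at all. The manifest below is a minimal illustrative sketch (the names are made up); applying it should be rejected with the same kind of "provided port is already allocated" error:

```yaml
# Hypothetical stand-alone repro: a single Service that declares the same
# nodePort on two ports is rejected by the API server.
apiVersion: v1
kind: Service
metadata:
  name: duplicate-nodeport-demo
spec:
  type: NodePort
  selector:
    app: demo
  ports:
    - name: http-default
      port: 8083
      nodePort: 30082
    - name: http-proxy
      port: 30082
      nodePort: 30082
```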

alejandroEsc commented 3 months ago

To make this work, I made the following changes to your values:

```yaml
  schemaRegistry:
    authenticationMethod: http_basic
    enabled: true
    external:
      default:
        advertisedPorts:
        - 30084
```

and

```yaml
  http:
    authenticationMethod: http_basic
    enabled: true
    external:
      default:
        advertisedPorts:
        - 30083
        authenticationMethod: null
        port: 8083
```
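
After a change like this, one quick sanity check (purely illustrative; `1124.yaml` is the values file used in the install command above) is to render the chart and look for duplicated node ports:

```console
$ helm template redpanda charts/redpanda --values 1124.yaml | grep nodePort | sort | uniq -d
```

An empty result means every nodePort in the rendered Service is unique.
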
alejandroEsc commented 3 months ago

I believe this is just an input error and no "magic" on our end. Perhaps the port list is a bit confusing; it's something we have wanted to change for a while now.

chrisseto commented 3 months ago

@alejandroEsc could we add some validation in that case? This feels like a pretty sharp edge.
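
For illustration only, such a guard could be sketched as a template helper that fails rendering when two external listeners advertise the same node port. The helper name `redpanda.validateExternalNodePorts` and its placement are hypothetical, not something the chart ships today:

```yaml
{{/* Hypothetical helper (sketch only, not part of the chart):
     fail template rendering when two external listeners advertise the same node port. */}}
{{- define "redpanda.validateExternalNodePorts" -}}
{{- $seen := dict -}}
{{- range $name, $listener := .Values.listeners -}}
  {{- range $extName, $ext := (dig "external" dict $listener) -}}
    {{- range $port := (dig "advertisedPorts" list $ext) -}}
      {{- $key := toString $port -}}
      {{- if hasKey $seen $key -}}
        {{- fail (printf "node port %v is used by both %s and %s/%s" $port (get $seen $key) $name $extName) -}}
      {{- end -}}
      {{- $_ := set $seen $key (printf "%s/%s" $name $extName) -}}
    {{- end -}}
  {{- end -}}
{{- end -}}
{{- end -}}
```

To take effect it would have to be included from a rendered template such as `services.nodeport.yaml`, for example via `{{ include "redpanda.validateExternalNodePorts" . }}`.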

alejandroEsc commented 3 months ago

I'm not sure what we agreed to for this ticket. If the idea is to just shut off service creation so that this values file can still be written out to the internal redpanda.yaml (even though the external configuration is not correct for k8s), then you can proceed with:

```yaml
  # -- Service allows you to manage the creation of an external kubernetes service object
  service:
    # -- Enabled if set to false will not create the external service type
    # You can still set your cluster with external access but not create the supporting service (NodePort/LoadBalancer).
    # Set this to false if you'd rather manage your own service.
    enabled: false
```

If that is the case, then we can close this ticket. Otherwise, we can help with documentation. I am not convinced that additional validation would help this situation.
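
For reference, under that approach the relevant values override (sketched from the computed values shown earlier; an editorial illustration, not a recommendation made in the thread) would look like:

```yaml
# Keep the external access configuration, but skip the chart-managed Service
# so you can create and manage your own NodePort/LoadBalancer Service instead.
external:
  enabled: true
  type: NodePort
  service:
    enabled: false
```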

c4milo commented 3 months ago

If you let me disable the operator's default listeners, I'll be on my way. I don't need them, but I need the ports to keep them aligned with AWS's and GCP's.

alejandroEsc commented 2 months ago

> If you let me disable the operator's default listeners, I'll be on my way. I don't need them, but I need the ports to keep them aligned with AWS's and GCP's.

Let's talk and see if we can figure out what you require. I'm not sure we can disable listeners; I've never tried.