redpanda-data / helm-charts

Redpanda Helm Chart
http://redpanda.com
Apache License 2.0

Helm chart not properly restarting pods after initial tiered storage settings are applied #1173

Open voutilad opened 7 months ago

voutilad commented 7 months ago

What happened?

With a new deployment, my tiered storage settings are not activated. Looking at the cluster, every broker reports that it needs a restart:

redpanda@redpanda-0:/$ rpk -Xuser=admin -Xpass=xxx config status
NODE  CONFIG-VERSION  NEEDS-RESTART  INVALID  UNKNOWN
0     4               true           []       []
1     4               true           []       []
2     4               true           []       []

What did you expect to happen?

Tiered storage should be activated using the settings from my values file.

How can we reproduce it (as minimally and precisely as possible)? Please include values file.

Deploy Redpanda via Helm using some of the following storage settings:

storage:
  persistentVolume:
    enabled: true
    storageClass: csi-driver-lvm-striped-xfs
  tiered:
    config:
      cloud_storage_enabled: true
      cloud_storage_api_endpoint: storage.googleapis.com
      cloud_storage_credentials_source: gcp_instance_metadata
      cloud_storage_enable_remote_write: true
      cloud_storage_enable_remote_read: true
      cloud_storage_region: my-region
      cloud_storage_bucket: my-bucket

Anything else we need to know?

A workaround is to force a restart via:

kubectl rollout restart statefulset/redpanda 
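
The workaround can be made conditional so that the restart only happens when a broker actually asks for one. The following is a minimal sketch, not part of the chart: the function name needs_restart is hypothetical, and it assumes the status table keeps the column layout shown in the output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads the table printed by `rpk ... config status`
# on stdin and exits 0 if at least one node has NEEDS-RESTART set to true,
# or 1 otherwise.
needs_restart() {
  awk '
    # Header row: find the NEEDS-RESTART column by name rather than position.
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == "NEEDS-RESTART") col = i
      next
    }
    # Data rows: remember whether any node reports true.
    col && $col == "true" { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Example wiring (not executed here; credentials as in the report above):
#   if rpk -Xuser=admin -Xpass=xxx config status | needs_restart; then
#     kubectl rollout restart statefulset/redpanda
#     kubectl rollout status statefulset/redpanda
#   fi
```

Looking the column up by its header keeps the check working if the table gains or reorders columns, although that layout is still an assumption about rpk's output.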

Which are the affected charts?

Redpanda

Chart Version(s)

5.7.37

Cloud provider

GCP / GKE

JIRA Link: K8S-146

RafalKorepta commented 7 months ago

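The root problem is that .bootstrap.yaml does not contain the tiered storage configuration. Instead, the post-install hook configures tiered storage, because of the problem with secrets described in https://github.com/redpanda-data/helm-charts/pull/1054. The settings are therefore applied after the brokers have already started, which leaves them waiting for a restart.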

chrisseto commented 6 months ago

Just some notes for the future:

I think the best option for now will be to use .bootstrap.yaml for the initial configuration and then have a pre-upgrade job that sets the cluster configuration.
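
As a rough illustration of the first half of that proposal, the tiered storage keys from the values file in the report could be written as flat cluster properties in .bootstrap.yaml. This is a sketch only: the helper name render_bootstrap and the target directory are hypothetical, and it assumes .bootstrap.yaml accepts the same flat keys as storage.tiered.config. The chart's actual rendering may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes the tiered storage keys from the report's
# values file into a .bootstrap.yaml inside the directory given as $1.
# None of these keys is a secret; the example relies on
# gcp_instance_metadata for credentials.
render_bootstrap() {
  local dir="$1"
  cat > "${dir}/.bootstrap.yaml" <<'EOF'
cloud_storage_enabled: true
cloud_storage_api_endpoint: storage.googleapis.com
cloud_storage_credentials_source: gcp_instance_metadata
cloud_storage_enable_remote_write: true
cloud_storage_enable_remote_read: true
cloud_storage_region: my-region
cloud_storage_bucket: my-bucket
EOF
}

# Example (the directory is an assumption, not the chart's real mount path):
#   render_bootstrap /tmp/redpanda-config
```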

We'll need to consolidate the config rendering into a single place in order to mitigate issues like the tiered storage one Rafal is referencing.