redpanda-data / redpanda

Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
https://redpanda.com

Accept configuration properties that are known to only a subset of Redpanda cluster nodes #16024

Open RafalKorepta opened 9 months ago

RafalKorepta commented 9 months ago

Who is this for and what problem do they have today?

During a Helm upgrade (in a Kubernetes environment), the Redpanda configuration is applied after the StatefulSet spec changes. The Pod rollout runs in parallel with the Kubernetes Job that executes rpk cluster config import new-redpanda-conf.yaml. The problem appears when a property known only to the new Redpanda version, e.g. audit_enabled, reaches a cluster that has not been fully rolled out.

What are the success criteria?

A Redpanda cluster in which at least one node understands a configuration option can accept the rpk cluster config import HTTP request that sets it.
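
To make the criterion concrete, here is a minimal sketch of the acceptance rule in Python. It is not Redpanda's validation code: the function name unknown_properties and the per-node sets of known property names are hypothetical, chosen only to illustrate "reject a property only if no node recognizes it".

```python
from typing import Dict, Iterable, List, Set


def unknown_properties(
    requested: Iterable[str],
    known_by_node: Dict[int, Set[str]],
) -> List[str]:
    """Return the requested properties that no node in the cluster recognizes."""
    # Union of every property name understood by at least one node.
    known_anywhere: Set[str] = set().union(*known_by_node.values())
    return sorted(p for p in requested if p not in known_anywhere)


# Hypothetical mid-rollout cluster: only node 2 already runs the new version.
known_by_node = {
    0: {"default_topic_replications"},
    1: {"default_topic_replications"},
    2: {"default_topic_replications", "audit_enabled"},
}
requested = ["audit_enabled", "default_topic_replications"]

# Under the proposed rule nothing is rejected, because node 2 knows audit_enabled.
rejected = unknown_properties(requested, known_by_node)
print(rejected)
```

Under today's behavior, as described in this issue, the same request fails because the node handling it does not know audit_enabled.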

Why is solving this problem impactful?

Clients and community users are seeing helm upgrade failures caused by the unknown property error that the Redpanda Admin API returns when a major Redpanda version introduces a new field.

PROPERTY                    PRIOR  NEW
audit_enabled               <nil>  false
default_topic_replications  1      3

Validation errors:
 * audit_enabled: Unknown property

No changes were made

Additional notes

https://redpandacommunity.slack.com/archives/C01AJDUT88N/p1704687789956139

Helm references https://github.com/helm/helm/issues/11778 https://github.com/helm/helm/pull/11788

JIRA Link: CORE-1709

michael-redpanda commented 9 months ago

Can we just remove references to the audit system from the helm chart? If we want to provide examples of how to enable it, can we just document it or add it in as a comment?

piyushredpanda commented 9 months ago

@RafalKorepta ^

RafalKorepta commented 9 months ago

> Can we just remove references to the audit system from the helm chart? If we want to provide examples of how to enable it, can we just document it or add it in as a comment?

The problem arises when a user upgrades (using the Helm chart) from 23.2.x to 23.3.x, sets any configuration option that is unknown to 23.2.x, and forgets to pass the --wait flag to the helm upgrade command. In that case the user might see the helm upgrade fail. If at least one of the Redpanda Pods understands the configuration option, the cluster could accept it but not act upon its value.
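
One way to picture a workaround on the caller's side is to hold the config import until every broker reports the new version. The sketch below is only an illustration under assumptions: fetch_versions is a hypothetical callable that returns one version string per broker (how those strings are obtained is left out), and version strings are assumed to look like "v23.3.1".

```python
import time
from typing import Callable, List, Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Parse 'v23.3.1' into (23, 3). Assumes a 'vMAJOR.MINOR.PATCH' string."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def rollout_complete(versions: List[str], minimum: Tuple[int, int]) -> bool:
    """True when at least one broker is listed and all are at or above minimum."""
    return bool(versions) and all(major_minor(v) >= minimum for v in versions)


def wait_for_rollout(
    fetch_versions: Callable[[], List[str]],  # hypothetical: one version per broker
    minimum: Tuple[int, int],
    attempts: int = 30,
    delay_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until every broker runs at least `minimum`, or give up."""
    for _ in range(attempts):
        if rollout_complete(fetch_versions(), minimum):
            return True
        sleep(delay_s)
    return False
```

Only after wait_for_rollout returns True would the caller run the config import. This mirrors what passing --wait to helm upgrade is meant to achieve, as described in the comment above.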

andijcr commented 9 months ago

Not sure if this is useful, but the patch cluster config endpoint accepts a force parameter (https://github.com/redpanda-data/redpanda/blob/2c6a872ccff95878624ff01d57452a12dbda7468/src/v/redpanda/admin/server.cc#L1691) that makes it accept a property that would not pass validation. Unknown properties are written to the controller log but ignored (not sure if we test this, though). However, rpk doesn't expose this parameter at the moment.
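
For illustration, here is a sketch of what a direct request with the force parameter might look like, built but not sent. Only the existence of a force parameter comes from the comment above. The path /v1/cluster_config, the PUT method, the upsert/remove body shape, the "true" value, and the localhost:9644 address are assumptions that should be checked against the linked source before use. build_force_patch is a hypothetical helper name.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_force_patch(admin_url: str, upsert: dict) -> Request:
    """Build (but do not send) a cluster config patch with force enabled."""
    # Assumption: force is passed as a boolean-like query parameter.
    query = urlencode({"force": "true"})
    # Assumption: the body carries properties to set and properties to remove.
    body = json.dumps({"upsert": upsert, "remove": []}).encode("utf-8")
    return Request(
        f"{admin_url}/v1/cluster_config?{query}",  # assumed path
        data=body,
        method="PUT",  # assumed method
        headers={"Content-Type": "application/json"},
    )


# Assumed Admin API address; adjust for the actual deployment.
req = build_force_patch("http://localhost:9644", {"audit_enabled": False})
print(req.get_method(), req.full_url)
```

Sending the request (for example with urllib.request.urlopen) is left out on purpose, since forcing a property that the cluster cannot validate is something to try against a test cluster first.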