stellar / helm-charts

Helm charts for deploying SDF maintained software
3 stars 12 forks source link

Update RPC chart to use additional zero downtime mechanisms #84

Closed sreuland closed 5 months ago

sreuland commented 5 months ago

Update the rpc chart to include an updates of the statefulset that assist for better zero downtime of rpc at runtime and upgrade time:

These changes have been verified on cluster deployments using the same changes directly through k8s manifests: https://github.com/stellar/kube/pull/2098

Closes #82

2opremio commented 5 months ago

I would parameterize the readiness probe with respect to having the history window full. When you start rpc for the first time (and for tests) it’s likely that you don’t want to wait for a day. In fact I would make it optional.

jacekn commented 5 months ago

I would parameterize the readiness probe with respect to having the history window full. When you start rpc for the first time (and for tests) it’s likely that you don’t want to wait for a day. In fact I would make it optional.

We agreed to improve the way health status is exposed by soroban-rpc so I think it would be best to do that before we add any options to the chart. This way we avoid potential option renames and extra operational overhead for operators. Rewiring the probe internally in the chart is transparent, most operators don't need to know about it, so I think this is OK. But once you start adding options you have to start thinking about lifecycle for those and this will add overhead so I think it should only be done after we make changes to soroban-rpc

2opremio commented 5 months ago

See https://github.com/stellar/soroban-rpc/pull/146

sreuland commented 5 months ago

good points on both sides, https://github.com/stellar/soroban-rpc/pull/146 seems like may be too soon, can we agree to wait for the updated rpc health status feature spec to be completed first? I've spun up feature ticket here - https://github.com/stellar/soroban-rpc/issues/148, please review/comment.

and can this pr proceed forward with chart as-is, per @jacekn 's feedback on avoiding config churn for operators? otherwise, adding the external value config param per @2opremio 's suggestion right here sounds like would be a good safety valve.