Open nberthet opened 3 years ago
Hi,
Maybe removing the publishNotReadyAddresses: true parameter from the vault, vault-standby and vault-active services would be enough.
Correct me if I'm wrong, but as I understand it only the vault-internal service is used for cluster join, so the vault service should not publish Vault pods that are not ready.
This is less of a problem for vault-standby and vault-active, because service registration only tags a standby or active pod to place it in the right service, so such a pod cannot be not ready. Still, the publishNotReadyAddresses: true parameter seems useless on these services too.
Regards.
Is your feature request related to a problem? Please describe.
In one of our deployments, we have an HA cluster with 3 Vault instances. Those instances are using a transit seal.
We recently had an incident: when the Vault token used by the seal expired (our bad), one of the Vault instances restarted and remained sealed because of the expired token. Because the service includes non-ready pods, this immediately resulted in a 30-40% error rate.
Describe the solution you'd like
The chart should create a service for client applications that targets only "ready" Vault instances.
I understand non-ready pods are necessary for cluster join operations, but ideally cluster join and Vault clients should each use a separate service.
Describe alternatives you've considered
Implementing retries only slightly mitigates this issue, because of the high error rate and the lack of control over the default round-robin client load balancing.
We ended up defining an additional "service" to exclude non-ready pods.
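For illustration, such an additional service could look roughly like the sketch below. This is not the exact manifest from our deployment: the service name, namespace and selector labels are assumptions and must be adapted to the labels actually set on the Vault server pods. It also assumes the pods' readiness probe fails while Vault is sealed.

```yaml
# Hypothetical client-facing Service that only targets ready Vault pods.
# Name, namespace and selector labels are assumptions for this sketch.
apiVersion: v1
kind: Service
metadata:
  name: vault-clients          # hypothetical name
  namespace: vault             # assumed namespace
spec:
  type: ClusterIP
  # false is the Kubernetes default: only pods that pass their readiness
  # probe are listed as endpoints of this Service.
  publishNotReadyAddresses: false
  selector:
    app.kubernetes.io/name: vault       # assumed pod label
    app.kubernetes.io/instance: vault   # assumed release name
    component: server                   # assumed pod label
  ports:
    - name: http
      port: 8200               # Vault's default API port
      targetPort: 8200
```

Client applications would then point at this service instead of the vault service, while cluster join keeps using vault-internal.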
Additional context
While we encountered this issue because of an expired token, it could affect any failing restart of a Vault instance, or even occur while Vault is starting.
We could have lived happily until the next working day with 2 of 3 instances.