hashicorp / vault-k8s

First-class support for Vault and Kubernetes.
Mozilla Public License 2.0

Monitoring and reconciliation #40

Open fischerman opened 4 years ago

fischerman commented 4 years ago

The injector is essential for Pods annotated with vault.hashicorp.com/.... To make the injector a less critical component of the cluster, the webhook's FailurePolicy should be set to Ignore (which is already the case in the Helm deployment).

If the injector is unavailable, pods that need the agent will still be created but will probably fail to run properly. Liveness and readiness probes do not help here: they restart containers or take them out of service, but they never recreate the pod, so the missing agent is never injected. Without looking closely at the resulting pod spec, the only indication of the cause is a log line from the API server. Metrics are only available for webhooks whose FailurePolicy is set to Fail.
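Inspecting the resulting pod spec can be scripted. The following is a minimal sketch, not part of vault-k8s. It assumes the injector reads the `vault.hashicorp.com/agent-inject` annotation and marks mutated pods with `vault.hashicorp.com/agent-inject-status: injected`; verify both keys against your injector version. It takes the parsed output of `kubectl get pods --all-namespaces -o json` and lists pods that asked for the agent but were never mutated. The set of accepted truthy values is also an assumption.

```python
import json

# Assumed annotation keys; check them against the injector version you run.
INJECT_ANNOTATION = "vault.hashicorp.com/agent-inject"
STATUS_ANNOTATION = "vault.hashicorp.com/agent-inject-status"
# Assumed spellings that count as "please inject"; adjust to match your injector.
TRUTHY = {"true", "t", "1"}


def missed_injections(pod_list: dict) -> list:
    """Return "namespace/name" for pods that requested the agent but were not mutated."""
    missed = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        annotations = meta.get("annotations") or {}
        wants_agent = annotations.get(INJECT_ANNOTATION, "").lower() in TRUTHY
        was_injected = annotations.get(STATUS_ANNOTATION) == "injected"
        if wants_agent and not was_injected:
            missed.append(f"{meta.get('namespace', 'default')}/{meta.get('name', '?')}")
    return missed


def missed_injections_from_json(raw: str) -> list:
    """Convenience wrapper for the raw JSON text printed by kubectl."""
    return missed_injections(json.loads(raw))
```

Run periodically (for example from a CronJob), the result could feed an alert, or a reconciler could delete the listed pods so that their controller recreates them once the injector is reachable again. Whether deleting pods automatically is acceptable depends on the workload.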

This issue is for discussing approaches to monitoring and/or reconciling webhook unavailability. Here is one approach:

I'm sure there are many other approaches. Would be interested to hear them!

fischerman commented 4 years ago

Another approach would be to use object selectors. The failure policy could then be set to Fail, but it would only prevent pods that need the agent from being created during an outage. We could even differentiate between optional and mandatory injections.

This would require Kubernetes 1.15 (perhaps behind an opt-in switch) and at least one label instead of an annotation. On the upside, no code changes are required.
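The object-selector idea can be sketched as a MutatingWebhookConfiguration with two webhook entries, one per injection class. This is a minimal sketch under several assumptions: the label key `vault-injection` and its values `required`/`optional` are hypothetical, the webhook names are hypothetical, and the Service name, namespace and path must be matched to your actual injector deployment. It emits JSON that `kubectl apply -f` accepts. The `admissionregistration.k8s.io/v1` API needs Kubernetes 1.16+; on 1.15 the same fields would go under `v1beta1`.

```python
import json

# Hypothetical label that workloads would set to opt into injection.
INJECTION_LABEL = "vault-injection"


def injector_webhook(name: str, label_value: str, failure_policy: str) -> dict:
    """One webhook entry that only sees pods labelled INJECTION_LABEL=label_value."""
    return {
        "name": name,
        # Assumed Service coordinates; match them to your injector deployment.
        "clientConfig": {
            "service": {"name": "vault-agent-injector-svc", "namespace": "vault", "path": "/mutate"},
        },
        "rules": [{
            "operations": ["CREATE"],
            "apiGroups": [""],
            "apiVersions": ["v1"],
            "resources": ["pods"],
        }],
        # Only pods carrying the label are sent to this webhook at all.
        "objectSelector": {"matchLabels": {INJECTION_LABEL: label_value}},
        "failurePolicy": failure_policy,
        "sideEffects": "None",
        # List only AdmissionReview versions your injector actually understands.
        "admissionReviewVersions": ["v1", "v1beta1"],
    }


def mutating_webhook_configuration() -> dict:
    return {
        "apiVersion": "admissionregistration.k8s.io/v1",
        "kind": "MutatingWebhookConfiguration",
        "metadata": {"name": "vault-agent-injector-cfg"},
        "webhooks": [
            # Pods that cannot run without the agent: refuse creation while the injector is down.
            injector_webhook("required.vault.hashicorp.com", "required", "Fail"),
            # Pods that can start without the agent: let them through, as with Ignore today.
            injector_webhook("optional.vault.hashicorp.com", "optional", "Ignore"),
        ],
    }


if __name__ == "__main__":
    print(json.dumps(mutating_webhook_configuration(), indent=2))
```

Because the objectSelector filters before the webhook is called, an injector outage only blocks pods labelled `required`; unlabelled pods never reach the injector, which is why a label (rather than only the annotation) becomes necessary.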