Create a Watcher Pod That Checks the System Health

v0lkan commented 6 months ago

What Happens

Especially in test clusters, under resource contention, VSecM Pods can find themselves in a Pending state with the following warning:

WARNING: Unable to attach or mount volumes: unmounted volumes=[spire-agent-socket] unattached-volumes=[spire-agent-socket] timed out waiting for the condition

When this happens, deleting the Pod via kubectl delete po does not help: the Pod remains in a Pending state, likely because the SPIFFE CSI Driver volume was never attached in the first place.
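A watcher would first need to recognize this condition. The following is a minimal sketch, assuming the watcher has already collected each Pod's phase and the messages of its recent warning events. The `PodStatus` type and the `isStuckOnSpireSocket` function are hypothetical names for illustration, not part of VSecM or the Kubernetes API.

```go
package main

import "strings"

// PodStatus is a hypothetical, simplified view of a Pod: its phase plus the
// messages of its recent warning events.
type PodStatus struct {
	Name     string
	Phase    string   // e.g. "Pending" or "Running"
	Warnings []string // messages of recent warning events
}

// isStuckOnSpireSocket reports whether the Pod is Pending and has a warning
// saying that mounting the spire-agent-socket volume timed out.
func isStuckOnSpireSocket(p PodStatus) bool {
	if p.Phase != "Pending" {
		return false
	}
	for _, w := range p.Warnings {
		if strings.Contains(w, "Unable to attach or mount volumes") &&
			strings.Contains(w, "spire-agent-socket") &&
			strings.Contains(w, "timed out waiting for the condition") {
			return true
		}
	}
	return false
}
```

Matching on the message text is brittle, so a real watcher would probably also check the event reason and the Pod's age before acting.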

Manual Workaround

Deleting all spire-agent pods and waiting for them to reconcile, then deleting the spire-server pod and waiting for it to reconcile, appears to solve the problem.

Once SPIRE reconciles, the pending Pod can be deleted, and the respawned Pod does not have any connectivity issues.
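The workaround above could be automated roughly as follows. This is a sketch under stated assumptions: `ClusterOps` is a hypothetical struct whose function fields stand in for real Kubernetes client calls, and the strings passed to them are placeholders for whatever selector or name the real implementation would use.

```go
package main

import "fmt"

// ClusterOps is a hypothetical set of cluster operations. In a real watcher
// these would wrap Kubernetes client calls; here they are plain functions so
// that the ordering logic can be exercised on its own.
type ClusterOps struct {
	DeletePods func(target string) error // delete the Pod(s) identified by target
	WaitReady  func(target string) error // block until target has reconciled
}

// remediate replays the manual workaround in order: recycle the spire-agent
// Pods, then the spire-server Pod, and only then delete the pending Pod.
func remediate(ops ClusterOps, pendingPod string) error {
	for _, component := range []string{"spire-agent", "spire-server"} {
		if err := ops.DeletePods(component); err != nil {
			return fmt.Errorf("deleting %s: %w", component, err)
		}
		if err := ops.WaitReady(component); err != nil {
			return fmt.Errorf("waiting for %s to reconcile: %w", component, err)
		}
	}
	if err := ops.DeletePods(pendingPod); err != nil {
		return fmt.Errorf("deleting pending pod %s: %w", pendingPod, err)
	}
	return nil
}
```

The function stops at the first failure, so the caller can log the error and decide whether to try again.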

Problem

We cannot do this manually.

Proposed Solution

The solution is threefold, with a fallback for the third part:

1. Add adequate annotations to workloads (including pod priority classes) so that the SPIFFE CSI Driver works without issues under resource contention. The helm-charts-hardened project has examples of this.
2. Remove infinite retry loops. If, for example, a POST request during the init command phase fails its allotted 20 (by default) attempts, kill the Pod and wait for a new Pod to spawn.
3. Have an external entity (a Pod initially, though a Kubernetes controller would be a better fit later down the line) check the status of VSecM Pods. If any of them are in a Pending state with a timeout warning, execute the series of actions outlined in the "Manual Workaround" section in a controlled manner.
4. If (3) cannot reconcile for any reason, log the outcome, wait for a grace period, and restart.
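The bounded retry in the second item could look like the sketch below. The default of 20 attempts comes from the text above; the names `withBoundedRetries`, `defaultMaxAttempts`, and `ErrRetriesExhausted` are illustrative assumptions and not existing VSecM identifiers.

```go
package main

import (
	"errors"
	"fmt"
)

// defaultMaxAttempts mirrors the default of 20 attempts mentioned in the issue.
const defaultMaxAttempts = 20

// ErrRetriesExhausted is returned (wrapped) when every attempt has failed.
var ErrRetriesExhausted = errors.New("retries exhausted")

// withBoundedRetries calls op until it succeeds or maxAttempts is reached.
// A non-positive maxAttempts falls back to defaultMaxAttempts.
func withBoundedRetries(maxAttempts int, op func() error) error {
	if maxAttempts <= 0 {
		maxAttempts = defaultMaxAttempts
	}
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if lastErr = op(); lastErr == nil {
			return nil
		}
	}
	return fmt.Errorf("%w after %d attempts: %v", ErrRetriesExhausted, maxAttempts, lastErr)
}
```

One way to "kill the pod" on failure is for the caller to exit the process with a non-zero status when this function returns an error, leaving the restart to Kubernetes. A real version would likely also sleep between attempts.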

Other things to note:

VSecM Keystone polls linearly and too frequently; make it poll with an exponential backoff algorithm.
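For illustration, a capped exponential backoff delay can be computed as in the sketch below. The function name `backoffDelay` and its parameters are assumptions made for this example; it does not reflect how VSecM Keystone is currently implemented.

```go
package main

import "time"

// backoffDelay returns the wait time before the given retry attempt
// (0-based). The delay starts at base, doubles on every attempt, and never
// exceeds max.
func backoffDelay(attempt int, base, max time.Duration) time.Duration {
	delay := base
	// Stop doubling once the cap is reached to avoid needless growth.
	for i := 0; i < attempt && delay < max; i++ {
		delay *= 2
	}
	if delay > max {
		delay = max
	}
	return delay
}
```

With a base of one second and a cap of thirty seconds, successive attempts wait 1s, 2s, 4s, 8s, 16s, and then 30s from there on. Adding random jitter to each delay is a common refinement when many Pods poll at once.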

v0lkan commented 6 months ago

Other notes:

When this happens, VSecM Sentinel fails to execute POST requests because its certificate is invalid.

Killing VSecM Sentinel puts it into a Pending state too.

v0lkan commented 6 months ago

Note:

Using an operator is the best option here, whereas using a Pod (or a Job) is the practical option.

We may need to decide what to do, and maybe do this in two steps:

1. Create the Pod.
2. Create a follow-up task to convert the Pod to an operator.

vmware-tanzu / secrets-manager