Open v0lkan opened 6 months ago
Other notes:

- When the Pending issue described below happens, VSecM Sentinel fails to execute POST requests because its certificate is invalid.
- Killing VSecM Sentinel leaves it in a Pending state, too.
- Using an operator is the best option here, whereas using a Pod (or a Job) is the practical option.
- We may need to decide what to do, and maybe do this in two steps.
What Happens

Especially in test clusters, under resource contention, VSecM Pods can find themselves stuck in a Pending state with the following warning:

```
WARNING: Unable to attach or mount volumes: unmounted volumes=[spire-agent-socket] unattached-volumes=[spire-agent-socket] timed out waiting for the condition
```
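A quick way to spot Pods stuck in this state is to filter on the Pending phase and look at the Warning events. A minimal sketch follows; the `vsecm` namespace is an assumption, so adjust it to wherever VSecM is installed:

```shell
# find_stuck_pods: list Pods stuck in the Pending phase in a namespace,
# then list Warning events, where the "Unable to attach or mount volumes"
# message shows up. The default namespace "vsecm" is an assumption.
find_stuck_pods() {
  ns="${1:-vsecm}"
  # Pods that never left the Pending phase:
  kubectl get pods -n "$ns" --field-selector=status.phase=Pending
  # The volume-mount warning appears among the Pod events:
  kubectl get events -n "$ns" --field-selector=type=Warning
}
```

Usage: `find_stuck_pods vsecm`, then `kubectl describe pod <name> -n vsecm` on any Pod it reports to confirm the `spire-agent-socket` mount failure.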
When this happens, deleting the Pod via `kubectl delete po` does not help: the Pod remains in a Pending state, likely because the SPIFFE CSI driver could not attach the volume in the first place.

Manual Workaround
Deleting all spire-agent Pods, waiting for them to reconcile, and then deleting the spire-server Pod and waiting for it to reconcile appears to solve the problem. Once SPIRE reconciles, the Pending Pod can easily be deleted, and the respawned Pod does not have any connectivity issues.
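The workaround above can be scripted. This is a sketch under assumptions: the `spire-system` namespace, the `app.kubernetes.io/name` labels, and the `spire-agent` DaemonSet / `spire-server` StatefulSet names follow a typical SPIRE Helm install and may differ in your deployment:

```shell
# restart_spire: delete the spire-agent Pods, wait for the DaemonSet to
# reconcile, then delete the spire-server Pod and wait for the
# StatefulSet to reconcile. Namespace, labels, and workload names are
# assumptions; adjust them to your installation.
restart_spire() {
  ns="${1:-spire-system}"
  kubectl delete pods -n "$ns" -l app.kubernetes.io/name=agent
  kubectl rollout status daemonset/spire-agent -n "$ns" --timeout=120s
  kubectl delete pods -n "$ns" -l app.kubernetes.io/name=server
  kubectl rollout status statefulset/spire-server -n "$ns" --timeout=120s
}
```

After SPIRE reconciles, delete the Pod that was stuck in Pending (e.g. `kubectl delete pod -n vsecm <stuck-pod>`); its replacement should be able to mount the spire-agent socket.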
Problem
We cannot keep doing this manually; the recovery needs to be automated.
Proposed Solution

The solution is threefold; the helm-charts-hardened project has examples of this.

If the POST request during the init command phase fails after its allocated 20 (by default) attempts, kill the Pod and wait for a new Pod to spawn.
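The retry-then-kill behavior can be sketched as below, under the assumption that exiting non-zero from the init step is enough to make Kubernetes respawn the container/Pod. `SENTINEL_URL`, `MAX_RETRIES`, `RETRY_DELAY`, and the endpoint path are hypothetical names for illustration, not real VSecM configuration keys:

```shell
# post_with_retries: attempt the init POST up to MAX_RETRIES times and
# return non-zero if every attempt fails, so the caller can exit and let
# Kubernetes restart the Pod. All names here are illustrative.
post_with_retries() {
  url="${SENTINEL_URL:-https://vsecm-sentinel.vsecm.svc/init}"
  max="${MAX_RETRIES:-20}"
  i=1
  while [ "$i" -le "$max" ]; do
    if curl -fsS -X POST "$url" >/dev/null 2>&1; then
      echo "POST succeeded on attempt $i"
      return 0
    fi
    echo "POST attempt $i/$max failed; retrying" >&2
    sleep "${RETRY_DELAY:-2}"
    i=$((i + 1))
  done
  echo "POST failed after $max attempts; exiting so the Pod respawns" >&2
  return 1
}
```

Calling `post_with_retries || exit 1` from the init command would then surface the failure as a non-zero exit, which is what triggers the respawn.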