larsks closed this 3 weeks ago
There was apparently a more general problem with Vault failing to authenticate service accounts from the nerc-ocp-infra cluster; the Vault backup jobs were also failing:
$ k -n vault get pod | grep backup
NAME                               READY   STATUS   RESTARTS   AGE
backup-vault-run-26rrl-pod-4xn2b   0/3     Error    0          42h
backup-vault-run-j2j9k-pod-xdw7n   0/3     Error    0          6h50m
backup-vault-run-lb26j-pod-fvtt4   0/3     Error    0          30h
backup-vault-run-lt2pd-pod-bbzw8   0/3     Error    0          18h
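When the listing is longer, the failing backup runs can be picked out with a small filter. This is only a sketch and not part of the original fix: the sample text is copied from the output above, and the field positions (NAME first, STATUS third) assume the default `kubectl get pod` column layout.

```shell
#!/bin/sh
# Sample listing copied from the output above. Against a live cluster the
# same awk filter could read from `kubectl -n vault get pod` instead.
pods='NAME READY STATUS RESTARTS AGE
backup-vault-run-26rrl-pod-4xn2b 0/3 Error 0 42h
backup-vault-run-j2j9k-pod-xdw7n 0/3 Error 0 6h50m
backup-vault-run-lb26j-pod-fvtt4 0/3 Error 0 30h
backup-vault-run-lt2pd-pod-bbzw8 0/3 Error 0 18h'

# Skip the header row, keep only backup pods whose STATUS column is "Error",
# and collect their names (one per line).
failed=$(printf '%s\n' "$pods" | awk 'NR > 1 && $1 ~ /^backup-/ && $3 == "Error" { print $1 }')
printf '%s\n' "$failed"
```

Matching on the STATUS field rather than grepping for "Error" anywhere in the line avoids false hits from pod names that happen to contain that word.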
I think something must have happened when I ran the configure-vault job earlier this week to activate a new service account token for the hypershift cluster.
I wasn't able to identify a root cause, but the solution was... re-running the configure-vault job. For the record, that's:
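# Export the existing Job, dropping its status, its annotations, and the
# controller-generated selector and controller-uid label, which would
# otherwise conflict with creating a new Job from the same manifest
kubectl get job configure-vault -o yaml |
  yq '
    del(.status) |
    del(.metadata.annotations) |
    del(.spec.selector) |
    del(.spec.template.metadata.labels."controller-uid")
  ' > job.yaml
# Delete the old Job and create a fresh copy, which runs it again
kubectl delete job configure-vault
kubectl create -f job.yaml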
The ClusterSecretStore is now healthy:
$ k get clustersecretstore
NAME                   AGE    STATUS   CAPABILITIES   READY
nerc-cluster-secrets   502d   Valid    ReadWrite      True
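A quick way to notice an offline store like this one is to flag any ClusterSecretStore whose READY column is not `True`. The sketch below is a hypothetical helper, not something used in this issue; it reads the sample output copied from above and assumes READY is the fifth column, as in that listing.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every store whose READY column
# (5th field) is not "True". Reads `kubectl get clustersecretstore`-style
# output on stdin and skips the header row.
not_ready_stores() {
  awk 'NR > 1 && $5 != "True" { print $1 }'
}

# Sample output copied from above; against a live cluster, pipe
# `kubectl get clustersecretstore` into not_ready_stores instead.
stores='NAME AGE STATUS CAPABILITIES READY
nerc-cluster-secrets 502d Valid ReadWrite True'

not_ready=$(printf '%s\n' "$stores" | not_ready_stores)

if [ -z "$not_ready" ]; then
  echo "all ClusterSecretStores ready"
else
  echo "not ready: $not_ready"
fi
```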
It looks like the nerc-cluster-secrets ClusterSecretStore on nerc-ocp-infra is offline. This means nerc-ocp-infra won't be getting secret updates and won't be able to retrieve new secrets.