Describe the bug
The controller_resource_status metrics, which report the auth and connection status from the VSO to a Vault instance, do not get updated.
The metrics for controller="vaultconnection" and controller="vaultauth" are affected.
From the observed behavior, the metric is only updated when going from a failure state to a working state; the transition from a working state to a failure state is not reflected.
To Reproduce
Steps to reproduce the behavior:
1. Deploy the Vault Secrets Operator to the Kubernetes cluster with the proper connection configuration in place, such as network policies.
2. Wait for the VSO to start, run the metric-related checks, and start exposing the metrics.
3. Validate that the controller_resource_status metrics show a value of 1, meaning that the VSO was able to connect to Vault.
4. Drop the network policies (or network connectivity) to Vault.
5. Watch the metrics and see that they are not updated to a value of 0, which would indicate that the connection is failing.
Here is a screenshot of a dashboard to visualize the behavior
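The check in steps 3 and 5 can also be done without a dashboard by scraping the metrics endpoint directly. The following is a minimal sketch, not an official tool: METRICS_URL, parse_status and fetch_status are hypothetical names, and the URL assumes the VSO metrics endpoint has been made reachable locally (for example through a port-forward), which will differ per deployment.

```python
import re
import urllib.request

# Assumption: the VSO metrics endpoint is reachable at this address,
# e.g. through a local port-forward. Adjust for your deployment.
METRICS_URL = "http://localhost:8080/metrics"

# One sample line of the Prometheus text format: name{labels} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_status(text, metric="controller_resource_status"):
    """Return a list of (labels, value) pairs for every sample of `metric`."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and HELP/TYPE comments.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None or match.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((labels, float(match.group("value"))))
    return samples


def fetch_status(url=METRICS_URL):
    """Scrape the endpoint once and return the parsed status samples."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_status(resp.read().decode("utf-8"))
```

Calling fetch_status() once after step 3 and again after step 4, then filtering on labels.get("controller") in ("vaultconnection", "vaultauth"), should show whether the value stays at 1 after connectivity is dropped.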
Application deployment:
There is no application involved in this case, since this is a VSO issue.
Expected behavior
When the connectivity to Vault becomes unavailable, the metrics should be updated to show the actual status.
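To make the difference between the observed and the expected behavior concrete, here is a small, purely illustrative model. It is not the VSO's code and makes no claim about the actual cause; StatusGauge, record_success_only, record_every_outcome and replay are all hypothetical names. It only contrasts a gauge that is written on success alone with one that is written on every check.

```python
class StatusGauge:
    """Hypothetical stand-in for a gauge such as controller_resource_status."""

    def __init__(self):
        self.value = 0

    def set(self, value):
        self.value = value


def record_success_only(gauge, healthy):
    # Models the observed behavior: the gauge is written only on success,
    # so a later failure leaves the old value of 1 in place.
    if healthy:
        gauge.set(1)


def record_every_outcome(gauge, healthy):
    # Models the expected behavior: every check overwrites the gauge.
    gauge.set(1 if healthy else 0)


def replay(record, outcomes):
    """Apply `record` to a fresh gauge for each outcome; return the values seen."""
    gauge = StatusGauge()
    history = []
    for healthy in outcomes:
        record(gauge, healthy)
        history.append(gauge.value)
    return history
```

For the sequence working, failing, working, replay(record_success_only, [True, False, True]) yields [1, 1, 1], while replay(record_every_outcome, [True, False, True]) yields [1, 0, 1], which is what this issue asks for.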
Environment
Kubernetes version:
Distribution or cloud vendor (OpenShift, EKS, GKE, AKS, etc.): RKE v1.29.6+rke2r1
Other configuration options or runtime services (istio, etc.): Cilium
vault-secrets-operator version: 0.7.1
Additional context
We are moving to the latest version available at the moment (0.8.1), but the changelog contains no references to metrics or observability that would indicate this has been fixed or improved.