erzhan46 opened this issue 1 day ago
This seems to be related to AppRole authentication failures. VSO eventually came up, spiking to 2G on startup, and is now using 1.2G. It currently logs 'invalid role or secret' errors.
Hi @erzhan46, that level of memory usage is unexpected. Are the AppRole authentication failures expected, and unique to this cluster? How many and what kind of secrets are being synced? Are there other auth methods besides AppRole in use?
Hi @tvoran
We fixed the issue with AppRole authentication, but the memory problem still persists.
VSO gets OOMKilled several times upon startup before starting successfully.
Memory metrics show VSO spikes to about 2G and then runs consistently at 1G.
One thing I noticed is the following in the VSO logs on that cluster.
Note the "Objects listed" error:
{"level":"info","ts":"2024-11-21T16:34:02Z","msg":"Starting EventSource","controller":"secrettransformation","controllerGroup":"secrets.hashicorp.com","controllerKind":"SecretTransformation","source":"kind source: *v1beta1.SecretTransformation"}
{"level":"info","ts":"2024-11-21T16:34:02Z","msg":"Starting Controller","controller":"secrettransformation","controllerGroup":"secrets.hashicorp.com","controllerKind":"SecretTransformation"}
I1121 16:34:35.727589 1 trace.go:236] Trace[1704102856]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.30.1/tools/cache/reflector.go:232 (21-Nov-2024 16:34:02.275) (total time: 33451ms):
Trace[1704102856]: ---"Objects listed" error:
There are just a few StaticSecrets synced, plus a couple of SecretTransformations. We cannot use authentication methods other than AppRole because of an issue with private domain name resolution in Vault instances deployed in HCP.
Describe the bug
VSO recently started getting OOMKilled on one of our OpenShift clusters (v4.14.37). Increasing the memory limit to 2Gi and moving VSO to the Guaranteed QoS class didn't help. Several other OpenShift clusters run VSO just fine with the default resource specs.
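For reference, Kubernetes places a pod in the Guaranteed QoS class only if every container sets both CPU and memory limits and its requests equal those limits (a request left unset is defaulted to its limit). The Python sketch below encodes that rule for container specs given as plain dicts shaped like a pod's `resources` stanza. The helper names `parse_quantity` and `qos_class` are hypothetical, and the quantity parser handles only the common suffixes. It can help sanity-check that a raised limit such as 2Gi really yields Guaranteed, including for any sidecar containers in the same pod.

```python
from decimal import Decimal

# Common Kubernetes quantity suffixes (binary, decimal, and milli).
# Hypothetical helper: exotic forms such as "Pi" or "Ei" are not handled.
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "m": Decimal("0.001"),
}


def parse_quantity(q: str) -> Decimal:
    """Convert a quantity like '2Gi' or '500m' to a plain Decimal."""
    q = q.strip()
    # Check longer suffixes first so "Mi" is matched before "M".
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(q)


def qos_class(containers: list[dict]) -> str:
    """Classify a pod's QoS class from its containers' resource specs."""
    any_set = False
    guaranteed = True
    for c in containers:
        res = c.get("resources", {})
        limits = res.get("limits", {})
        # A missing request defaults to the corresponding limit.
        requests = {**limits, **res.get("requests", {})}
        # Only CPU and memory take part in the QoS decision.
        for name in ("cpu", "memory"):
            if name in limits or name in requests:
                any_set = True
            if name not in limits or name not in requests:
                guaranteed = False
            elif parse_quantity(limits[name]) != parse_quantity(requests[name]):
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"
```

With `requests` omitted, `qos_class([{"resources": {"limits": {"cpu": "500m", "memory": "2Gi"}}}])` still returns "Guaranteed" because of the defaulting rule. Adding a second container that sets only a CPU request drops the whole pod to "Burstable".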
To Reproduce
Expected behavior
VSO should run with the default resource specs.
Environment
Additional context
This seems to be the same issue others have experienced recently.