hashicorp / vault-secrets-operator

The Vault Secrets Operator (VSO) allows Pods to consume Vault secrets natively from Kubernetes Secrets.
https://hashicorp.com

VSO Controller Stopped Renewing Vault Auth Token Unexpectedly #858

Open kdw174 opened 4 months ago

kdw174 commented 4 months ago

Describe the bug

Our GCP dynamic credentials were deleted prematurely.

The Vault Secrets Operator controller leader did not renew the Vault token used to read GCP dynamic secrets. When the Vault token expired, Vault revoked the leases for the GCP dynamic credentials and deleted the credentials in GCP. Neither the GCP secrets engine lease nor the Kubernetes auth Vault token should have been near its max TTL.
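To illustrate the failure mode described above, here is a minimal sketch of how a lease's effective lifetime depends on its parent token. It is a simplified model, not VSO or Vault code: the function name `lease_revocation_time` and all numbers are hypothetical, and it ignores token max TTLs and lease renewals.

```python
def lease_revocation_time(last_token_renewal: int, token_ttl: int, lease_expiry: int) -> int:
    """Return when a lease ends (in seconds) under this simplified model.

    The lease ends at its own expiry or when its parent token expires,
    whichever comes first. The token expires one TTL after its last renewal.
    """
    token_expiry = last_token_renewal + token_ttl
    return min(token_expiry, lease_expiry)


HOUR = 3600
DAY = 24 * HOUR

# Hypothetical numbers: the token was last renewed on day 17 with a 1-hour TTL,
# while the lease itself would otherwise have been valid until day 30.
last_renewal = 17 * DAY
revoked_at = lease_revocation_time(last_renewal, token_ttl=HOUR, lease_expiry=30 * DAY)
print("hours between last renewal and revocation:", (revoked_at - last_renewal) // HOUR)
```

The point of the sketch is that a long lease TTL does not protect the credentials once the parent token stops being renewed.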

We run two controller pods with the direct-encrypted persistent cache and leader election configured:

      --leader-elect
      --client-cache-persistence-model=direct-encrypted
      --client-cache-size=10000

Let me preface this by acknowledging that long-lived TTLs are not a recommended approach. We plan to move to shorter TTLs.

We confirmed this with the Vault audit logs. Using the hashed token, we captured when the token was first used to read the GCP dynamic secret and when it was last renewed. One hour after the last renewal, the GCP secrets engine leases created with that Kubernetes auth Vault token were revoked.
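For anyone wanting to run a similar check, a rough sketch of scanning a JSON-lines audit log for the last renewal of a given hashed token might look like the following. The field names (`type`, `time`, `error`, `auth.client_token`, `request.path`) and the `auth/token/renew` path prefix are assumptions about the audit log layout and should be verified against your own logs; `last_renewal_time` is a hypothetical helper, not part of any Vault tooling.

```python
import json
from typing import Iterable, Optional


def last_renewal_time(lines: Iterable[str], hashed_token: str) -> Optional[str]:
    """Return the `time` value of the last matching renewal entry, or None.

    Assumes entries are in chronological order (append-only log), so the
    last match in file order is the most recent renewal.
    """
    latest = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if entry.get("type") != "response" or entry.get("error"):
            continue  # only consider responses without an error
        auth = entry.get("auth") or {}
        request = entry.get("request") or {}
        if auth.get("client_token") != hashed_token:
            continue
        # Assumed path prefix for token renewal requests.
        if not str(request.get("path", "")).startswith("auth/token/renew"):
            continue
        latest = entry.get("time")
    return latest
```

Usage would be along the lines of `last_renewal_time(open("audit.log"), "hmac-sha256:...")`, where the file name and the hashed token value are placeholders.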

We run the same configuration in a second cluster, and we use GCP dynamic credentials with the same configuration elsewhere, and we have not seen this happen anywhere else. As expected, the original Vault token used to generate the same GCP dynamic secrets in another cluster is still being renewed to this day.

The Vault token and GCP credentials were both 17 days old when the token stopped being renewed and the GCP leases were revoked.

Additional context

To Reproduce

Steps to reproduce the behavior:

  1. Still trying to reproduce on my end. I will update if/when I'm able to reproduce the issue. Please let us know if there are additional logs, metrics, or configuration we should dig into, or other things to consider, to help determine what happened here.

Application deployment:

Expected behavior

The Vault Secrets Operator controller continues to renew the Vault token and does not let the dynamic credential leases expire before their TTL.

Environment

benashz commented 4 months ago

Hi @kdw174 - sorry to hear you encountered some issues with VSO.

We have made a lot of improvements to the way Vault tokens are handled with dynamic secrets, including support for tokens with max TTLs. The bulk of those fixes were in v0.6.0. Would it be possible for you to upgrade to the latest release, which is currently v0.7.1?

Thanks,

Ben