hashicorp / vault

A tool for secrets management, encryption as a service, and privileged access management
https://www.vaultproject.io/

`api.LifetimeWatcher` is not aware of `Age` headers #19227

Status: Open. Freyert opened this issue 1 year ago

Freyert commented 1 year ago

Describe the bug

This is a narrowed-down version of https://github.com/hashicorp/vault/issues/16439

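Because the `LifetimeWatcher`, which handles renewals, is not aware of `Age` headers, it cannot correctly calculate grace periods and sleep durations when credentials are read through a caching proxy. A caching proxy replays the response body as it was first issued, including its `lease_duration`, while the `Age` header reports how many seconds that response has already spent in the cache. A consumer that ignores `Age` therefore overestimates how much of the lease is left.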

Let's look at one of the functions that `LifetimeWatcher` uses to renew: `api.Sys().Renew()`.

👉 When looking at `api.Sys().Renew()`, I want to check two things:

A) Is the `Age` header examined at all?
B) Does the function's return value include the `Age` header, so that downstream consumers who need this information can use it?

🎯 The answer in both cases is: no.

To Reproduce

Steps to reproduce the behavior:

  1. Set up a Vault server
  2. Set up a caching proxy
  3. Read a secret into the proxy's cache using `api.Logical().Read()`
  4. Call Vault through the caching proxy using `api.Sys().Renew()`
  5. Wait
  6. Call Vault through the caching proxy using `api.Sys().Renew()` again

🔍 For any of those `api.Sys().Renew()` calls, can you find out the value of the `Age` header? Not with the current Go API.
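Steps 3 to 6 can be approximated without a real Vault server or proxy. The sketch below uses only the standard library: `newStubProxy` is a hypothetical stand-in for a caching proxy driven by a fake clock, the JSON body is made up, and a plain GET stands in for the renew call. It only shows that the `Age` header is present on the wire and grows between calls; per this issue, the typed `api` return values give no way to read it.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strconv"
	"sync/atomic"
)

// elapsed is a fake clock: seconds since the stub proxy cached its response.
// An atomic keeps reads in the handler goroutine race-free and avoids real sleeps.
var elapsed atomic.Int64

// newStubProxy starts a hypothetical caching proxy that always replays the same
// cached JSON body and reports how long it has been cached via the Age header.
func newStubProxy(body string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.Header().Set("Age", strconv.FormatInt(elapsed.Load(), 10))
		io.WriteString(w, body)
	}))
}

// ageOf issues a plain GET and returns the Age header from the raw response.
func ageOf(url string) (string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if _, err := io.Copy(io.Discard, resp.Body); err != nil {
		return "", err
	}
	return resp.Header.Get("Age"), nil
}

func main() {
	// Made-up cached body; only its lease fields resemble a Vault response.
	proxy := newStubProxy(`{"lease_id":"example/lease","lease_duration":30,"renewable":true}`)
	defer proxy.Close()

	for _, wait := range []int64{0, 20} {
		elapsed.Store(wait) // advance the fake clock (the "Wait" step)
		age, err := ageOf(proxy.URL)
		if err != nil {
			panic(err)
		}
		fmt.Printf("after %ds the proxy reports Age: %s\n", wait, age)
	}
}
```

The fake clock is a deliberate choice: it keeps the sketch deterministic instead of depending on real sleeps and timing.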

Expected behavior

`api.Sys().Renew()` and similar functions should expose the `Age` header through their return value.

Additional context

Central Vault caching proxies are critical in Kubernetes environments.

While Vault can handle a lot of requests, Vault's secret backends cannot. If a K8s deployment requests 100 database users from the Atlas database plugin using dynamic credentials, it will lock the Atlas environment.

Without the `Age` header, new pod instances do not know when to renew their secret templates. This happens all the time with autoscaling groups: 30 pods start at time 0 and 20 more pods start at time 20. The credentials expire at time 30 (MAX_TTL = 30). Will the second batch of pods know that the credential has expired? Not until time 50 (20 + MAX_TTL), because they treat the cached credential as if it had just been issued.
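The autoscaling timeline reduces to one line of arithmetic. This minimal sketch assumes the proxy cached the credential at time 0, so the second batch sees `Age: 20` at time 20; `believedExpiry` is a hypothetical helper, and all times are in seconds on a shared clock.

```go
package main

import "fmt"

// believedExpiry returns the time at which a pod thinks its credential expires.
// fetchedAt is when the pod read the credential through the proxy, age is the
// Age value it honored (0 if it ignores Age), and maxTTL is the credential's
// lifetime measured from issuance.
func believedExpiry(fetchedAt, age, maxTTL int) int {
	return fetchedAt + maxTTL - age
}

func main() {
	const maxTTL = 30
	fmt.Println(believedExpiry(0, 0, maxTTL))   // 30: first batch at t=0
	fmt.Println(believedExpiry(20, 0, maxTTL))  // 50: second batch ignoring Age
	fmt.Println(believedExpiry(20, 20, maxTTL)) // 30: second batch honoring Age=20
}
```

Only the last case matches the credential's real expiry at time 30.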

Once the `Age` header value is accessible to downstream consumers such as `LifetimeWatcher`, those consumers should be updated to take it into account.

Freyert commented 1 year ago

One problem I see with reproducing this is that it is largely a timing issue, complicated by random jitter.

So unless you can set up a very precise scenario, testing this via an integration test will be flaky.

It is better to think about this problem logically: if I have an `Age` header and nothing reads it, can I correctly calculate grace periods and sleep durations?

Freyert commented 1 year ago

@peteski22, is there any way we can pick this one back up? You all closed https://github.com/hashicorp/vault/issues/16439 because you couldn't reproduce the issue.

I see this as a very urgent fix: the `Age` header needs to be propagated to the `LifetimeWatcher`, because without it the Vault Agent proxy cache is fundamentally broken for the clients that use it.