Open Freyert opened 1 year ago
One issue I see with reproduction is that this is largely a timing issue with random jitter included, so unless you can set up a very precise scenario, testing this via an integration test will be flaky.
It is better to think about this problem logically: if I have an `Age` header and nothing in the client knows about it, can I correctly calculate grace periods and sleep durations?
@peteski22 is there any way we can pick this one back up? You all closed https://github.com/hashicorp/vault/issues/16439 because you couldn't reproduce the issue.
I see it as very urgent to fix the propagation of the `Age` header to the `LifetimeWatcher`, because without it the Vault Agent Proxy Cache is fundamentally broken for clients using the cache.
Describe the bug
This is a reduction of https://github.com/hashicorp/vault/issues/16439
Because the LifetimeWatcher that handles renewals is not aware of `Age` headers, it cannot correctly calculate grace periods and sleep durations when reading credentials through a caching proxy.

Let's look at one of the functions that LifetimeWatcher uses to renew: `api.Sys().Renew()`.

When we look at `api.Sys().Renew()`, what I want to check is:

A) Is the `Age` header examined at all?
B) Does the return value of the function include the `Age` header, so downstream consumers who need this information can use it?

The answer in both cases is: no.
To Reproduce
Steps to reproduce the behavior:

1. `api.Logical().Read()`
2. `api.Sys().Renew()`
3. `api.Sys().Renew()`

For any of those `api.Sys().Renew()` calls, are you able to know the value of the `Age` header? Not with the current Go API.

Expected behavior
`api.Sys().Renew()` and other such functions should give access to the `Age` header through their return value.

Additional context
Central Vault caching proxies are critical in Kubernetes environments.
While Vault can handle tons of requests, Vault's secret backends cannot. If a K8s deployment requests 100 database users from the Atlas Database Plugin using dynamic credentials, it will lock up the Atlas environment.
Without the `Age` header, new pod instances will not know when to properly renew their secret templates. This happens all the time with autoscaling groups: 30 pods start at time 0, 20 pods at time 20, and the credentials expire at time 30 (MAX_TTL). Will the second batch of pods know that the credential has expired? Not until time 50 (20 + MAX_TTL).

Once the `Age` header value is accessible by downstream consumers such as LifetimeWatcher, those consumers should be upgraded to acknowledge the `Age` header.