lyft / metadataproxy

A proxy for AWS's metadata service that gives out scoped IAM credentials from STS

The role expiration check should be configurable, and 15 minutes at minimum #94

Closed byronwolfman closed 4 years ago

byronwolfman commented 4 years ago

Hey IAM friends. Our org recently noticed some badly behaved Java apps. Specifically: the apps would fetch new IAM credentials before every single AWS API call for 10 minutes straight, and then stop asking once those 10 minutes were up. The cause seems to be, in part, how the official aws-sdk-java library caches IAM role credentials:

https://github.com/aws/aws-sdk-java/blob/1.11.546/aws-java-sdk-core/src/main/java/com/amazonaws/auth/EC2CredentialsFetcher.java#L49-L53

Specifically: the SDK keeps using its cached credentials as long as they're good for at least 15 more minutes. Once they're within 15 minutes of expiring, the SDK asks for new ones.

Metadataproxy also proactively refreshes credentials when they're nearing expiry, but only 5 minutes ahead of expiry rather than 15:

https://github.com/lyft/metadataproxy/blob/1.11.0/metadataproxy/roles.py#L349-L351

This means there's a 10-minute period (from 15 minutes down to 5 minutes before expiry) during which the Java SDK keeps asking for new credentials, because it expects to find new ones, but metadataproxy is still answering with the cached credentials.
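To make the arithmetic concrete, here is a rough Python sketch of the two thresholds described above. The function names are made up for illustration, and the strict `<` comparison at the boundary is an assumption; neither is taken from the SDK or from metadataproxy.

```python
from datetime import datetime, timedelta, timezone

# Thresholds as described in this issue.
SDK_REFRESH_THRESHOLD = timedelta(minutes=15)    # aws-sdk-java
PROXY_REFRESH_THRESHOLD = timedelta(minutes=5)   # metadataproxy 1.11.0


def needs_refresh(expiration, now, threshold):
    """Return True when the credentials expire within `threshold` of `now`.

    Hypothetical helper; the strict comparison is an assumption.
    """
    return expiration - now < threshold


def mismatch_window(client_threshold, proxy_threshold):
    """How long the client wants fresh credentials while the proxy still serves cached ones."""
    return max(client_threshold - proxy_threshold, timedelta(0))


if __name__ == "__main__":
    expiration = datetime(2019, 1, 1, 12, 0, tzinfo=timezone.utc)
    for minutes_left in (20, 12, 7, 4):
        now = expiration - timedelta(minutes=minutes_left)
        sdk_wants_new = needs_refresh(expiration, now, SDK_REFRESH_THRESHOLD)
        proxy_refreshes = needs_refresh(expiration, now, PROXY_REFRESH_THRESHOLD)
        print(minutes_left, sdk_wants_new, proxy_refreshes)
    print(mismatch_window(SDK_REFRESH_THRESHOLD, PROXY_REFRESH_THRESHOLD))
```

With 12 or 7 minutes left, the SDK side wants new credentials while the proxy side does not yet refresh, which is the window in which every AWS API call triggers a fetch.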

I'd like to propose that metadataproxy use the same 15-minute threshold for better compatibility with the Java SDK, and also provide a new configuration option to make this tuneable.
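Something like the following could work as a starting point. It's only a sketch: the environment variable name `EXAMPLE_ROLE_EXPIRATION_THRESHOLD_MINUTES` and the helper functions are hypothetical, not existing metadataproxy settings or code.

```python
import os
from datetime import datetime, timedelta, timezone

# Proposed default, matching the Java SDK threshold described above.
DEFAULT_THRESHOLD_MINUTES = 15

# Hypothetical setting name, used here for illustration only.
ENV_VAR = "EXAMPLE_ROLE_EXPIRATION_THRESHOLD_MINUTES"


def load_expiration_threshold(environ=None):
    """Read the refresh threshold from the environment, falling back to 15 minutes."""
    environ = os.environ if environ is None else environ
    raw = environ.get(ENV_VAR, "").strip()
    if not raw:
        return timedelta(minutes=DEFAULT_THRESHOLD_MINUTES)
    minutes = int(raw)  # raises ValueError on non-numeric input
    if minutes <= 0:
        raise ValueError(f"{ENV_VAR} must be a positive number of minutes")
    return timedelta(minutes=minutes)


def cached_credentials_usable(expiration, now, threshold):
    """Cached credentials are served only while they outlive the threshold."""
    return expiration - now >= threshold


if __name__ == "__main__":
    threshold = load_expiration_threshold()
    now = datetime.now(timezone.utc)
    expiration = now + timedelta(minutes=10)
    print(cached_credentials_usable(expiration, now, threshold))
```

Defaulting to 15 minutes keeps the proxy in step with the Java SDK out of the box, while operators whose clients use a different threshold can still tune it.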

byronwolfman commented 4 years ago

Funnily enough, I was reading through an incident report from another shop that described similar-but-not-quite-the-same behaviour: https://srvaroa.github.io/kubernetes/migration/latency/dns/java/aws/microservices/2019/10/22/kubernetes-added-a-0-to-my-latency.html

ryan-lane commented 4 years ago

Released as 2.0.0. I bumped the major version because this changes the default behavior, and I wanted folks to be more aware of that.