byronwolfman closed this 4 years ago
Funnily enough, I was reading through an incident report from another shop that described similar-but-not-quite-the-same behaviour: https://srvaroa.github.io/kubernetes/migration/latency/dns/java/aws/microservices/2019/10/22/kubernetes-added-a-0-to-my-latency.html
Released as 2.0.0. I bumped the major version because this changes the default behavior, and I wanted folks to be more aware of that.
Hey IAM friends. Our org recently noticed some badly behaving Java apps. Specifically: they would fetch new IAM credentials before every single AWS API call for 10 minutes straight, and then stop asking. The cause seems to be, in part, how the official aws-sdk-java library caches IAM role credentials:
https://github.com/aws/aws-sdk-java/blob/1.11.546/aws-java-sdk-core/src/main/java/com/amazonaws/auth/EC2CredentialsFetcher.java#L49-L53
Specifically: the SDK keeps using its cached credentials as long as they're good for at least 15 more minutes. Once they are within 15 minutes of expiring, the SDK asks for new ones.
Metadataproxy also proactively refreshes credentials when they're nearing expiry -- but only 5 minutes ahead instead:
https://github.com/lyft/metadataproxy/blob/1.11.0/metadataproxy/roles.py#L349-L351
This means there's a 10-minute window, from 15 minutes to 5 minutes before expiry, during which the Java SDK asks for new credentials on every call because it expects to find fresh ones, but metadataproxy keeps answering with the cached credentials.
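To make the arithmetic concrete, here is a rough Python sketch of the two thresholds interacting. The function names are made up for illustration and are not taken from either codebase, and whether each comparison is inclusive at the exact boundary is an assumption on my part.

```python
from datetime import datetime, timedelta, timezone

# Thresholds as described above; the names are illustrative only.
SDK_THRESHOLD = timedelta(minutes=15)    # Java SDK asks again inside this window
PROXY_THRESHOLD = timedelta(minutes=5)   # metadataproxy refreshes inside this window


def sdk_wants_refresh(expiration, now):
    """True when the SDK would ask the metadata endpoint for credentials again."""
    # Assumption: the boundary is inclusive; the real SDK may differ at exactly 15 minutes.
    return expiration - now <= SDK_THRESHOLD


def proxy_serves_cached(expiration, now):
    """True when the proxy would still answer with the credentials it has cached."""
    return expiration - now > PROXY_THRESHOLD


def redundant_fetch_window(sdk_threshold=SDK_THRESHOLD, proxy_threshold=PROXY_THRESHOLD):
    """How long the SDK keeps re-asking while the proxy keeps returning the same answer."""
    return max(sdk_threshold - proxy_threshold, timedelta(0))


now = datetime(2019, 1, 1, 12, 0, tzinfo=timezone.utc)
expiration = now + timedelta(minutes=12)  # 12 minutes left: inside the mismatch window
print(sdk_wants_refresh(expiration, now), proxy_serves_cached(expiration, now))
print(redundant_fetch_window())
```

With 12 minutes left, both checks come back true, which is exactly the state where every AWS API call triggers a pointless credential fetch. Setting both thresholds to the same value shrinks the window to zero.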
I'd like to propose that metadataproxy use the same 15-minute threshold for better compatibility with the Java SDK, and also provide a new configuration option to make this tunable.
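Something along these lines is what I have in mind. This is only a sketch: the environment variable name `CREDENTIAL_REFRESH_THRESHOLD_MINUTES` and both helper functions are hypothetical, not existing metadataproxy settings or code.

```python
import os
from datetime import datetime, timedelta, timezone

# Proposed default, matching the Java SDK's 15-minute threshold.
DEFAULT_THRESHOLD_MINUTES = 15


def load_refresh_threshold(environ=None):
    """Read the refresh threshold from the environment, falling back to the default.

    CREDENTIAL_REFRESH_THRESHOLD_MINUTES is a hypothetical option name.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("CREDENTIAL_REFRESH_THRESHOLD_MINUTES", DEFAULT_THRESHOLD_MINUTES)
    return timedelta(minutes=int(raw))


def needs_refresh(expiration, now, threshold):
    """True when cached credentials are close enough to expiry to be replaced."""
    return expiration - now <= threshold


now = datetime(2019, 1, 1, 12, 0, tzinfo=timezone.utc)
expiration = now + timedelta(minutes=12)
print(needs_refresh(expiration, now, load_refresh_threshold({})))
```

With the 15-minute default, credentials that have 12 minutes left get refreshed, so the Java SDK finds new ones on its first retry. Anyone who prefers the old behaviour could set the option back to 5.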