Open marshallbrekka opened 7 years ago
> As it currently stands, credentials are only requested from STS on an as-needed basis.

This is not entirely correct: it hits STS once every 15 minutes, as the credentials are cached in the meantime. This seemed like an acceptable tradeoff.
> This can be solved with three components:
>
> - Fetch credentials as soon as the pod is added
> - A goroutine per pod to refresh the credentials before they expire
> - Stop the goroutine when the pod is deleted
I don't think this solution will scale to thousands of nodes and potentially hundreds of thousands of pods. I would rather increase the caching duration to 60 minutes and/or make it configurable.
I think there is some misunderstanding about the proposed solution, as well as the issue(s) it was meant to solve, so I'll try to explain it better below.
This was the primary issue: if a client fails to find credentials from the metadata service within its default timeout (for some clients this is 1s, and I have seen STS take longer than that on occasion), it assumes the metadata service is not available and moves on down the credential provider chain.
This can essentially cause a Pod to fail on startup and never recover, since clients cache the credential provider decision.
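To illustrate the "never recover" failure mode, here is a minimal, self-contained sketch (not boto's actual code; all names are invented for illustration) of a credential chain that resolves once and caches the outcome, the way many SDK sessions do:

```go
package main

import (
	"errors"
	"fmt"
)

// Provider is one link in a credential provider chain.
type Provider interface {
	Retrieve() (string, error)
}

// metadataProvider stands in for the metadata endpoint that kube2iam
// intercepts; "available" simulates whether STS answered within the
// client's (often 1s) timeout.
type metadataProvider struct{ available bool }

func (m *metadataProvider) Retrieve() (string, error) {
	if !m.available {
		return "", errors.New("metadata service timed out")
	}
	return "role-credentials", nil
}

// chain resolves credentials once and caches the outcome,
// including a failed outcome.
type chain struct {
	providers []Provider
	resolved  bool
	creds     string
	err       error
}

func (c *chain) Retrieve() (string, error) {
	if !c.resolved {
		c.resolved = true
		c.err = errors.New("no credentials found in chain")
		for _, p := range c.providers {
			if creds, err := p.Retrieve(); err == nil {
				c.creds, c.err = creds, nil
				break
			}
		}
	}
	return c.creds, c.err
}

func main() {
	meta := &metadataProvider{available: false}
	c := &chain{providers: []Provider{meta}}

	_, err := c.Retrieve()
	fmt.Println("at startup:", err) // STS was slow: resolution fails

	meta.available = true // credentials are ready now...
	_, err = c.Retrieve()
	fmt.Println("later:", err) // ...but the cached decision still fails
}
```

Even after kube2iam has the credentials ready, the second call still fails, because the chain's decision was cached at startup.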
> > As it currently stands, credentials are only requested from STS on an as-needed basis.
>
> This is not entirely correct: it hits STS once every 15 minutes, as the credentials are cached in the meantime. This seemed like an acceptable tradeoff.
In light of the issue described above, I don't think this tradeoff is acceptable.
This is much less important than the first issue, since it doesn't necessarily cause an outright failure, but for services where latency is important, adding an additional ~100ms on a request is highly undesirable.
> I don't think this solution will scale to thousands of nodes, potentially hundreds of thousands of pods, as:
>
> - it would be hitting the AWS API limits pretty hard and potentially have knock-on effects on the entire AWS account. This is a show stopper imo.
> - there could be hundreds of thousands of goroutines running for kube2iam, therefore consuming a lot of resources even if goroutines are cheap.
I think both of these points are easily addressed by a few ideas.
If you limit the pods that kube2iam is monitoring to the ones that exist on the same worker, then you only need to scale the number of goroutines to the max number of pods that can run on a given worker (this is what the implementation in the PR was doing).
This also means that you should only be making AWS STS API calls 1x per pod per cache duration.
If you wanted to further limit the number of API calls being made, you could share credentials among all pods on a given worker that have the same IAM role, which, if your pods were actively using their AWS credentials, would be the exact same as the status quo. However that reduces your ability to trace API calls back to a given pod for audit trail purposes, which I think is undesirable.
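The role-level sharing idea could be sketched roughly like this (all names are hypothetical, not from the PR): key the cache by IAM role rather than by pod, so N pods sharing a role cost one `sts:AssumeRole` call per cache period instead of N.

```go
package main

import (
	"fmt"
	"sync"
)

// roleCache shares one credential set per IAM role on a node.
// Tradeoff noted above: you lose per-pod attribution in the audit trail.
type roleCache struct {
	mu         sync.Mutex
	creds      map[string]string
	assumeRole func(role string) string // stand-in for the STS call
	calls      int                      // how many times STS was actually hit
}

func (rc *roleCache) CredentialsFor(role string) string {
	rc.mu.Lock()
	defer rc.mu.Unlock()
	if c, ok := rc.creds[role]; ok {
		return c // another pod with this role already paid the STS cost
	}
	rc.calls++
	c := rc.assumeRole(role)
	rc.creds[role] = c
	return c
}

func main() {
	rc := &roleCache{
		creds:      map[string]string{},
		assumeRole: func(role string) string { return "creds-for-" + role },
	}
	// Three pods, two distinct roles: only two STS calls are made.
	for _, role := range []string{"app-role", "app-role", "worker-role"} {
		rc.CredentialsFor(role)
	}
	fmt.Println("STS calls:", rc.calls) // 2
}
```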
In general I don't think the performance of the implementation should be any worse than the status quo with the worst case scenario of pods consistently using credentials.
Thoughts?
We are experiencing timeouts in pods being served by kube2iam. The current blocking approach on first request causes this latency. Since it is triggering timeouts, and our goal for kube2iam is transparency, this makes the current approach unworkable from my perspective. I'm not sure that the proposed solution is ideal, but it seems like a good trade-off to me. Namely, the combination of:

- eager `sts:AssumeRole` calls (as a pod is placed on a node)
- refreshing the cached credentials before they expire

Together that seems like it would provide a reasonable balance between observed credential fetch latency, AWS API limits, and kube2iam memory/CPU load.
That said, I'm likely missing something important ;). What am I missing?
IMHO kube2iam must guarantee to return valid IAM credentials within 1 second, as this is the default timeout of most AWS SDKs. I built something similar for the data center (https://github.com/ImmobilienScout24/afp-alppaca) which was also engineered along these lines.
I just finished setting up kube2iam, and pods using boto3 restarted 3 times before getting credentials. Is there any workaround or news about this issue?
UPDATE: found a temporary fix for boto; there are 2 env vars, `AWS_METADATA_SERVICE_TIMEOUT` and `AWS_METADATA_SERVICE_NUM_ATTEMPTS`.
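For anyone else applying this workaround on Kubernetes, the variables can be set in the container spec. The values below are illustrative, not recommendations; botocore defaults both the per-attempt timeout (in seconds) and the number of attempts to 1, which is exactly what makes it give up before kube2iam's first STS call completes.

```yaml
# Pod container spec fragment: give boto more time before it gives up
# on the metadata endpoint and falls through the credential chain.
env:
  - name: AWS_METADATA_SERVICE_TIMEOUT      # seconds per attempt (botocore default: 1)
    value: "5"
  - name: AWS_METADATA_SERVICE_NUM_ATTEMPTS # attempts (botocore default: 1)
    value: "5"
```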
Ran into this problem as well, honestly I can't imagine anyone NOT running into this issue when using kube2iam for anything serious. Would a PR that implements the suggestions made by @marshallbrekka be acceptable?
Yes, a PR implementing these features would be welcome. There are a few other nuances with how this is done now with the informer, which may change the implementation a bit, but something that meets the following:
Any update on this? It's been over a year and I'm running into this issue now too. Going to try updating the ENV variables SharpEdgeMarshall mentioned, but would prefer a more permanent fix.
@chrisghill Also facing this issue. Even with the ENV variables setted up is not reliable enough.
@gabrielvcbessa what language are you using? I think those ENV variables only work in boto3 (python). The PHP library doesn't support them (at least the version we use doesn't support them).
As it currently stands, credentials are only requested from STS on an as-needed basis. Unfortunately this adds significant latency (I've seen > 100ms) compared to the metadata service (< 1ms).
This can cause issues for clients (like boto) that have short timeouts for accessing the metadata service when walking the credential provider chain, leading them to conclude that no credentials are available.
This can be solved with three components:

- Fetch credentials as soon as the pod is added
- A goroutine per pod to refresh the credentials before they expire
- Stop the goroutine when the pod is deleted
I've submitted PR #30 that implements this solution, but I'm happy to discuss other approaches, or modifications to the existing PR.