uswitch / kiam

Integrate AWS IAM with Kubernetes

Reduce Cache Management and Increase Role to Pod Fidelity #106

Status: Open. moofish32 opened this issue 6 years ago

moofish32 commented 6 years ago

Currently, three caches are maintained:

  1. Pod Cache (native to k8s client)
  2. Namespace Cache (native to k8s client)
  3. Role cache (golang cache lib)

Instead of keeping the role cache separate, could we extend the pod cache to include a Role? Each time a pod is created, we would insert the Future Role for that specific pod. This would resolve the non-repudiation issue of multiple pods sharing the exact same role. The refactor would still require the golang cache lib implementation, now triggered by the Pod Watcher, but we would no longer have the separate role cache. We might need to consider whether it's acceptable to return a Role that is directly tied to a Pod IP/instance in the event the AWS STS API fails or takes too long.
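A rough sketch of what a combined entry could look like, assuming a future-style type so the Pod Watcher can insert an entry immediately and let STS resolve it asynchronously (all names here are hypothetical, not taken from the kiam code base):

```go
package cache

import "context"

// Credentials is a hypothetical stand-in for the values returned by STS.
type Credentials struct {
	AccessKeyID, SecretAccessKey, SessionToken string
}

// CredentialsFuture resolves role credentials asynchronously, so the pod
// watcher can cache an entry immediately and let the STS call finish later.
type CredentialsFuture struct {
	done  chan struct{}
	creds *Credentials
	err   error
}

func NewCredentialsFuture(fetch func() (*Credentials, error)) *CredentialsFuture {
	f := &CredentialsFuture{done: make(chan struct{})}
	go func() {
		f.creds, f.err = fetch()
		close(f.done)
	}()
	return f
}

// Get blocks until the credentials resolve or the context expires; the
// "took too long" case from the comment above surfaces here as ctx.Err().
func (f *CredentialsFuture) Get(ctx context.Context) (*Credentials, error) {
	select {
	case <-f.done:
		return f.creds, f.err
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}

// PodEntry merges what the pod cache and role cache hold separately today:
// the pod's identity, its role, and the credentials for that specific pod.
type PodEntry struct {
	PodIP, Role string
	Credentials *CredentialsFuture
}
```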

This would likely also solve the role session name issue: https://github.com/uswitch/kiam/issues/38

pingles commented 6 years ago

It's a good suggestion.

Currently the credentials cache uses an eviction process to ensure that credentials are refreshed periodically, so the pod can always request valid/fresh credentials. I don't know whether a similar mechanism exists within the Kubernetes client cache, or whether we'd have to implement it differently?
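For context, a sketch of the trade-off, assuming client-go's shared informers: the client cache has no per-item TTL eviction, but a resync period replays every cached object through UpdateFunc at a fixed interval, which could stand in as the refresh trigger (the 30-minute value is illustrative only):

```go
package main

import (
	"fmt"
	"time"

	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// buildPodInformer sketches how the Kubernetes client cache could drive
// credential refresh: there is no per-item TTL, but the resync period
// re-delivers every cached pod to UpdateFunc on a fixed schedule.
func buildPodInformer(clientset kubernetes.Interface) cache.SharedIndexInformer {
	factory := informers.NewSharedInformerFactory(clientset, 30*time.Minute)
	informer := factory.Core().V1().Pods().Informer()
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			pod := obj.(*v1.Pod)
			fmt.Printf("pod %s added: start fetching credentials\n", pod.Name)
		},
		UpdateFunc: func(oldObj, newObj interface{}) {
			// Fired on real updates and on every resync; a staleness check
			// (e.g. "are these credentials older than N minutes?") fits here.
		},
		DeleteFunc: func(obj interface{}) {
			// Drop the cached credentials along with the pod entry.
		},
	})
	return informer
}
```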

moofish32 commented 6 years ago

Well, I thought you could use the k8s client to manage the existence of the pod, and the role session time to manage the freshness of the credentials. For example, if a pod is added or deleted, add/remove the entire cached entry. Then set the eviction time to, say, 1 hour while generating a role session of 1.5 hours; also set the purge time to 1.5 hours and trigger retry logic to refresh the role starting at the 1-hour mark (the OnEvicted event). Obviously the numbers are picked for discussion and not founded in experimental data. The session would now have something in the session name unique to the pod instance, and pods would no longer share credentials.
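A rough sketch of that scheme, assuming the golang cache lib in question is patrickmn/go-cache: entries expire at 1h, OnEvicted drives the refresh, and the 1.5h STS session leaves a buffer to retry in. The short janitor interval just makes the callback fire soon after the 1-hour mark; the key and refresh hook are hypothetical:

```go
package main

import (
	"fmt"
	"time"

	gocache "github.com/patrickmn/go-cache"
)

func main() {
	// Entries expire after 1h while the underlying STS session lasts 1.5h,
	// so the eviction callback fires with roughly 30 minutes of validity
	// left in which to retry. The 1-minute janitor interval controls how
	// soon after expiry the callback actually runs. All durations are the
	// discussion values from the comment above, not tuned numbers.
	creds := gocache.New(1*time.Hour, 1*time.Minute)

	creds.OnEvicted(func(podIP string, _ interface{}) {
		// Hypothetical refresh hook: re-assume the role for this pod and
		// re-insert the entry, retrying until the 1.5h session runs out.
		fmt.Printf("refreshing credentials for pod %s\n", podIP)
	})

	// Keyed per pod instance, so no two pods share a cached entry.
	creds.Set("10.2.3.4", "session-credentials-placeholder", gocache.DefaultExpiration)
}
```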

Alternatively, you could keep both caches and just create a role session per Pod, using something like a base64 encoding of the Pod UID to distinguish pod instances. The two caches seemed closely related, and merging them had the potential to simplify the code base by removing one cache. However, the details could prove that assumption false.
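A minimal sketch of the per-pod session name, assuming a hypothetical kiam- prefix; RawURLEncoding is used because its alphabet ([A-Za-z0-9_-]) stays within the characters STS permits in RoleSessionName, and a UUID-length UID keeps the result under the 64-character limit:

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// sessionNameForPod is a hypothetical helper: it derives a per-pod role
// session name from the pod's UID so that two pods assuming the same role
// produce distinguishable CloudTrail entries.
func sessionNameForPod(podUID string) string {
	encoded := base64.RawURLEncoding.EncodeToString([]byte(podUID))
	return fmt.Sprintf("kiam-%s", encoded)
}

func main() {
	// A pod UID is a 36-character UUID, so the encoded form is 48 chars,
	// leaving room for the prefix within STS's 64-character limit.
	fmt.Println(sessionNameForPod("7a31c2d0-4f6e-11e8-9c2d-fa7ae01bbebc"))
}
```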