Open dvianello opened 5 years ago
hey @dvianello, while we ourselves do not need this, I am not opposed to adding it. Would you be open to providing a PR and validating it from your side?
The only issue is that I am not sure how we could write a test for this, so there is a chance people may inadvertently break it in the future.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
If this issue is safe to close now please do so with /close.
/lifecycle stale
An alternative solution could be to integrate vault-injector, point it at the AWS secrets path, and have it re-request credentials. The only thing machine-controller would need to do in such a use case is read the credentials from the file (and re-read them on change).
/remove-lifecycle stale
@kron4eg, would this add an external dependency to the process, i.e. Vault?
Instance profiles "just work" in AWS, and there's built-in code for using them in the various AWS SDKs.
While KubeOne's example Terraform config features quite open AWS permissions in the instance profile for control-plane nodes, that's not a good approach for real production setups, which should lock those permissions down. Relying on instance profile credentials on a VM with multiple shared workloads is a rather vulnerable way of doing AWS business: any accidental or malicious workload that ends up on the control-plane nodes would receive the same privileges as the control plane itself. That's one of the main reasons why projects such as kube2iam exist (to avoid handing the instance profile to every pod, and to attach different IAM profiles to different pods).
@kron4eg, fair point.
Is there any way machine-controller could be forced to use kube2iam then? My understanding is that it should actually be transparent, with kube2iam intercepting calls to the metadata IP coming from pods and replying with STS creds if authorised to do so.
The above brings us back to the point that building support for instance profiles might well just work: it will be down to users to either rely on the instance profile directly - with the security downsides you mentioned above - or deploy kube2iam and use that to provide creds to machine-controller.
Or am I missing something?
@dvianello the problem with kube2iam (from the machine-controller perspective) is that we can't differentiate between the instance profile and kube2iam. The implicit nature of those credentials makes me worry.
Up until now we have been explicit about the credentials used by the machine-controller, and it should stay this way.
Besides, kube2iam is also an external dependency. So if we had to choose between the two, I'd choose Vault every time. Vault can communicate with the AWS API and request new short-lived credentials, and vault-agent will renew them on a volume shared with machine-controller.
P.S. You can already "fake" usage of an instance profile in the machine-controller deployment with an init + sidecar container that grabs STS credentials before machine-controller starts and launches it with new ENV vars containing the STS creds. Of course this comes with the downside that the next kubeone invocation will override it. The Vault injector, on the other hand, can "inject" whatever is needed without any change to kubeone (which creates the machine-controller deployment). The only thing we would need to do is "teach" machine-controller to read credentials from the file provided by the injector, and maybe annotate the machine-controller deployment with injector instructions.
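A rough sketch of what "read credentials from the file and re-read them on change" could look like; the file path handling, the JSON field names, and the FileProvider type are made up for illustration and are not existing machine-controller code:

```go
package creds

import (
	"encoding/json"
	"os"
	"sync"
	"time"
)

// FileCredentials mirrors whatever layout the injector template writes.
// The JSON field names here are purely illustrative.
type FileCredentials struct {
	AccessKeyID     string `json:"access_key_id"`
	SecretAccessKey string `json:"secret_access_key"`
	SessionToken    string `json:"session_token"`
}

// FileProvider re-reads the credentials file on an interval so that rotated
// credentials are picked up without restarting the process.
type FileProvider struct {
	path    string
	mu      sync.RWMutex
	current FileCredentials
}

func NewFileProvider(path string, interval time.Duration) (*FileProvider, error) {
	p := &FileProvider{path: path}
	if err := p.reload(); err != nil {
		return nil, err
	}
	go func() {
		for range time.Tick(interval) {
			// Keep the last good credentials on transient read errors.
			_ = p.reload()
		}
	}()
	return p, nil
}

func (p *FileProvider) reload() error {
	raw, err := os.ReadFile(p.path)
	if err != nil {
		return err
	}
	var c FileCredentials
	if err := json.Unmarshal(raw, &c); err != nil {
		return err
	}
	p.mu.Lock()
	p.current = c
	p.mu.Unlock()
	return nil
}

// Credentials returns the most recently loaded credentials.
func (p *FileProvider) Credentials() FileCredentials {
	p.mu.RLock()
	defer p.mu.RUnlock()
	return p.current
}
```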
Hey,
@dvianello the problem with kube2iam (from the machine-controller perspective) is that we can't differentiate between the instance profile and kube2iam. The implicit nature of those credentials makes me worry. Up until now we have been explicit about the credentials used by the machine-controller, and it should stay this way.
I understand that the process would become a little bit less transparent - but I can assure you that, from our perspective, it was quite non-obvious that the current process was grabbing the user's credentials and injecting them behind the scenes into the machines. But again, this might be more of a problem with kubeone and the way it deals with it.
Besides, kube2iam is also an external dependency. So if we had to choose between the two, I'd choose Vault every time. Vault can communicate with the AWS API and request new short-lived credentials, and vault-agent will renew them on a volume shared with machine-controller.
Agreed, kube2iam would be an external dependency, but IMHO it would be a smaller one than an entire Vault setup.
P.S. You can already "fake" usage of an instance profile in the machine-controller deployment with an init + sidecar container that grabs STS credentials before machine-controller starts and launches it with new ENV vars containing the STS creds.
Not sure the above would work: as I understand it, machine-controller would be a long-lived service, so the STS creds initially grabbed would expire after a set amount of time - max 12 hours, I believe.
Anyway, I understand why you're worried about changing all of this, don't get me wrong - I just believe that, from an AWS usability point of view, support for some sort of almost-native AWS credentials delivery mechanism would be nice. For EKS there's more happening behind the scenes for this, see https://aws.amazon.com/about-aws/whats-new/2019/09/amazon-eks-adds-support-to-assign-iam-permissions-to-kubernetes-service-accounts/, so there may come a time when no external dependencies are needed at all.
HTH!
Thanks, Dario
Not sure the above would work: as I understand it, machine-controller would be a long-lived service, so the STS creds initially grabbed would expire after a set amount of time - max 12 hours, I believe.
If the sidecar were to quit (say after 12 hours), the whole pod would be restarted and the process would start over.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
This is a concern for us as well: we would like to use the AWS credentials chain, and relying on a static credential pair is not a workable solution for us - our security posture does not allow the use of static credentials. Using a sidecar container to refresh the credentials is not desirable either, because the pod restart count would increase linearly over 12h cycles and prevent our observability infrastructure from using that metric as a failure heuristic. Setting up Vault to cover this use case is not feasible for us either. We've invested in using the Pod Identity Webhook and a mutating admission controller to scope IAM policy permissions down to the pod level, and requiring static credentials here blocks the AWS credentials chain from picking up a Pod Identity. At a minimum, however, we would still prefer to grant elevated permissions to the whole node via the IAM instance profile for the control-plane EC2 instances, since the machine-controller runs on the control-plane nodes.
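For context, this is roughly what we'd expect to just work if machine-controller deferred to the SDK's default chain - a sketch assuming a recent enough aws-sdk-go that resolves the AWS_ROLE_ARN / AWS_WEB_IDENTITY_TOKEN_FILE variables injected by the Pod Identity Webhook:

```go
package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/sts"
)

func main() {
	// The Pod Identity Webhook injects AWS_ROLE_ARN and
	// AWS_WEB_IDENTITY_TOKEN_FILE plus a projected service account token.
	// A recent aws-sdk-go session resolves those through the default
	// credential chain, with no static keys configured anywhere.
	sess := session.Must(session.NewSessionWithOptions(session.Options{
		SharedConfigState: session.SharedConfigEnable,
	}))

	out, err := sts.New(sess).GetCallerIdentity(&sts.GetCallerIdentityInput{})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("acting as:", *out.Arn)
}
```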
/remove-lifecycle rotten
/kind feature
Just wanted to provide an update that the lack of this feature continues to be an issue for our organization.
Hello machine-controller folks,
we're using kubeone to deploy k8s clusters, and we understand it uses machine-controller behind the scenes to create worker nodes. We're struggling a bit to make this work in our AWS setup, as we rely heavily on assuming roles in different accounts rather than having an IAM user with direct access to the underlying account. Credentials in the environment where we run kubeone are thus short-lived STS creds that last 8 hours at most, and they are not very useful to inject into machine-controller since it will stop working once the creds expire. We were hoping we could resort to the instance profile - it has enough permissions to create EC2 instances and so on - but editing the secrets out of the machine-controller deployment causes errors like the below:
It feels like this is caused by the fact that https://github.com/kubermatic/machine-controller/blob/d925fa6e6b00fd7f09a7290cd04a10ba8928838e/pkg/cloudprovider/provider/aws/provider.go#L326 goes for static credentials directly, instead of using a credentials chain via ChainProvider (https://docs.aws.amazon.com/sdk-for-go/api/aws/credentials/#ChainProvider) that could fall back to the instance profile.
Do you have any plans to support instance profiles? It would simplify credentials management a lot when dealing with clusters in AWS!
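Something along these lines might do it - just a sketch from our side, with the function name and parameter wiring made up rather than taken from the existing machine-controller code:

```go
package provider

import (
	"github.com/aws/aws-sdk-go/aws/credentials"
	"github.com/aws/aws-sdk-go/aws/credentials/ec2rolecreds"
	"github.com/aws/aws-sdk-go/aws/ec2metadata"
	"github.com/aws/aws-sdk-go/aws/session"
)

// newCredentials (hypothetical name) prefers explicitly configured keys but
// falls back to env vars and finally the EC2 instance profile, instead of
// failing outright when no static keys are present.
func newCredentials(sess *session.Session, accessKeyID, secretAccessKey string) *credentials.Credentials {
	providers := []credentials.Provider{}
	if accessKeyID != "" && secretAccessKey != "" {
		providers = append(providers, &credentials.StaticProvider{Value: credentials.Value{
			AccessKeyID:     accessKeyID,
			SecretAccessKey: secretAccessKey,
		}})
	}
	providers = append(providers,
		&credentials.EnvProvider{},
		&ec2rolecreds.EC2RoleProvider{Client: ec2metadata.New(sess)},
	)
	return credentials.NewChainCredentials(providers)
}
```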
Happy to help if we can.
Best, Dario