Open dvianello opened 5 years ago
hey @dvianello, while we ourselves do not need this, I am not opposed to adding it. Would you be open to providing a PR and validating it from your side?
The only issue is that I am not sure how we could write a test for this, so there is a chance people may inadvertently break it in the future.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
If this issue is safe to close now please do so with /close.
/lifecycle stale
An alternative solution could be to integrate vault-injector, point it at the AWS secrets path, and have it re-request credentials. The only thing machine-controller would need to do in such a use case is read the credentials from the file (and re-read them on change).
/remove-lifecycle stale
@kron4eg, would this add an external dependency to the process, i.e. Vault?
Instance profiles "just work" in AWS, and there's built-in code for using them in the various AWS SDKs.
While KubeOne's example Terraform config features quite open AWS permissions in the instance profile for control-plane nodes, that's not a good approach for real production setups, which should lock those permissions down. Relying on instance profile credentials on a VM with multiple shared workloads is a rather vulnerable way of doing AWS business: any accidental or malicious workload that ends up on the control-plane nodes would receive the same privileges as the control plane itself. That's one of the main reasons why projects such as kube2iam exist (to avoid handing the instance profile to every pod, and to attach different IAM profiles to different pods).
@kron4eg, fair point.
Is there any way machine-controller could be forced to use kube2iam then? My understanding is that it should actually be transparent, with kube2iam intercepting calls to the metadata IP coming from pods and replying with STS creds if authorised to do so.
The above brings us back to the point that building support for instance profiles might well just work: it will be down to users to either rely on the instance profile directly - with the security downsides you mentioned above - or deploy kube2iam and use that to provide creds to machine-controller.
Or am I missing something?
@dvianello the problem with kube2iam (from the machine-controller perspective) is that we can't differentiate between the instance profile and kube2iam. The implicit nature of those credentials makes me worry.
Up until now we have been explicit about the credentials used by the machine-controller, and it should stay this way.
Besides, kube2iam is also an external dependency. So if we had to choose between the two, I'd choose Vault every time. Vault can communicate with the AWS API and request new short-lived credentials, and vault-agent will renew them on a volume shared with machine-controller.
P.S. You can already "fake" usage of an instance profile in the machine-controller deployment with an init + sidecar container that grabs STS credentials before machine-controller starts and launches it with new ENV vars containing the STS creds. Of course this comes with the downside that the next kubeone invocation will override it. The Vault injector, on the other hand, can "inject" whatever is needed without any change to kubeone (which creates the machine-controller deployment). The only thing we would need to do is "teach" machine-controller to read credentials from the file provided by the injector, and maybe annotate the machine-controller deployment with injector instructions.
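A rough sketch of what "read credentials from the file and re-read them on change" could look like; the file path handling, the JSON field names, and the FileProvider type are made up for illustration and are not existing machine-controller code:

```go
package creds

import (
	"encoding/json"
	"os"
	"sync"
	"time"
)

// FileCredentials mirrors whatever layout the injector template writes.
// The JSON field names here are purely illustrative.
type FileCredentials struct {
	AccessKeyID     string `json:"access_key_id"`
	SecretAccessKey string `json:"secret_access_key"`
	SessionToken    string `json:"session_token"`
}

// FileProvider re-reads the credentials file on an interval so that rotated
// credentials are picked up without restarting the process.
type FileProvider struct {
	path    string
	mu      sync.RWMutex
	current FileCredentials
}

func NewFileProvider(path string, interval time.Duration) (*FileProvider, error) {
	p := &FileProvider{path: path}
	if err := p.reload(); err != nil {
		return nil, err
	}
	go func() {
		for range time.Tick(interval) {
			// Keep the last good credentials on transient read errors.
			_ = p.reload()
		}
	}()
	return p, nil
}

func (p *FileProvider) reload() error {
	raw, err := os.ReadFile(p.path)
	if err != nil {
		return err
	}
	var c FileCredentials
	if err := json.Unmarshal(raw, &c); err != nil {
		return err
	}
	p.mu.Lock()
	p.current = c
	p.mu.Unlock()
	return nil
}

// Credentials returns the most recently loaded credentials.
func (p *FileProvider) Credentials() FileCredentials {
	p.mu.RLock()
	defer p.mu.RUnlock()
	return p.current
}
```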
Hey,
@dvianello the problem with kube2iam (from the machine-controller perspective) is that we can't differentiate between the instance profile and kube2iam. The implicit nature of those credentials makes me worry. Up until now we have been explicit about the credentials used by the machine-controller, and it should stay this way.
I understand that the process would become a little bit less transparent - but I can assure you that, from our perspective, it was quite non-obvious that the current process was grabbing the user's credentials and injecting them behind the scenes into the machines. But again, this might be more of a problem with kubeone and the way it deals with it.
Besides, kube2iam is also an external dependency. So if we had to choose between the two, I'd choose Vault every time. Vault can communicate with the AWS API and request new short-lived credentials, and vault-agent will renew them on a volume shared with machine-controller.
Agreed, kube2iam would be an external dependency, but IMHO it would be a smaller one than an entire Vault setup.
P.S. You can already "fake" usage of an instance profile in the machine-controller deployment with an init + sidecar container that grabs STS credentials before machine-controller starts and launches it with new ENV vars containing the STS creds.
Not sure the above would work: as I understand it, machine-controller would be a long-lived service, so the STS creds initially grabbed would expire after a set amount of time - max 12 hours, I believe.
Anyway, I understand why you're worried about changing all of this, don't get me wrong - I just believe that, from an AWS usability point of view, support for some sort of almost-native AWS credentials delivery mechanism would be nice. For EKS there's more happening behind the scenes for this, see https://aws.amazon.com/about-aws/whats-new/2019/09/amazon-eks-adds-support-to-assign-iam-permissions-to-kubernetes-service-accounts/, so there may come a time when no external dependencies are needed at all.
HTH!
Thanks, Dario
Not sure the above would work: as I understand it, machine-controller would be a long-lived service, so the STS creds initially grabbed would expire after a set amount of time - max 12 hours, I believe.
If the sidecar were to quit (say after 12 hours), the whole pod would be restarted and the process would start over.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
This is a concern for us as well: we would like to use the AWS credentials chain, and relying on a static credential pair is not a workable solution for us - our security posture does not allow the use of static credentials. Using a sidecar container to refresh the credentials is not desirable either, because the pod restart count would increase linearly over 12h cycles and prevent our observability infrastructure from using that metric as a failure heuristic. Setting up Vault to cover this use case is not feasible for us either. We've invested in using the Pod Identity Webhook and a mutating admission controller to scope IAM policy permissions down to the pod level, and requiring static credentials here blocks the AWS credentials chain from picking up a Pod Identity. At a minimum, however, we would still prefer to grant elevated permissions to the whole node via the IAM instance profile for the control-plane EC2 instances, since the machine-controller runs on the control-plane nodes.
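For context, this is roughly what we'd expect to just work if machine-controller deferred to the SDK's default chain - a sketch assuming a recent enough aws-sdk-go that resolves the AWS_ROLE_ARN / AWS_WEB_IDENTITY_TOKEN_FILE variables injected by the Pod Identity Webhook:

```go
package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/sts"
)

func main() {
	// The Pod Identity Webhook injects AWS_ROLE_ARN and
	// AWS_WEB_IDENTITY_TOKEN_FILE plus a projected service account token.
	// A recent aws-sdk-go session resolves those through the default
	// credential chain, with no static keys configured anywhere.
	sess := session.Must(session.NewSessionWithOptions(session.Options{
		SharedConfigState: session.SharedConfigEnable,
	}))

	out, err := sts.New(sess).GetCallerIdentity(&sts.GetCallerIdentityInput{})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("acting as:", *out.Arn)
}
```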
/remove-lifecycle rotten
/kind feature
Just wanted to provide an update that the lack of this feature continues to be an issue for our organization.
Hello machine-controller folks,
we're using kubeone to deploy k8s clusters, and we understand it uses machine-controller behind the scenes to create worker nodes. We're struggling a bit to make this work in our AWS setup, as we rely heavily on assuming roles in different accounts rather than having an IAM user with direct access to the underlying account. Credentials in the environment where we run kubeone are thus short-lived STS creds that last 8 hours at most, and they are not very useful to inject into machine-controller since it will stop working once the creds expire. We were hoping we could resort to the instance profile - it has enough permissions to create EC2 instances and so on - but editing the secrets out of the machine-controller deployment causes errors like the below:
It feels like this is caused by the fact that https://github.com/kubermatic/machine-controller/blob/d925fa6e6b00fd7f09a7290cd04a10ba8928838e/pkg/cloudprovider/provider/aws/provider.go#L326 goes for static credentials directly, instead of using a credentials chain via ChainProvider (https://docs.aws.amazon.com/sdk-for-go/api/aws/credentials/#ChainProvider) that could fall back to the instance profile.
Do you have any plans to support instance profiles? It would simplify credentials management a lot when dealing with clusters in AWS!
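Something along these lines might do it - just a sketch from our side, with the function name and parameter wiring made up rather than taken from the existing machine-controller code:

```go
package provider

import (
	"github.com/aws/aws-sdk-go/aws/credentials"
	"github.com/aws/aws-sdk-go/aws/credentials/ec2rolecreds"
	"github.com/aws/aws-sdk-go/aws/ec2metadata"
	"github.com/aws/aws-sdk-go/aws/session"
)

// newCredentials (hypothetical name) prefers explicitly configured keys but
// falls back to env vars and finally the EC2 instance profile, instead of
// failing outright when no static keys are present.
func newCredentials(sess *session.Session, accessKeyID, secretAccessKey string) *credentials.Credentials {
	providers := []credentials.Provider{}
	if accessKeyID != "" && secretAccessKey != "" {
		providers = append(providers, &credentials.StaticProvider{Value: credentials.Value{
			AccessKeyID:     accessKeyID,
			SecretAccessKey: secretAccessKey,
		}})
	}
	providers = append(providers,
		&credentials.EnvProvider{},
		&ec2rolecreds.EC2RoleProvider{Client: ec2metadata.New(sess)},
	)
	return credentials.NewChainCredentials(providers)
}
```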
Happy to help if we can.
Best, Dario