kubernetes-sigs / cluster-api

Home for Cluster API, a subproject of sig-cluster-lifecycle
https://cluster-api.sigs.k8s.io
Apache License 2.0

Flavored tokens --> flavored nodes #3335

brianthelion closed this issue 2 years ago

brianthelion commented 4 years ago

Preface

I've chosen the term "flavor" here instead of one of the existing idioms -- "label", "taint", "role", "annotation," etc. -- to indicate that I'm agnostic to implementation mechanism. An existing operator and/or an obvious implementation path for this may already exist. If so, a pointer to a relevant doc would be very helpful; it has proven extremely difficult to Google for.

User Story

First, I want to be able to generate multiple "flavors" of bootstrap tokens. For now, let's assume that there are only two, sweet and sour.

Second, I need an operator that can detect whether a joining node has a sweet token or a sour token and map that flavor to the node as well. Node flavor will then be used by downstream logic for differential processing.

Detailed Description

A review of old issues reveals that this has been flirted with quite a bit in the past, but most implementation suggestions called on the kubelet to set its own flavor in one way or another. For my use-case, this is NOT acceptable -- the kubelet cannot be trusted in this regard.

One perfectly admissible solution here would be for the cluster to pass TWO tokens to the node, one standard bootstrap token and a secondary "flavored" token. The flavored token must be secure in nature, though.

Anything else you would like to add:

There are plenty of examples of node label operators in the wild relying on all manner of attributes of the joining node, but none of those operator implementations appears to access the original bootstrap token. Apologies if I've missed something.

/kind feature

fabriziopandini commented 4 years ago

TBH I'm not 100% sure I fully understand the use case explained above, so I can only add some background considerations based on my knowledge of kubeadm/CABPK:

More information about alternative options for kubeadm join can be found here

ncdc commented 4 years ago

@brianthelion could you please expand on your user story with some examples of what you'd do with one token vs another?

vincepri commented 4 years ago

/milestone Next
/priority awaiting-more-evidence

brianthelion commented 4 years ago

@fabriziopandini @ncdc @vincepri Thank you for the follow-up.

The basis for the use-case is that we're stuck with certain operational constraints, the most challenging of which is that third parties are administering our nodes. When we hand out bootstrap tokens, we have strong guarantees about who we're handing them out to. Next, we want to limit a given administrator's nodes' access to workloads based on whether that administrator is trusted (sweet) or not (sour). In the near future, we would like to extend this to include levels of trust. Meanwhile, we do NOT want to reveal to the token-holder the degree to which we trust them.

CAVEAT that I'm basing the following solely on my reading of the docs, issue tracker, and a bit of code: My use-case appears to highlight an important gap in the chain of trust, namely, that the only (strong-sense) identity token in the system is the bootstrap token, but there's no method for associating it (again, strongly) with the node UUID. Unless I'm missing something, this means that true authc and authz on nodes is fundamentally impossible.

fabriziopandini commented 4 years ago

The split between machine administrator and cluster administrator sounds interesting, but personally I would like to see this use case developed into a proposal, because IMO there are many open points to be addressed around it, e.g.:

ncdc commented 4 years ago

Thanks @brianthelion for the additional info. If it's not too much trouble, I'd like to ask for even more details...

Are you trying to only schedule certain pods to certain nodes, or avoid scheduling certain pods to certain nodes?

What makes one specific node different from any other? How do you distinguish?

You mention chain of trust and node authentication and authorization. What part of your setup needs to authorize a node, and how is it doing that today?

Any more information you can provide would be greatly appreciated. Thanks!

brianthelion commented 4 years ago

@fabriziopandini

What is the expected UX?

Ideally, it would be as simple as kubeadm init --with-flavor={{ flavor }} to produce a token, with no additional changes to the kubeadm join workflow. Flavor would then appear as a node attribute -- "label", "role", whatever; I'm not fussed -- that can be selected for in the deployment step, as well as other operational contexts.

How should we associate a token with a node UUID, given that UUID/Machine names are not known upfront for most of the providers? How could this be implemented in a provider-agnostic way?

Naively, it would seem sufficient for the node to store its bootstrap token for posterity.

To what extent should we expose the concept of a token in Cluster API (afaik these concepts exist only in providers based on kubeadm)?

(Not for me.)

What part of this change impacts core Cluster API, the providers or potentially kubeadm?

(Not for me.)

@ncdc

Are you trying to only schedule certain pods to certain nodes, or avoid scheduling certain pods to certain nodes?

Both. See next.

What makes one specific node different from any other? How do you distinguish?

We use macvlan to interact directly with the node's network environment in situ. We would like to deploy pods based on a superposition of (a) what we detect about the network, (b) the node flavor, and (c) the node administrator's input. The input is given via a web portal.

You mention chain of trust and node authentication and authorization. What part of your setup needs to authorize a node, and how is it doing that today?

See previous. Since our pods are deployed into potentially sensitive network environments, we need to establish strong guarantees about who the node is acting as an agent for. Today, we rely on heuristics about which network domain the node sits in to make an educated guess. This is not secure and it is not going to scale.

Any more information you can provide would be greatly appreciated.

I'm being a little circumspect (read: "cagey") about our use-case because I think its complexity may distract from the basics of the problem, which is actually pretty straightforward: there does not appear to be ANY mechanism -- secure, hacky, or otherwise -- for maintaining a record of the node's identity as defined by the initial bootstrap token handover from cluster administrator to node administrator.

ncdc commented 4 years ago

There is a direct link from Machine.status.nodeRef to the node's name. Does that help at all?

brianthelion commented 4 years ago

@ncdc From a UX perspective, it's not unreasonable for the cluster administrator to demand to know the node name before handing a bootstrap token off to the node administrator, but as @fabriziopandini mentioned, that's often just not feasible in automated provisioning scenarios.

ncdc commented 4 years ago

@brianthelion if you're able to share, what sort of environment are you operating in - cloud / on prem / VMs / bare metal?

Are you using a custom scheduler?

How do you envision the node being characterized as sweet or sour? Where is this information stored? Who/what sets it (securely)? Who/what consumes a node's flavor?

(I hope you don't mind all the questions - I'm having a hard time seeing the big picture and I'm trying to learn - thanks for your patience!)

brianthelion commented 4 years ago

@ncdc

what sort of environment are you operating in - cloud / on prem / VMs / bare metal?

Our product is a software appliance that runs on-prem in a pre-packaged QEMU/KVM virtual machine. The only things the VM image is missing when it's downloaded by the node administrator -- the "keys to the car" -- are an ssh key and a Kube bootstrap token. The node administrator has to supply both pre-boot via cloud-init. When the VM comes up, systemd forks a kubelet that joins our cluster through the typical bootstrapping mechanisms.

Are you using a custom scheduler?

Not as of now, no.

How do you envision the node being characterized as sweet or sour?

This is a purely out-of-band decision based on business criteria that are opaque to the cluster API. Suffice it to say that there's some oracle somewhere on our side that holds a list of the names of node administrators and whether they should get a sweet or sour node. The node administrator presents domain-specific authentication credentials to us, and we would like to give them back a flavored bootstrap token. But again, we don't want them to be able to tell what flavor the token is by simply sampling lots of tokens.
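
For illustration, one way an oracle like that could work -- a sketch only, with made-up names, not necessarily what we actually run -- is to keep the flavor entirely server-side, keyed by a hash of each issued token, so that the token itself reveals nothing:

```go
package flavororacle

import (
	"crypto/sha256"
	"encoding/hex"
)

// Oracle maps issued bootstrap tokens to flavors without encoding the flavor
// in the token itself: only a hash of each token is stored, and the flavor can
// only be resolved by whoever holds this map.
type Oracle struct {
	flavors map[string]string // sha256(token) -> "sweet" or "sour"
}

func NewOracle() *Oracle {
	return &Oracle{flavors: map[string]string{}}
}

func key(token string) string {
	sum := sha256.Sum256([]byte(token))
	return hex.EncodeToString(sum[:])
}

// Record is called when a token is issued to a node administrator.
func (o *Oracle) Record(token, flavor string) {
	o.flavors[key(token)] = flavor
}

// Lookup is called later (e.g. by a labeling controller) to resolve the flavor.
func (o *Oracle) Lookup(token string) (string, bool) {
	f, ok := o.flavors[key(token)]
	return f, ok
}
```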

Where is this information stored?

Prior to bootstrapping the node, it is only stored in our "secret list" as above. After bootstrapping the node, "flavor" should appear as one of the cluster API's available node attributes.

Who/what sets it (securely)?

Unknown; this is what currently appears to be missing from the bootstrapping process.

Who/what consumes a node's flavor?

For us, it would be a nodeSelector.

I hope you don't mind all the questions - I'm having a hard time seeing the big picture and I'm trying to learn - thanks for your patience!

Nope, no problem. It may be that there's an easy answer here that I'm just missing due to lack of Kube knowledge. Q&A will help ferret that out.

ncdc commented 4 years ago

Thanks!

Does the node admin have access to the cluster-api management cluster? Are they creating Machine resources?

So if you're using the nodeSelector to direct pods to flavored nodes, that means each node has to be labeled with the flavor. And you presumably don't want the node administrator to be able to set a node's labels. Which means you can't give the node administrator cluster-admin credentials to the workload cluster... but someone or something has to label the node. I think I now see why you want that secure relationship between a flavored token and the node's name/uid. You could write a controller that can set a node's flavor label based on node name/uid mapped back to a flavored token. Am I on the right track? If so, you still potentially could write a controller that labels the node's flavor by getting Machine.spec.nodeRef in the management cluster, then connecting to the workload cluster and labeling the referenced node.
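
To make that concrete, a very rough sketch of such a controller -- an illustration only, written against current Cluster API and controller-runtime APIs, with made-up label and annotation keys -- could look like this:

```go
package flavorlabeler

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
)

// FlavorReconciler watches Machines in the management cluster and, once a
// nodeRef has been set, labels the corresponding Node in the workload cluster.
type FlavorReconciler struct {
	Management client.Client // management cluster client
	Workload   client.Client // workload cluster client (e.g. built from the cluster's kubeconfig Secret)
}

func (r *FlavorReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var machine clusterv1.Machine
	if err := r.Management.Get(ctx, req.NamespacedName, &machine); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	if machine.Status.NodeRef == nil {
		return ctrl.Result{}, nil // Machine not yet matched to a Node; try again on the next event
	}

	// Assumption: whatever issued the token recorded the intended flavor on the
	// Machine, e.g. as an annotation set by the cluster admin at creation time.
	flavor, ok := machine.Annotations["example.com/flavor"]
	if !ok {
		return ctrl.Result{}, nil
	}

	var node corev1.Node
	if err := r.Workload.Get(ctx, types.NamespacedName{Name: machine.Status.NodeRef.Name}, &node); err != nil {
		return ctrl.Result{}, err
	}
	patch := client.MergeFrom(node.DeepCopy())
	if node.Labels == nil {
		node.Labels = map[string]string{}
	}
	node.Labels["example.com/flavor"] = flavor // later matched by a nodeSelector
	return ctrl.Result{}, r.Workload.Patch(ctx, &node, patch)
}

func (r *FlavorReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).For(&clusterv1.Machine{}).Complete(r)
}
```

The missing piece, as discussed above, is how the flavor gets bound to the Machine or to the token securely in the first place.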

You also wrote that you want the node admin to provide the Kubernetes bootstrap token to cloud-init for kubeadm to use when joining. Is that a token you expect the node admin to generate?

brianthelion commented 4 years ago

@ncdc

Does the node admin have access to the cluster-api management cluster?

No. More on that at the bottom.

Are they creating Machine resources?

No, nothing fancy in the provisioning process as of yet.

So if you're using the nodeSelector to direct pods to flavored nodes, that means each node has to be labeled with the flavor. And you presumably don't want the node administrator to be able to set a node's labels. Which means you can't give the node administrator cluster-admin credentials to the workload cluster... but someone or something has to label the node.

Yes, you've got it.

I think I now see why you want that secure relationship between a flavored token and the node's name/uid. You could write a controller that can set a node's flavor label based on node name/uid mapped back to a flavored token. Am I on the right track? If so, you still potentially could write a controller that labels the node's flavor by getting Machine.spec.nodeRef in the management cluster, then connecting to the workload cluster and labeling the referenced node.

This seems like a potential workaround, but for two hang-ups: (1) the node's name (or UID) would have to be known at token creation time; (2) the name (or UID) would effectively become the node's credential, and it's not exactly a cryptographically secure one. You'd really like something more secure than this.

You also wrote that you want the node admin to provide the Kubernetes bootstrap token to cloud-init for kubeadm to use when joining. Is that a token you expect the node admin to generate?

We have an API call that wraps the kubeadm init step. The node admin presents their domain-specific credential to our API, and the API hands them back the typical join string.

ncdc commented 4 years ago

This seems like a potential workaround, but for two hang-ups: (1) the node's name (or UID) would have to be known at token creation time; (2) the name (or UID) would effectively become the node's credential, and it's not exactly a cryptographically secure one. You'd really like something more secure than this.

Maybe the node's name doesn't matter? You have a cluster admin who creates a Machine with a specific flavor, and as soon as Machine.spec.nodeRef is set, your controller labels the node with the flavor. Would that work? (i.e. there is no flavor token)

We have an API call that wraps the kubeadm init step. The node admin presents their domain-specific credential to our API, and the API hands them back the typical join string.

Are you not using the Cluster API kubeadm-based bootstrap provider?

brianthelion commented 4 years ago

@ncdc

Maybe the node's name doesn't matter? You have a cluster admin who creates a Machine with a specific flavor, and as soon as Machine.spec.nodeRef is set, your controller labels the node with the flavor. Would that work? (i.e. there is no flavor token)

Sounds potentially promising. I'll need to learn more about the Machine interface to evaluate.

Are you not using the Cluster API kubeadm-based bootstrap provider?

Our API is just calling kubeadm init in a subprocess and passing the output to the node administrator over https.

ncdc commented 4 years ago

We match a Machine to a Node based on equivalent providerID values between the two. For example, when you create a node in AWS, you create a Machine and an AWSMachine. The AWS provider asks AWS to run an instance and records the instance ID in AWSMachine.spec.providerID. This is then copied to Machine.spec.providerID. As soon as Cluster API is able to connect to the new cluster's apiserver, it starts looking at nodes, trying to find one whose spec.providerID matches the value the Machine has. Once we have this match, we set Machine.spec.nodeRef to the name of the node that matches.
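
In code, that matching step is conceptually something like the following (a simplified sketch, not the actual Cluster API implementation):

```go
package nodematcher

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
)

// findNodeForMachine returns the workload-cluster Node whose spec.providerID
// matches the Machine's spec.providerID, or nil if no match exists yet.
func findNodeForMachine(ctx context.Context, workload client.Client, machine *clusterv1.Machine) (*corev1.Node, error) {
	if machine.Spec.ProviderID == nil {
		return nil, nil // the infrastructure provider has not reported an ID yet
	}
	var nodes corev1.NodeList
	if err := workload.List(ctx, &nodes); err != nil {
		return nil, err
	}
	for i := range nodes.Items {
		if nodes.Items[i].Spec.ProviderID == *machine.Spec.ProviderID {
			return &nodes.Items[i], nil // this Node's name becomes the Machine's nodeRef
		}
	}
	return nil, nil // no matching Node yet; Cluster API keeps retrying
}
```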

Our API is just calling kubeadm init in a subprocess and passing the output along to the node administrator over https.

It sounds like you're not currently using Cluster API but have another way to set up clusters? Is that accurate?

detiber commented 4 years ago

I'm struggling a bit to understand the use cases where we would want to support Clusters that have mixed workloads across different security boundaries. That seems more like something that would be better handled by federating workloads across multiple clusters or by using something similar to virtual cluster approach that is being investigated by wg-multitenancy.

That said, I think there are several different ways this could be achieved through the existing mechanisms, namely the Kubeadm config templates related to a given MachineDeployment, especially since I wouldn't expect a MachineDeployment to span nodes across security boundaries.

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

ncdc commented 3 years ago

/lifecycle frozen

brianthelion commented 3 years ago

@ncdc

I finally got around to implementing your suggestion, above, of using a controller for this. At kubeadm join time we provide a second secure token via a node label set through JoinConfiguration.nodeRegistration.kubeletExtraArgs.

When everything is correct, the kubelet joins just fine, and our controller further "flavors" the node by checking our token and then applying node annotations.
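
Concretely, the controller side of that flow looks roughly like the following (a sketch with made-up label/annotation keys and a stand-in verification function, not our actual code):

```go
package tokenflavor

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// flavorNode inspects the secondary token that a node presented at join time
// as a node label (set via JoinConfiguration.nodeRegistration.kubeletExtraArgs
// --node-labels), verifies it out of band, and records the resulting flavor as
// a node annotation. lookupFlavor stands in for whatever oracle validates tokens.
func flavorNode(ctx context.Context, c client.Client, node *corev1.Node, lookupFlavor func(token string) (flavor string, ok bool)) error {
	token, present := node.Labels["example.com/join-token"]
	if !present {
		return nil // node did not present a secondary token; leave it unflavored
	}
	flavor, valid := lookupFlavor(token)
	if !valid {
		return nil // unknown token: leave the node unflavored; rejection is handled elsewhere
	}
	patch := client.MergeFrom(node.DeepCopy())
	if node.Annotations == nil {
		node.Annotations = map[string]string{}
	}
	node.Annotations["example.com/flavor"] = flavor
	delete(node.Labels, "example.com/join-token") // don't leave the token sitting on the Node object
	return c.Patch(ctx, node, patch)
}
```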

However, there appear to be some issues when the kubelet needs to be rejected. See #4848.

vincepri commented 2 years ago

/close

Closing in favor of https://github.com/kubernetes-sigs/cluster-api/blob/ddf7c499888d16a57324e6e7779ba3fae061ed42/docs/proposals/20210222-kubelet-authentication.md

k8s-ci-robot commented 2 years ago

@vincepri: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/cluster-api/issues/3335#issuecomment-954917616):

> /close
>
> Closing in favor of https://github.com/kubernetes-sigs/cluster-api/blob/ddf7c499888d16a57324e6e7779ba3fae061ed42/docs/proposals/20210222-kubelet-authentication.md