kubernetes / cloud-provider-aws

Cloud provider for AWS
https://cloud-provider-aws.sigs.k8s.io/
Apache License 2.0
381 stars 300 forks source link

Fork the tagging controller into generic node customization controller #833

Closed mmerkes closed 6 months ago

mmerkes commented 7 months ago

I intend to begin work on this feature on 2/5/24, and I've discussed in detail with my colleagues, but I wanted to post an issue here for tracking purposes and for feedback from the community.

What would you like to be added?

In 2022, a tagging controller was added to the cloud provider, which is responsible for tagging and untagging node resources as they join and leave the cluster. This controller runs in clusters managed by EKS by default and it can be run elsewhere.

The tagging controller is a very focused controller at this point. While there has been discussions about expanding the resources supported, it's still very limited. However, the usefulness of the tagging controller could be expanded by converting it into a generic node customization controller. It already watches node events, so you could take advantage of the overhead already running and extend that further. There are other feature requests, like #300, that this would simplify and require operators to only run a single controller.

Why is this needed?

There's a feature request to add support for labeling nodes with the AWS zone ID along with the zone name. The zone name is set configured as the node's zone today, and it can create complications for users running in multiple AWS accounts. In the largest AWS regions like us-east-1 and us-west-2, zone names, like us-east-2a, do not always map to the same zone IDs, like use1-az2, across accounts, so if you're trying to assess impact to a particular zone, you name you need normalize the node name across accounts.

There's no hook to provide an additional zone name in existing cloud provider and changing the zone to be a zone ID rather than zone name would be a breaking, backwards incompatible change. Any expectations around the zone name in existing cluster, either around resources deployed there or tooling, will break, and the migration may cause unexpected migration in scheduling, PDBs, etc. As a result, the only practical option is sticking this into a controller, and by utilizing the tagging controller, albeit forking and extending it, will mean that operators only need to operate one controller for both purposes.

We've also heard requests to label nodes with the accelerator family as well, which would be easy to add with this change. I can imagine other labels that could be added in the future that might also be useful.

FAQ

What's being added to the tagging controller?

Support to add additional labels on nodes, specifically zone IDs #300. Label would be something like topology.k8s.aws/zone-id.

Why not just add it to the tagging controller? Right now, the tagging controller is pretty lightweight. We can do the migration with little controversy now to a name that's more appropriate.

Can I just keep using the tagging controller? Yes. The change would done in a way to re-use that code, but keep the existing tagging controller command to avoid breaking clusters unaware of the change. I would add a log to recommend migrating to the new controller, and possibly deprecate the tagging controller some day, but that could be discussed in the future

How would cluster admins migrate from the tagging controller to the new controller? Clusters admins would swap out the tagging controller for the new controller as a direct replacement. The intention is for the operation to be simple and quiet.

Do we need to add any new permissions? CCM should already have all of the permissions needed to support mentioned functionality.

/kind feature

mmerkes commented 7 months ago

/assign mmerkes

mmerkes commented 7 months ago

/triage accepted

mmerkes commented 6 months ago

Resolving this issue in favor of a different solution: https://github.com/kubernetes/kubernetes/pull/123223