kubernetes / cloud-provider-aws

Cloud provider for AWS
https://cloud-provider-aws.sigs.k8s.io/
Apache License 2.0
395 stars 302 forks source link

Add/nodeipamcontroller #679

Closed DockToFuture closed 5 months ago

DockToFuture commented 1 year ago

What type of PR is this? /kind feature

What this PR does / why we need it: This change adds a nodeipam controller to the aws cloud-controller-manager for IPv6 prefix delegation. The node ipam controller assigns POD addresses to kubernetes nodes based on prefix delegations. The controller supports plain IPv6 (--dualstack=false) and dual stack (--dualstack=true). In dual stack mode the IPAM for IPv4 is done in the same way it was done in the kube-controller-manager.

Which issue(s) this PR fixes: Fixes https://github.com/kubernetes/cloud-provider-aws/issues/608

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

This change adds a `nodeipam controller` to the aws cloud-controller-manager for IPv6 prefix delegation. The node ipam controller assigns POD addresses to kubernetes nodes based on prefix delegations. The controller supports plain IPv6 (`--dualstack=false`) mode and dual stack (`--dualstack=true`) mode. 
k8s-ci-robot commented 1 year ago

Welcome @DockToFuture!

It looks like this is your first PR to kubernetes/cloud-provider-aws 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/cloud-provider-aws has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. :smiley:

k8s-ci-robot commented 1 year ago

Hi @DockToFuture. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
cartermckinnon commented 1 year ago

/assign @cartermckinnon

I'll give this a look 👍

cartermckinnon commented 1 year ago

/assign @M00nF1sh

hakman commented 1 year ago

/ok-to-test

dims commented 1 year ago

/test all

dims commented 1 year ago

/kind feature /triage accepted

dims commented 1 year ago

@DockToFuture how did you test this? I can't spot any tests in this PR, are parts of this controller here borrowed from other places? could we borrow the tests that exist there as well?

DockToFuture commented 1 year ago

Hello @dims I've deployed the controller into our IPv6 Kubernetes clusters and tested both modes (--dualstack=true,--dualstack=false) with the following controllers enabled (--controllers=cloud-node,cloud-node-lifecycle,nodeipam,service,route). The code is similar to https://github.com/kubernetes/cloud-provider-gcp/blob/master/cmd/cloud-controller-manager/nodeipamcontroller.go . What for tests do you expect? Can you elaborate on it a little bit?

dims commented 1 year ago

@DockToFuture so what i was thinking was if we were to enhance one of the existing harness(es) to enable this control, for example make test-e2e uses kops to setup a cluster and run a bunch of tests

if we take this harness (switch on ipv6!), add a way to ensure that this new controller is activated/enabled that would be the first step. then we could add some basic golang based tests in test/e2e folder to ensure we don't break this controller going forward. You called out The node ipam controller assigns POD addresses to kubernetes nodes based on prefix delegations in your PR body as the feature, so that's what we would test, launch pods and test if the ip addresses look right.

Does that make sense?

dims commented 1 year ago

@DockToFuture also there are some unit tests there in that other repo you pointed to, some of it may be ported over here as well? https://cs.k8s.io/?q=func%20Test&i=nope&files=.*nodeipam.*_test%5C.go&excludeFiles=&repos=kubernetes/cloud-provider-gcp

olemarkus commented 1 year ago

It should be possible to use this as a drop-in replacement to the ipam controller we implemented in kOps. I expect kOps would still handle assigning the actual prefixes to instances.

kOps would not support dualstack though, so it may be an idea to mark that option experimental unless there is another way of getting that e2e-tested.

DockToFuture commented 11 months ago

@DockToFuture also there are some unit tests there in that other repo you pointed to, some of it may be ported over here as well? https://cs.k8s.io/?q=func%20Test&i=nope&files=.*nodeipam.*_test%5C.go&excludeFiles=&repos=kubernetes/cloud-provider-gcp

I've added all relevant unit tests and adapted them where necessary. I've also added some unit tests for the new functionality.

DockToFuture commented 11 months ago

It should be possible to use this as a drop-in replacement to the ipam controller we implemented in kOps. I expect kOps would still handle assigning the actual prefixes to instances.

kOps would not support dualstack though, so it may be an idea to mark that option experimental unless there is another way of getting that e2e-tested.

I've marked the flag as experimental.

DockToFuture commented 11 months ago

@DockToFuture so what i was thinking was if we were to enhance one of the existing harness(es) to enable this control, for example make test-e2e uses kops to setup a cluster and run a bunch of tests

if we take this harness (switch on ipv6!), add a way to ensure that this new controller is activated/enabled that would be the first step. then we could add some basic golang based tests in test/e2e folder to ensure we don't break this controller going forward. You called out The node ipam controller assigns POD addresses to kubernetes nodes based on prefix delegations in your PR body as the feature, so that's what we would test, launch pods and test if the ip addresses look right.

Does that make sense?

What is the best way to do that? Create a cluster with kops and afterwards edit the deployment of the cloud-controller-manager in the e2e test or is there a way to directly set the flags for the cloud-controller-manager in the kops cluster creation call? I didn't find any information how this could be done. Any ideas?

linux-foundation-easycla[bot] commented 10 months ago

CLA Signed

The committers listed above are authorized under a signed CLA.

DockToFuture commented 10 months ago

I've update the PR with an e2e test. A valid domain needs to be provided to run the e2e tests successfully. It's required to get the kops cluster up and running as the used gossip implementation does not support IPv6 yet. It can be set viaDNS_ZONE environment variable. I've used a route53 hosted zone domain for testing.

DockToFuture commented 10 months ago

PR is now ready for review! :)

DockToFuture commented 9 months ago

Any more feedback @M00nF1sh?

dims commented 8 months ago

@DockToFuture please rebase and i'll merge. thanks!

DockToFuture commented 8 months ago

Hello @dims, I've rebased the PR.

dims commented 8 months ago

looks like failure in pull-cloud-provider-aws-e2e is at cleanup phase and the test passed

[38;5;10mSUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 1 Skipped
PASS

Ginkgo ran 1 suite in 1m28.931164668s
Test Suite Passed
dims commented 8 months ago

/skip

dims commented 8 months ago

/approve /lgtm

k8s-ci-robot commented 8 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dims

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/kubernetes/cloud-provider-aws/blob/master/OWNERS)~~ [dims] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
k8s-ci-robot commented 8 months ago

PR needs rebase.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
k8s-ci-robot commented 7 months ago

New changes are detected. LGTM label has been removed.

DockToFuture commented 7 months ago

@dims I've rebased the PR. Can you take a look?

dims commented 7 months ago

/retest

dims commented 7 months ago

@DockToFuture kicked off the tests again hoping these are flakes, if not, then we need to add some option to enable the controller on demand and leave it disabled otherwise i think. (then we can work on a follow up PR to enable them by default)

k8s-ci-robot commented 7 months ago

@DockToFuture: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cloud-provider-aws-e2e 9869f397f7f381ce2cebf15530be89106b3cb82c link true /test pull-cloud-provider-aws-e2e

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository. I understand the commands that are listed [here](https://go.k8s.io/bot-commands).
DockToFuture commented 7 months ago

@dims a valid dns-zone needs to be set in the environment variable "DNS_ZONE" in order to create the kops cluster successfully, see my earlier comment: https://github.com/kubernetes/cloud-provider-aws/pull/679#issuecomment-1881073904 for more information.

Error: cannot find DNS Zone "example.com".  Please pre-create the zone and set up NS records so that it resolves
Error: exit status 1
make: *** [Makefile:182: test-e2e] Error 1
dims commented 7 months ago

@M00nF1sh any suggestions on how to make this work?

k8s-ci-robot commented 6 months ago

PR needs rebase.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
DockToFuture commented 5 months ago

I will close this PR as we have decided to implement an own IPAM controller as suggested by @M00nF1sh to keep the CloudProvider AWS as slim as possible (analogous to the AWS loadbalancer controller). Furthermore this approach gives us more degrees of freedom how the tests for the IPAM controller are run and executed.

DockToFuture commented 4 months ago

I want to share the freshly created gardener aws-ipam-controller which originated out of this PR and might be beneficial to everyone who might be considering the use of IPv6 prefix delegation for single-stack or dual-stack scenarios.

dims commented 4 months ago

thanks @DockToFuture !