kubernetes / cloud-provider-aws

Cloud provider for AWS
https://cloud-provider-aws.sigs.k8s.io/
Apache License 2.0
388 stars, 301 forks

Ensure that addresses are added in network device index order #909

Closed javanthropus closed 4 months ago

javanthropus commented 5 months ago

What type of PR is this? /kind bug

What this PR does / why we need it: This ensures that the addresses of the network devices attached to a host are added to the Node resource's address list in order of device index. For reasons we have not identified, AWS returns the network device list for some of our EC2 instances with the primary device not first. Without this change, the addresses of the secondary devices appear first in the Node's address list. That breaks interactions with pods on the node, such as fetching logs and creating port forwards, because the apiserver always uses the first address of the Node resource to reach the kubelet, and it cannot reach the kubelet on any of those secondary addresses.
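As a rough illustration of the ordering described above (a minimal sketch, not the actual diff in this PR): the `networkInterface` type and the `sortedAddresses` helper are hypothetical names invented for this example, and the sketch assumes the primary device has index 0. It only shows the idea of sorting interfaces by device index before collecting their addresses.

```go
package main

import (
	"fmt"
	"sort"
)

// networkInterface is a hypothetical, simplified stand-in for the interface
// data returned by the EC2 API. It is not the type used by cloud-provider-aws.
type networkInterface struct {
	DeviceIndex int
	PrivateIPs  []string
}

// sortedAddresses (hypothetical helper) returns the private IPs of all
// interfaces, ordered by device index, so that the interface with the lowest
// index contributes the first address.
func sortedAddresses(ifaces []networkInterface) []string {
	// Copy so the caller's slice is left untouched.
	ordered := make([]networkInterface, len(ifaces))
	copy(ordered, ifaces)

	// A stable sort keeps the original order for interfaces that share an index.
	sort.SliceStable(ordered, func(i, j int) bool {
		return ordered[i].DeviceIndex < ordered[j].DeviceIndex
	})

	var addrs []string
	for _, iface := range ordered {
		addrs = append(addrs, iface.PrivateIPs...)
	}
	return addrs
}

func main() {
	// Simulates a response in which the secondary device is listed first.
	ifaces := []networkInterface{
		{DeviceIndex: 1, PrivateIPs: []string{"10.0.1.7", "10.0.1.8"}},
		{DeviceIndex: 0, PrivateIPs: []string{"10.0.0.5"}},
	}
	fmt.Println(sortedAddresses(ifaces)) // prints [10.0.0.5 10.0.1.7 10.0.1.8]
}
```

With the input above, the address of the device at index 0 ends up first even though that device was listed second, which is the property the apiserver depends on when it picks the first Node address.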

Which issue(s) this PR fixes: Fixes #911.

Special notes for your reviewer: None

Does this PR introduce a user-facing change?:

Addresses associated with the Node resource will be sorted in order of the index of the network device to which they're attached. In cases where a VPC CNI, such as aws-cni, is used and the order of devices returned for a node is unpredictable, this ensures that Kubernetes uses the right address to interact with the kubelet on the node.
linux-foundation-easycla[bot] commented 5 months ago

CLA Signed

The committers listed above are authorized under a signed CLA.

k8s-ci-robot commented 5 months ago

This issue is currently awaiting triage.

If cloud-provider-aws contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
k8s-ci-robot commented 5 months ago

Welcome @javanthropus!

It looks like this is your first PR to kubernetes/cloud-provider-aws 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/cloud-provider-aws has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. :smiley:

k8s-ci-robot commented 5 months ago

Hi @javanthropus. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

cartermckinnon commented 5 months ago

/ok-to-test

javanthropus commented 5 months ago

What's the process for backporting changes? My understanding is that we need to use the 1.27 release branch since our clusters are still running k8s 1.27.

cartermckinnon commented 5 months ago

Ah, I didn't notice you had this open against release-1.30. I changed the base to master, so you'll need to rebase. After merge you can open cherry-pick PRs against older release branches as needed. I use this script to fire them off quickly: https://github.com/kubernetes/kubernetes/blob/master/hack/cherry_pick_pull.sh

example here: https://kops.sigs.k8s.io/contributing/proposing-a-cherry-pick/

javanthropus commented 5 months ago

> Ah I didn't notice you had this open against release-1.30. I changed the base to master so you'll need to rebase.

Sorry about that. It wasn't clear what the process is for changes like this. Should I go ahead and squash my commits when I rebase?

cartermckinnon commented 5 months ago

Sure you can go ahead and squash! Thx

cartermckinnon commented 5 months ago

/approve

/assign @mmerkes

PTAL

k8s-ci-robot commented 5 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cartermckinnon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/kubernetes/cloud-provider-aws/blob/master/OWNERS)~~ [cartermckinnon]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.
javanthropus commented 4 months ago

/retest-required

javanthropus commented 4 months ago

The last couple of failed test runs don't appear to be related to my change. @mmerkes or @cartermckinnon, can one of you PTAL?

cartermckinnon commented 4 months ago

@javanthropus sorry about that, the CI is busted. If we can't get a fix in shortly I'll override.

hakman commented 4 months ago

/test pull-cloud-provider-aws-e2e

hakman commented 4 months ago

@javanthropus Please rebase instead of merge. Thanks!

hakman commented 4 months ago

/retest

hakman commented 4 months ago

/test pull-cloud-provider-aws-e2e-kubetest2

cartermckinnon commented 4 months ago

/lgtm

cartermckinnon commented 4 months ago

Thanks for fixing this @javanthropus!