kubernetes / cloud-provider-aws

Cloud provider for AWS
https://cloud-provider-aws.sigs.k8s.io/
Apache License 2.0

Do not allow EC2 instance ID NotFound to succeed tagging #674

Closed ndbaker1 closed 1 year ago

ndbaker1 commented 1 year ago

What type of PR is this?

/kind bug

What this PR does / why we need it:

Removes the graceful handling of the InvalidInstanceID.NotFound error when attempting to tag an EC2 instance that has not fully come up. With that handling in place, we've seen the tagging controller misleadingly report success without actually tagging the instance, and without re-queuing the item to (ideally) run again once the instance becomes visible.

example log feed:

tags.go:326] Couldn't find resource when trying to tag it hence skipping it, InvalidInstanceID.NotFound: The instance ID 'i-***' does not exist status code: 400, request id: ***
tagging_controller.go:299] Successfully tagged i-*** with map[aws:eks:cluster-name:***]. Labeling the nodes with tagging controller labels now.
tagging_controller.go:305] Successfully labeled node ip-***.compute.internal with map[k8s.io/cloud-provider-aws:***].

The graceful handling remains correct for the untag action, since removing a tag from a non-existent instance is a no-op, so no changes are needed there.

It's worth mentioning that the initial PR to gracefully handle this (https://github.com/kubernetes/cloud-provider-aws/pull/448) aimed to fix all cases discussed in issue https://github.com/kubernetes/cloud-provider-aws/issues/444, where the untracked InvalidInstanceID.NotFound errors were valid failure modes in the context of instance termination.

Which issue(s) this PR fixes: N/A

Special notes for your reviewer: N/A

Does this PR introduce a user-facing change?:

NONE
linux-foundation-easycla[bot] commented 1 year ago

CLA Signed

The committers listed above are authorized under a signed CLA.

k8s-ci-robot commented 1 year ago

This issue is currently awaiting triage.

If cloud-provider-aws contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

k8s-ci-robot commented 1 year ago

Hi @ndbaker1. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

cartermckinnon commented 1 year ago

/ok-to-test

cartermckinnon commented 1 year ago

Change looks fine to me. IIUC, silencing this error in #448 was just an optimization; if there is an errant Node in the API, we'll try to tag it n times, but there's no correctness issue per se?

This will show up in our error metrics, but I think that's appropriate.

cartermckinnon commented 1 year ago

/retest

hakman commented 1 year ago

/release-note-none

k8s-ci-robot commented 1 year ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hakman

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/kubernetes/cloud-provider-aws/blob/master/OWNERS)~~ [hakman]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.