kubernetes-sigs / cluster-api-provider-aws

Kubernetes Cluster API Provider AWS provides consistent deployment and day 2 operations of "self-managed" and EKS Kubernetes clusters on AWS.
http://cluster-api-aws.sigs.k8s.io/
Apache License 2.0

AWSCluster can trigger re-enqueue of all machines when ingress rules fail to update, hitting AWS API limits #2794

Open randomvariable opened 3 years ago

randomvariable commented 3 years ago

/kind bug

What steps did you take and what happened:

When CAPA tries to reconcile security groups and fails, the failure triggers a requeue of all machines, which in turn causes rate limiting against the EC2 API.

What did you expect to happen:

Anything else you would like to add:

We deliberately re-enqueue all machines when an AWSCluster is updated in an unpaused state, in: https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/v0.6.4/controllers/awsmachine_controller.go#L239-L240

We did that to get fast reconciliation; however, it is definitely causing us to hit the API rate limit quickly. There's a simple reproduction too:

Create a cluster with CAPA, then change the description of one of the security group ingress rules while keeping the rules otherwise the same. Or do the following:

image
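To make the fan-out described above concrete, here is a minimal, stdlib-only sketch. It is not the CAPA code at the linked lines: `Request`, `Machine`, `requestsForCluster` and `estimatedAPICalls` are hypothetical names, and the numbers in `main` are made up purely for illustration. The assumption is that each AWSCluster update event in an unpaused cluster produces one reconcile request per machine, and that each machine reconcile makes a few EC2 calls.

```go
package main

import "fmt"

// Request is a hypothetical stand-in for a reconcile request.
type Request struct {
	Namespace string
	Name      string
}

// Machine is a hypothetical, trimmed-down machine record: only the
// fields needed to decide which cluster it belongs to.
type Machine struct {
	Namespace   string
	Name        string
	ClusterName string
}

// requestsForCluster returns one reconcile request per machine that
// belongs to clusterName. A paused cluster produces no requests.
func requestsForCluster(clusterName string, paused bool, machines []Machine) []Request {
	if paused {
		return nil
	}
	var out []Request
	for _, m := range machines {
		if m.ClusterName == clusterName {
			out = append(out, Request{Namespace: m.Namespace, Name: m.Name})
		}
	}
	return out
}

// estimatedAPICalls multiplies out the fan-out: every cluster update
// re-enqueues every machine, and every machine reconcile is assumed to
// make callsPerReconcile cloud API calls.
func estimatedAPICalls(clusterUpdates, machines, callsPerReconcile int) int {
	return clusterUpdates * machines * callsPerReconcile
}

func main() {
	machines := []Machine{
		{Namespace: "default", Name: "cp-0", ClusterName: "prod"},
		{Namespace: "default", Name: "worker-0", ClusterName: "prod"},
		{Namespace: "default", Name: "worker-1", ClusterName: "prod"},
		{Namespace: "default", Name: "worker-0", ClusterName: "dev"},
	}
	reqs := requestsForCluster("prod", false, machines)
	fmt.Println("requests per cluster update:", len(reqs))
	// Illustrative numbers only: 10 failed cluster reconciles, 3 API calls each.
	fmt.Println("estimated API calls:", estimatedAPICalls(10, len(reqs), 3))
}
```

The point of the sketch is only the multiplication: if a failing security group reconcile keeps updating the AWSCluster, the number of EC2 calls grows with both the retry count and the number of machines.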

Environment:

/priority important-soon
/area networking

randomvariable commented 3 years ago

We should probably discuss this at the next office hours.

shivi28 commented 3 years ago

/assign @shivi28

sedefsavas commented 3 years ago

@shivi28 is there any update on this bug?

shivi28 commented 3 years ago

Hey @sedefsavas, I am going to add debouncing logic and introduce a debouncing window in the AWSCluster and AWSMachine reconcilers. I will raise a PR this week.

sedefsavas commented 2 years ago

This issue is related to: https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/1764

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

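This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Reopen this issue or PR with /reopen
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage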

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-ci-robot commented 2 years ago

@k8s-triage-robot: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/2794#issuecomment-1160374446):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues and PRs according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
> - Reopen this issue or PR with `/reopen`
> - Mark this issue or PR as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

richardcase commented 2 years ago

/reopen

k8s-ci-robot commented 2 years ago

@richardcase: Reopened this issue.

In response to [this](https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/2794#issuecomment-1160531573):

> /reopen

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

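This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Reopen this issue or PR with /reopen
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage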

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-ci-robot commented 2 years ago

@k8s-triage-robot: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/2794#issuecomment-1190445087):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues and PRs according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
> - Reopen this issue or PR with `/reopen`
> - Mark this issue or PR as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Ankitasw commented 1 year ago

/reopen
/triage accepted
/priority important-soon

k8s-ci-robot commented 1 year ago

@Ankitasw: Reopened this issue.

In response to [this](https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/2794#issuecomment-1350907952):

> /reopen
> /triage accepted
> /priority important-soon

k8s-triage-robot commented 1 year ago

The issue has been marked as an important bug and triaged. Such issues are automatically marked as frozen when hitting the rotten state to avoid missing important bugs.

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle frozen

k8s-triage-robot commented 1 year ago

This issue is labeled with priority/important-soon but has not been updated in over 90 days, and should be re-triaged. Important-soon issues must be staffed and worked on either currently, or very soon, ideally in time for the next release.

You can:

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

richardcase commented 1 year ago

From office hours 2023-04-03:

/triage accepted
/priority important-soon

k8s-triage-robot commented 1 year ago

This issue is labeled with priority/important-soon but has not been updated in over 90 days, and should be re-triaged. Important-soon issues must be staffed and worked on either currently, or very soon, ideally in time for the next release.

You can:

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted