Closed: jhead-slg closed this issue 3 months ago.
A rate-limiting 403 (as opposed to a 429) would create other problems: https://github.com/kubernetes-sigs/cluster-api-provider-packet/blob/a6d36083511981e576639920930c011273a9eb37/controllers/packetmachine_controller.go#L276-L281
CAPP would consider the machine to be deleted (since a 403 from the Equinix Metal /devices API would indicate a failed provision).
To avoid the rate-limiting scenario, error-based reconcile retries should use a calmer approach, such as exponential backoff; see the sketch below for how this pattern applies to controller-runtime controllers like CAPP. Any new parameters should be exposed as CAPP configuration options (as many of the other configuration parameters are today).
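A minimal sketch of one such calmer approach, using controller-runtime's controller.Options and client-go's per-item exponential-failure rate limiter. The setupWithBackoff helper, the baseDelay/maxDelay knobs, and the v1beta1 import path are illustrative assumptions, not CAPP's actual setup code, and the exact RateLimiter field type varies by controller-runtime version:

```go
// Sketch only: wiring exponential backoff into the PacketMachine controller's
// requeue behaviour. Helper name, baseDelay/maxDelay parameters, and the
// v1beta1 import path are assumptions for illustration.
package controllers

import (
	"time"

	"k8s.io/client-go/util/workqueue"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/controller"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"

	infrav1 "sigs.k8s.io/cluster-api-provider-packet/api/v1beta1" // assumed import path
)

// setupWithBackoff registers the reconciler with a per-item exponential-failure
// rate limiter, so repeated errors back off (baseDelay, 2x, 4x, ... up to maxDelay)
// instead of retrying in a tight loop against the Equinix Metal API.
// baseDelay and maxDelay are the kind of knobs that could become CAPP config options.
func setupWithBackoff(mgr ctrl.Manager, r reconcile.Reconciler, baseDelay, maxDelay time.Duration) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&infrav1.PacketMachine{}).
		WithOptions(controller.Options{
			RateLimiter: workqueue.NewItemExponentialFailureRateLimiter(baseDelay, maxDelay),
		}).
		Complete(r)
}
```

With something like a baseDelay of a few seconds and a maxDelay of a few minutes, a burst of failing machines backs off quickly instead of generating hundreds of requests per reconcile loop.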
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
/remove-lifecycle stale
/lifecycle rotten
/close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
/remove-lifecycle stale
/close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
/remove-lifecycle rotten
/close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
/reopen
/remove-lifecycle rotten
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
/reopen
/remove-lifecycle rotten
@cprivitere: Reopened this issue.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
/remove-lifecycle stale
/close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
/remove-lifecycle rotten
/close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
/reopen
/remove-lifecycle rotten
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
What steps did you take and what happened: Working on removing reserved hardware from a machine deployment, I was deleting machines while keeping the replica count the same. At some point I was apparently rate-limited, getting back a 403 from the API, which caused all the machines to show as Failed. I seem to be unable to get the machines back into a happy state because the Packet provider is skipping the check on them.
As a side note, there is no way to reduce the replica count without other machines being deleted, which makes removing specific reserved hardware difficult. The machinedeployment will only delete machines based on the random, oldest, or newest strategies, regardless of whether unprovisioned machines could be removed instead.
What did you expect to happen: I expected to be able to delete the machines while cluster-api at least stayed stable.
Anything else you would like to add: I assume this happened because we have hundreds of reserved hardware IDs and it was making API requests for each of them. Perhaps storing the reservation ID in the Status of each packetmachine and checking that first before making an API call would reduce requests; see the sketch below.
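For illustration, a rough sketch of that suggestion, with hypothetical types standing in for the PacketMachine status field and the Equinix Metal client; neither matches CAPP's real API:

```go
// Sketch of caching the resolved hardware reservation ID on the machine's status
// and consulting it before calling the Equinix Metal API. All types and names
// here are hypothetical stand-ins, not CAPP's actual types.
package controllers

import "context"

// reservationLister is a stand-in for the slice of the Equinix Metal client that
// looks up hardware reservations for a project.
type reservationLister interface {
	FindAvailableReservation(ctx context.Context, projectID string) (string, error)
}

type packetMachineStatus struct {
	// HardwareReservationID would be persisted after the first successful lookup.
	HardwareReservationID string
}

// resolveReservation returns the cached reservation ID when one is already recorded,
// and only falls back to an API request when it is not, so hundreds of reserved
// hardware IDs do not translate into hundreds of API calls on every reconcile.
func resolveReservation(ctx context.Context, status *packetMachineStatus, client reservationLister, projectID string) (string, error) {
	if status.HardwareReservationID != "" {
		return status.HardwareReservationID, nil // no API call needed
	}
	id, err := client.FindAvailableReservation(ctx, projectID)
	if err != nil {
		return "", err
	}
	status.HardwareReservationID = id // cache for subsequent reconciles
	return id, nil
}
```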
Environment:
Kubernetes version (use kubectl version): 1.22.13
OS (e.g. from /etc/os-release): Ubuntu 22.04