Open: koba1t opened this issue 10 months ago
This issue is currently awaiting triage.
If CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
/cc @musaprg
Sometimes a node becomes unstable
Can you specify better what this means / how this condition shows up? I'm asking this because if MHC can automatically detect this condition, then users can benefit from everything the remediation already supports, e.g. the maxUnhealthy budget.
We can recreate one machine by removing its machine resource, but that operation temporarily reduces the total computing capacity of the entire cluster. ... Add a way to add one machine before actually terminating the machine.
This is interesting; it would be a nice remediation strategy to have for MachineDeployment (and probably for MachinePools as well). However, it might be somewhat tricky due to how MachineDeployment and MachineSet work; I did not check the code.
Let's also see if someone else is interested in this idea.
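For context, that existing remediation path is configured through a MachineHealthCheck with the maxUnhealthy budget mentioned above. A minimal sketch; the cluster name, selector labels, and thresholds are placeholders, not values from this issue:

```yaml
# Minimal MachineHealthCheck sketch; all names and thresholds are illustrative.
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: my-cluster-worker-mhc
spec:
  clusterName: my-cluster
  # Remediation is suspended once more than this share of matched machines is unhealthy.
  maxUnhealthy: 40%
  selector:
    matchLabels:
      cluster.x-k8s.io/deployment-name: my-cluster-md-0
  unhealthyConditions:
    - type: Ready
      status: Unknown
      timeout: 300s
    - type: Ready
      status: "False"
      timeout: 300s
```

In the scenario described in the next comment the Node still looks healthy, so a check like this never fires.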
Can you specify better what this means / how this condition shows up? I'm asking this because if MHC can automatically detect this condition, then users can benefit from everything the remediation already supports, e.g. the maxUnhealthy budget.
In our scenario, the node appears healthy, but underlying issues affect the application, such as network latency and performance problems. We operate clusters on OpenStack using on-prem hypervisors. Occasionally, the problem can be attributed to a virtual machine or hypervisor issue. In such cases, the node metrics still report the node as healthy, so the cluster operator manually restarts the node to resolve the underlying problem. For instance, the physical hypervisor may look healthy despite underlying issues, or there may be problems with a daemon on the Linux node, particularly on GPU nodes.
We have the same feature request. Two use cases:
I opened #10027 today, but it was closed as a duplicate. The proposed solutions were slightly different from the one in the opening message.
The proposal is to make machine deletion behave like pod deletion. When rolling update settings are configured for a pod controller (e.g. Deployment, DaemonSet) and a pod is deleted manually, kube-controller-manager creates an additional pod first and then deletes the old one.
My idea is to make the Cluster API controllers act the same way.
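For comparison, MachineDeployment already exposes Deployment-style surge settings for full rollouts; the ask is essentially the same create-before-delete behavior, scoped to a single machine. A trimmed fragment, with placeholder names and the required machine template omitted:

```yaml
# MachineDeployment rollout settings (fragment; template, selector and other
# required fields omitted). With maxSurge: 1 / maxUnavailable: 0 a rollout
# brings up a replacement machine before deleting an old one -- but only for
# a template change that rolls every machine, not for one targeted machine.
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: my-cluster-md-0
spec:
  clusterName: my-cluster
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
```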
/priority backlog
This issue is currently awaiting triage.
CAPI contributors will take a look as soon as possible, apply one of the triage/* labels and provide further guidance.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
What would you like to be added (User Story)?
I need a feature to restart one machine without restarting all nodes. Currently, the MachineDeployment controller only provides a cluster-wide rolling update operation. We can recreate one machine by removing its Machine resource, but that operation temporarily reduces the total computing capacity of the entire cluster.
Sometimes a node becomes unstable, and cluster admins restart or recreate that node to resolve the problem. We don't want to restart/recreate all nodes at once because that takes longer to complete and makes application performance unstable.
Detailed Description
Add a way to add one machine before actually terminating the machine: we need a means to remove one machine only after a new machine of the same size is running. Our idea is to define a new annotation such as cluster.x-k8s.io/refresh that refreshes a single machine when the annotation is added to its Machine resource (see https://cluster-api.sigs.k8s.io/reference/labels_and_annotations).
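To make that concrete, a sketch of how the proposed annotation could look on a Machine; cluster.x-k8s.io/refresh does not exist in Cluster API today, so everything below is purely illustrative:

```yaml
# Purely illustrative: cluster.x-k8s.io/refresh is the annotation proposed above
# and is NOT part of Cluster API today; the Machine name is a placeholder.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Machine
metadata:
  name: my-cluster-md-0-abc123
  annotations:
    # Proposed semantics: the owning controller first creates a replacement
    # machine, waits for it to become ready, and only then deletes this one.
    cluster.x-k8s.io/refresh: ""
```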
Anything else you would like to add?
We can also achieve the goal by having the following logic on our side, without introducing additional logic to the Cluster API side: add the cluster.x-k8s.io/paused annotation before the replacement, and remove the cluster.x-k8s.io/paused annotation afterwards.
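A rough sketch of that workaround, assuming the owning MachineSet is paused with the standard cluster.x-k8s.io/paused annotation and the replacement itself is driven by our own tooling; names and the exact step ordering are illustrative:

```yaml
# Step 1: pause reconciliation of the owning MachineSet so the controller does
# not fight the manual replacement (MachineSet name is a placeholder).
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineSet
metadata:
  name: my-cluster-md-0-abc12
  annotations:
    cluster.x-k8s.io/paused: ""
# Step 2: create a replacement Machine and wait for it to become ready (external tooling).
# Step 3: delete the unhealthy Machine.
# Step 4: remove the cluster.x-k8s.io/paused annotation to resume reconciliation.
```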
It may be related to this request: https://github.com/kubernetes-sigs/cluster-api/issues/1808. I'll write an enhancement proposal if you think that is needed.
Label(s) to be applied
/kind feature
/area machine