Closed sradco closed 2 months ago
@sradco is there any card tracking this change?
/assign ingvagabund Hi Jan, I will appreciate your review. This is a high priority issue for OpenShift Virtualization that is causing a lot of noise for our users.
Is there a customer case? A bug report?
Is there a customer case? A bug report?
Yes. updated also in the description now. https://issues.redhat.com/browse/CNV-33834.
This has been also suggested in https://github.com/openshift/cluster-kube-controller-manager-operator/pull/583# , but was ultimately discarded in favor of silencing.
There is a need for more generic and complete approach. Alternative solution to eviction is being discussed in the Evacuation API KEP upstream.
/hold
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: sradco Once this PR has been reviewed and has the lgtm label, please assign ingvagabund for approval. For more information see the Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
@atiratree Hi, This doesn't align with the effort to reduce alerts noise. We have an alert to address this issue with a dedicated runbook that explain how this should be addressed in the VMs use case, https://github.com/kubevirt/monitoring/blob/main/docs/runbooks/VMCannotBeEvicted.md.
An alert requires an action by the user, and the PodDisruptionBudgetAtLimit
alert only adds confusion in this case, since it will not recommend the correct action, https://github.com/openshift/runbooks/blob/master/alerts/cluster-kube-controller-manager-operator/PodDisruptionBudgetAtLimit.md#mitigation.
In our case the admin needs to set the evictionStrategy
of the VMI to shutdown
.
The user will be confused and bothered by this alert until he either opens a support case or finds the kbase article, which is not a reasonable alternative.
@sradco: all tests passed!
Full PR test history. Your PR dashboard.
Closing this for now since we decided to try to silence the alert, using the silences API. It is the the only way other that updating the expression of the alert to actually impact the UI and not only the Alertmanager forwarder.
Update the PodDisruptionBudgetAtLimit alert to not fire for kubevirt-disruption-budge, since they are not needed in this use case.
This PR is very important since the alerts fire for every namespace that is running VM, even tough it is not needed and is causing an alert fatigue.
Jira-url: https://issues.redhat.com/browse/CNV-33834
Signed-off-by: Shirly Radco sradco@redhat.com