openshift / cluster-monitoring-operator

Manage the OpenShift monitoring stack
Apache License 2.0
247 stars, 363 forks

OCPBUGS-23000: node-exporter: Prevent cluster-autoscaler from evicting #2346

Closed: theobarberbany closed this pull request 5 months ago

theobarberbany commented 5 months ago

Adds the 'enable-ds-eviction' annotation to prevent the cluster autoscaler from evicting the node-exporter daemonset pods during a scaling event.

The motivation for doing this is to prevent blocks when node-critical pods get evicted prior to workloads.
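As a rough illustration of the change described above, the sketch below sets the annotation on a DaemonSet's pod template. It assumes the full annotation key is `cluster-autoscaler.kubernetes.io/enable-ds-eviction` with the value `"false"`, as documented by the upstream cluster autoscaler; the PR text itself only gives the short name. The manifest is a hypothetical, heavily trimmed stand-in and not the actual node-exporter manifest from this repository.

```python
import copy

# Assumption: full key and value taken from the upstream cluster-autoscaler docs;
# the PR description only mentions the short name 'enable-ds-eviction'.
DS_EVICTION_ANNOTATION = "cluster-autoscaler.kubernetes.io/enable-ds-eviction"


def disable_ds_eviction(daemonset: dict) -> dict:
    """Return a copy of a DaemonSet manifest whose pod template opts out of eviction."""
    patched = copy.deepcopy(daemonset)
    template_meta = (
        patched.setdefault("spec", {})
        .setdefault("template", {})
        .setdefault("metadata", {})
    )
    annotations = template_meta.setdefault("annotations", {})
    # Kubernetes annotation values are strings, so "false" rather than False.
    annotations[DS_EVICTION_ANNOTATION] = "false"
    return patched


# Hypothetical, trimmed manifest used only for illustration.
node_exporter = {
    "apiVersion": "apps/v1",
    "kind": "DaemonSet",
    "metadata": {"name": "node-exporter", "namespace": "openshift-monitoring"},
    "spec": {
        "template": {
            "metadata": {"labels": {"app.kubernetes.io/name": "node-exporter"}},
        }
    },
}

patched = disable_ds_eviction(node_exporter)
print(patched["spec"]["template"]["metadata"]["annotations"])
```

The annotation goes on the pod template rather than on the DaemonSet object's own metadata, since the autoscaler inspects pods when deciding what to evict.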

openshift-ci-robot commented 5 months ago

@theobarberbany: This pull request references Jira Issue OCPBUGS-23000, which is valid.

3 validation(s) were run on this bug:
* bug is open, matching expected state (open)
* bug target version (4.16.0) matches configured target version for branch (4.16.0)
* bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact: /cc @sunzhaohua2

The bug has been updated to refer to the pull request using the external bug tracker.

JoelSpeed commented 5 months ago

/lgtm

From the autoscaler side, we are doing this to prevent blocks when node-critical pods get evicted prior to workloads.

openshift-ci[bot] commented 5 months ago

@theobarberbany: all tests passed!

machine424 commented 5 months ago

/lgtm

Thanks. I always thought that, upstream, `--daemonset-eviction-for-empty-nodes` should default to true instead of false, and `--daemonset-eviction-for-occupied-nodes` to false instead of true. That would be more practical and realistic, and would align better with how admins are likely to use daemonsets. Even so, I have no issue with being explicit.

I don't know if CAO allows changing those args, but doing so could be useful for admins who don't want to, or can't, continually track the daemonsets to make sure the right annotation is there.
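To make the interaction between the two flags and the per-pod annotation concrete, here is a simplified model of the decision, not the autoscaler's actual code. It assumes that an explicit `cluster-autoscaler.kubernetes.io/enable-ds-eviction` annotation overrides the flags, and that otherwise the flag matching the node's state applies. The flag defaults mirror the ones mentioned in the comment above (false for empty nodes, true for occupied nodes); the function and parameter names are hypothetical.

```python
from typing import Mapping

# Assumption: full annotation key as documented by the upstream cluster autoscaler.
DS_EVICTION_ANNOTATION = "cluster-autoscaler.kubernetes.io/enable-ds-eviction"


def should_evict_ds_pod(
    pod_annotations: Mapping[str, str],
    node_is_empty: bool,
    evict_for_empty_nodes: bool = False,    # models --daemonset-eviction-for-empty-nodes
    evict_for_occupied_nodes: bool = True,  # models --daemonset-eviction-for-occupied-nodes
) -> bool:
    """Simplified model: decide whether a daemonset pod is evicted on scale-down."""
    value = pod_annotations.get(DS_EVICTION_ANNOTATION)
    if value == "true":
        return True   # explicit opt-in wins over the flags
    if value == "false":
        return False  # explicit opt-out wins over the flags
    # No (or unrecognized) annotation: fall back to the flag for this node state.
    return evict_for_empty_nodes if node_is_empty else evict_for_occupied_nodes


# With the defaults, an unannotated daemonset pod on an occupied node is evicted...
print(should_evict_ds_pod({}, node_is_empty=False))
# ...while one annotated with "false" is left alone.
print(should_evict_ds_pod({DS_EVICTION_ANNOTATION: "false"}, node_is_empty=False))
```

Under this model, flipping the defaults as suggested would make the explicit annotation unnecessary for node-critical daemonsets, at the cost of having to opt in the daemonsets that should be evicted.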

openshift-ci[bot] commented 5 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JoelSpeed, machine424, theobarberbany


Needs approval from an approver in each of these files:
- ~~[OWNERS](https://github.com/openshift/cluster-monitoring-operator/blob/master/OWNERS)~~ [machine424]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

openshift-bot commented 5 months ago

/jira refresh

The requirements for Jira bugs have changed (Jira issues linked to PRs on main branch need to target different OCP), recalculating validity.

openshift-ci-robot commented 5 months ago

@openshift-bot: This pull request references Jira Issue OCPBUGS-23000, which is valid.

3 validation(s) were run on this bug:
* bug is open, matching expected state (open)
* bug target version (4.16.0) matches configured target version for branch (4.16.0)
* bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact: /cc @sunzhaohua2

openshift-bot commented 5 months ago

/jira refresh

The requirements for Jira bugs have changed (Jira issues linked to PRs on main branch need to target different OCP), recalculating validity.

openshift-ci-robot commented 5 months ago

@openshift-bot: This pull request references Jira Issue OCPBUGS-23000, which is invalid:

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

simonpasquier commented 5 months ago

/jira refresh

openshift-ci-robot commented 5 months ago

@simonpasquier: This pull request references Jira Issue OCPBUGS-23000, which is valid.

3 validation(s) were run on this bug:
* bug is open, matching expected state (open)
* bug target version (4.17.0) matches configured target version for branch (4.17.0)
* bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact: /cc @sunzhaohua2

openshift-ci-robot commented 5 months ago

@theobarberbany: Jira Issue OCPBUGS-23000: Some pull requests linked via external trackers have merged:

The following pull requests linked via external trackers have not merged:

These pull requests must merge or be unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with /jira refresh.

Jira Issue OCPBUGS-23000 has not been moved to the MODIFIED state.
