openshift / cluster-monitoring-operator

Manage the OpenShift monitoring stack
Apache License 2.0

OCPBUGS-39126: disable user-defined monitoring per object #2452

Closed simonpasquier closed 1 month ago

simonpasquier commented 1 month ago

Before this change, a project owner could only disable user-defined monitoring per namespace/project (typically to prevent the PrometheusOperatorRejectedResources alert from firing).

To provide greater flexibility, it is now possible to exclude individual objects (e.g. ServiceMonitor, PodMonitor and PrometheusRule) by adding the openshift.io/user-monitoring="false" label to them.
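To make the intended semantics concrete, here is a minimal Python sketch (not the operator's Go code). It assumes the selection behaves like a "label not equal to `false`" match at both the namespace and the object level, so unlabeled objects are still picked up. The function name `is_considered` is hypothetical.

```python
from typing import Mapping

USER_MONITORING_LABEL = "openshift.io/user-monitoring"


def is_considered(namespace_labels: Mapping[str, str],
                  object_labels: Mapping[str, str]) -> bool:
    """Return True if user-defined monitoring would pick up the object.

    Assumption: only the literal value "false" opts out; a missing label
    or any other value leaves the object monitored.
    """
    # Pre-existing behavior: the whole namespace/project is opted out.
    if namespace_labels.get(USER_MONITORING_LABEL) == "false":
        return False
    # Behavior described in this PR: a single object is opted out.
    if object_labels.get(USER_MONITORING_LABEL) == "false":
        return False
    return True


if __name__ == "__main__":
    # A labeled ServiceMonitor in an otherwise monitored namespace.
    print(is_considered({}, {USER_MONITORING_LABEL: "false"}))
```

Under these assumptions, checking the namespace labels alone no longer tells you whether a given object is monitored; the object's own labels have to be checked too.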

openshift-ci-robot commented 1 month ago

@simonpasquier: This pull request references Jira Issue OCPBUGS-39126, which is invalid:

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to [this](https://github.com/openshift/cluster-monitoring-operator/pull/2452):

>Before this change, a project owner could only disable user-defined monitoring per namespace/project (typically to prevent the `PrometheusOperatorRejectedResources` alert from firing).
>
>To provide greater flexibility, it is now possible to exclude individual objects (e.g. `ServiceMonitor`, `PodMonitor` and `PrometheusRule`) by adding the `openshift.io/user-monitoring="true"` label to them.
>
>* [ ] I added CHANGELOG entry for this change.
>* [ ] No user facing changes, so no entry in CHANGELOG was needed.

Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=openshift%2Fcluster-monitoring-operator). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.

simonpasquier commented 1 month ago

/hold need to update the e2e tests

simonpasquier commented 1 month ago

/jira refresh

openshift-ci-robot commented 1 month ago

@simonpasquier: This pull request references Jira Issue OCPBUGS-39126, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
* bug is open, matching expected state (open)
* bug target version (4.18.0) matches configured target version for branch (4.18.0)
* bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact: /cc @juzhao

In response to [this](https://github.com/openshift/cluster-monitoring-operator/pull/2452#issuecomment-2324521995):

>/jira refresh

simonpasquier commented 1 month ago

/skip

simonpasquier commented 1 month ago

/retest-required

simonpasquier commented 1 month ago

/hold cancel

simonpasquier commented 1 month ago

/cc @machine424

simonpasquier commented 1 month ago

I've tested the PR to see if it could help with issues like

After deploying the compliance operator and the otel operator and enabling user-defined monitoring, I see the PrometheusOperatorRejectedResources alert active (as expected):

image

image

After labeling the 2 service monitor objects with openshift.io/user-monitoring="false", the alert went away:

image
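For anyone who wants to reproduce this, here is a small Python sketch that builds the label change as a JSON merge patch and the matching `oc patch` invocation. The comment above does not say how the label was applied, so this is only one possible way; the helper names and the object and namespace names in the example are hypothetical.

```python
import json
from typing import List

USER_MONITORING_LABEL = "openshift.io/user-monitoring"


def exclusion_patch() -> str:
    """Build a JSON merge patch that sets the opt-out label on an object."""
    patch = {"metadata": {"labels": {USER_MONITORING_LABEL: "false"}}}
    return json.dumps(patch)


def oc_patch_command(kind: str, name: str, namespace: str) -> List[str]:
    """Return the argv for an `oc patch` call that applies the merge patch."""
    return [
        "oc", "patch", kind, name,
        "-n", namespace,
        "--type=merge",
        "-p", exclusion_patch(),
    ]


if __name__ == "__main__":
    # Hypothetical object and namespace names, for illustration only.
    print(" ".join(oc_patch_command("servicemonitor", "example", "example-ns")))
```

A merge patch only adds or overwrites the one label and leaves the object's other labels untouched.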

machine424 commented 1 month ago

/lgtm

But I'm concerned this might discourage users from addressing the root cause of the issue: deploying operators in the wrong namespace. It could also create confusion (I don't know whether we want to document it), because looking at the namespace labels is no longer enough to be sure a resource will be considered.

nit: I think you meant openshift.io/user-monitoring="false" in the PR desc.

openshift-ci[bot] commented 1 month ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: machine424, simonpasquier

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
- ~~[OWNERS](https://github.com/openshift/cluster-monitoring-operator/blob/master/OWNERS)~~ [machine424,simonpasquier]

Approvers can indicate their approval by writing `/approve` in a comment
Approvers can cancel approval by writing `/approve cancel` in a comment

openshift-ci[bot] commented 1 month ago

@simonpasquier: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/versions | dd9708a6b749bd67540b6198064278cbf70830bd | link | false | /test versions |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository. I understand the commands that are listed [here](https://go.k8s.io/bot-commands).
openshift-ci-robot commented 1 month ago

/retest-required

Remaining retests: 0 against base HEAD 886a4be40db97afb3638fab99246e0e2ab8111d7 and 2 for PR HEAD dd9708a6b749bd67540b6198064278cbf70830bd in total

openshift-ci-robot commented 1 month ago

@simonpasquier: Jira Issue OCPBUGS-39126: Some pull requests linked via external trackers have merged:

The following pull requests linked via external trackers have not merged:

These pull requests must merge or be unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with /jira refresh.

Jira Issue OCPBUGS-39126 has not been moved to the MODIFIED state.

In response to [this](https://github.com/openshift/cluster-monitoring-operator/pull/2452):

>Before this change, a project owner could only disable user-defined monitoring per namespace/project (typically to prevent the `PrometheusOperatorRejectedResources` alert from firing).
>
>To provide greater flexibility, it is now possible to exclude individual objects (e.g. `ServiceMonitor`, `PodMonitor` and `PrometheusRule`) by adding the `openshift.io/user-monitoring="true"` label to them.
>
>* [ ] I added CHANGELOG entry for this change.
>* [x] No user facing changes, so no entry in CHANGELOG was needed.

openshift-bot commented 1 month ago

[ART PR BUILD NOTIFIER]

Distgit: cluster-monitoring-operator

This PR has been included in build cluster-monitoring-operator-container-v4.18.0-202409060111.p0.gc5668e3.assembly.stream.el9. All builds following this will include this PR.

openshift-ci-robot commented 1 month ago

@simonpasquier: Jira Issue OCPBUGS-39126: Some pull requests linked via external trackers have merged:

The following pull requests linked via external trackers have not merged:

These pull requests must merge or be unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with /jira refresh.

Jira Issue OCPBUGS-39126 has not been moved to the MODIFIED state.

In response to [this](https://github.com/openshift/cluster-monitoring-operator/pull/2452):

>Before this change, a project owner could only disable user-defined monitoring per namespace/project (typically to prevent the `PrometheusOperatorRejectedResources` alert from firing).
>
>To provide greater flexibility, it is now possible to exclude individual objects (e.g. `ServiceMonitor`, `PodMonitor` and `PrometheusRule`) by adding the `openshift.io/user-monitoring="false"` label to them.
>
>* [ ] I added CHANGELOG entry for this change.
>* [x] No user facing changes, so no entry in CHANGELOG was needed.

simonpasquier commented 1 month ago

/cherrypick release-4.17 release-4.16

openshift-cherrypick-robot commented 1 month ago

@simonpasquier: new pull request created: #2458

In response to [this](https://github.com/openshift/cluster-monitoring-operator/pull/2452#issuecomment-2333554646):

>/cherrypick release-4.17 release-4.16