openshift / cluster-monitoring-operator

Manage the OpenShift monitoring stack
Apache License 2.0

MON-3621: Enable `extra-scrape-metrics` feature in PrometheusUWM #2302

Closed slashpai closed 6 months ago

slashpai commented 6 months ago

Update Prometheus user-workload to enable additional scrape metrics. Part of epic MON-3256.

openshift-ci-robot commented 6 months ago

@slashpai: This pull request references MON-3621 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

In response to [this](https://github.com/openshift/cluster-monitoring-operator/pull/2302):

> Update Prometheus user-workload to enable additional scrape metrics
> As part of epic [MON-3256](https://issues.redhat.com/browse/MON-3256)
>
> * [x] I added CHANGELOG entry for this change.
> * [ ] No user facing changes, so no entry in CHANGELOG was needed.

Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=openshift%2Fcluster-monitoring-operator). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.

slashpai commented 6 months ago

/skip

slashpai commented 6 months ago

/retest-required

slashpai commented 6 months ago

@simonpasquier I addressed the comment.

slashpai commented 6 months ago

/retest-required

slashpai commented 6 months ago

/retest-required

slashpai commented 6 months ago

@simonpasquier I tested the change with cluster-bot and added the findings as comments on https://issues.redhat.com/browse/MON-3256

juzhao commented 6 months ago

/cc @Tai-RedHat

slashpai commented 6 months ago

/test e2e-agnostic-operator

slashpai commented 6 months ago

@simonpasquier can you review again?

Tai-RedHat commented 6 months ago

@slashpai Hi, when I test this PR with cluster-bot, I can see:

% oc -n openshift-user-workload-monitoring get prometheus user-workload -ojsonpath='{.spec.enableFeatures}' |jq
[
  "extra-scrape-metrics"
]

But when I follow your steps and apply the PrometheusRule, it shows:

The  "prometheusrules" is invalid: : group "general.rules", rule 2, "ApproachingEnforcedSamplesLimit": annotation "message": template: __alert_ApproachingEnforcedSamplesLimit:1: unexpected "|" in command

Did I use the correct config?

% oc -n ns1 apply -f -<<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: monitoring-stack-alerts
  namespace: ns1
spec:
  groups:
  - name: general.rules
    rules:
    - alert: TargetDown
      annotations:
        message: '{{ printf "%.4g" $value }}% of the {{ $labels.job }}/{{ $labels.service
          }} targets in {{ $labels.namespace }} namespace are down.'
      expr: 100 * (count(up == 0) BY (job, namespace, service) / count(up) BY (job,
        namespace, service)) > 10
      for: 10m
      labels:
        severity: warning
    - alert: ApproachingEnforcedSamplesLimit
      annotations:
        message: '{{ $labels.container }} container of the {{ $labels.pod }} pod in the {{ $labels.namespace }} namespace consumes {{ $value | humanizePercentage }} of the samples limit budget.'
      expr: (scrape_samples_post_metric_relabeling/(scrape_sample_limit > 0)) > 0.9
      for: 10m
      labels:
        severity: warning
EOF
slashpai commented 6 months ago

Can you put the contents in a file and try again? I think when there is a `$` in the manifest, the shell may not pass it through correctly.
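A likely explanation, consistent with the `unexpected "|" in command` error reported earlier: with an unquoted heredoc delimiter (`<<EOF`), the shell performs parameter expansion on the body, so `$value` is replaced (by an empty string when the variable is unset) before the manifest reaches `oc`. The minimal sketch below uses `cat` as a stand-in for `oc apply -f -` (an assumption made only so the snippet runs without a cluster) and compares an unquoted delimiter with a quoted one.

```shell
#!/usr/bin/env bash
# Illustration only: `cat` stands in for `oc apply -f -`.
# Ensure `value` is unset, as it normally is in an interactive shell.
unset value

# Unquoted delimiter: the shell expands $value inside the heredoc body.
unquoted=$(cat <<EOF
{{ $value | humanizePercentage }}
EOF
)

# Quoted delimiter: the body is passed through literally, no expansion.
quoted=$(cat <<'EOF'
{{ $value | humanizePercentage }}
EOF
)

echo "unquoted: ${unquoted}"
echo "quoted:   ${quoted}"
```

In the unquoted case the template arrives with nothing in front of the pipe, which the template parser rejects. Quoting the delimiter (`oc -n ns1 apply -f - <<'EOF'`) or applying the manifest from a file both avoid the expansion.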

Tai-RedHat commented 6 months ago

@slashpai It works now. I will add the QE label after @simonpasquier reviews again.

openshift-ci[bot] commented 6 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: simonpasquier, slashpai

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
- ~~[OWNERS](https://github.com/openshift/cluster-monitoring-operator/blob/master/OWNERS)~~ [simonpasquier,slashpai]

Approvers can indicate their approval by writing `/approve` in a comment
Approvers can cancel approval by writing `/approve cancel` in a comment

Tai-RedHat commented 6 months ago

/label qe-approved

slashpai commented 6 months ago

/jira refresh

openshift-bot commented 6 months ago

@slashpai: This pull request references MON-3621 which is a valid jira issue.

In response to [this](https://github.com/openshift/cluster-monitoring-operator/pull/2302#issuecomment-2049386768):

> /jira refresh

openshift-ci[bot] commented 6 months ago

@slashpai: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository. I understand the commands that are listed [here](https://go.k8s.io/bot-commands).
openshift-bot commented 6 months ago

[ART PR BUILD NOTIFIER]

This PR has been included in build cluster-monitoring-operator-container-v4.16.0-202404120544.p0.g7f498b4.assembly.stream.el9 for distgit cluster-monitoring-operator. All builds following this will include this PR.