openshift / cluster-monitoring-operator

Manage the OpenShift monitoring stack
Apache License 2.0
247 stars 363 forks source link

OCPBUGS-35480: Add deprecation for prometheus adapter #2381

Closed slashpai closed 3 months ago

slashpai commented 4 months ago
openshift-ci-robot commented 4 months ago

@slashpai: This pull request references Jira Issue OCPBUGS-35480, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug * bug is open, matching expected state (open) * bug target version (4.17.0) matches configured target version for branch (4.17.0) * bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

The bug has been updated to refer to the pull request using the external bug tracker.

In response to [this](https://github.com/openshift/cluster-monitoring-operator/pull/2381): > > >* [ ] I added CHANGELOG entry for this change. >* [x] No user facing changes, so no entry in CHANGELOG was needed. > Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=openshift%2Fcluster-monitoring-operator). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.
slashpai commented 4 months ago

/retest-required

slashpai commented 4 months ago

/retest-required

slashpai commented 4 months ago

/test e2e-aws-ovn-techpreview

slashpai commented 4 months ago

@simonpasquier can you review again?

slashpai commented 4 months ago

/retest

slashpai commented 4 months ago

The failure in tech preview job looks to be unrelated to this change

juzhao commented 4 months ago

tested with PR, still can see UpdatingPrometheusAdapter in the CMO logs(fix for this is not included in this PR), there is not warning message for the deprecated PrometheusAdapter setting in the CMO logs either(do we need to log the warning message?).

$ oc -n openshift-monitoring logs deploy/cluster-monitoring-operator | grep "UpdatingPrometheusAdapter" | head -n6
I0619 04:05:53.203865       1 tasks.go:70] running task 7 of 16: UpdatingPrometheusAdapter
I0619 04:05:53.203870       1 tasks.go:76] ran task 7 of 16: UpdatingPrometheusAdapter
I0619 04:10:59.294312       1 tasks.go:70] running task 7 of 16: UpdatingPrometheusAdapter
I0619 04:10:59.294366       1 tasks.go:76] ran task 7 of 16: UpdatingPrometheusAdapter
I0619 04:12:28.365875       1 tasks.go:70] running task 7 of 16: UpdatingPrometheusAdapter
I0619 04:12:28.365926       1 tasks.go:76] ran task 7 of 16: UpdatingPrometheusAdapter

alertClusterMonitoringOperatorDeprecatedConfig could be fired

$ token=`oc create token prometheus-k8s -n openshift-monitoring`
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?' --data-urlencode 'query=cluster_monitoring_operator_deprecated_config_in_use' | jq
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "cluster_monitoring_operator_deprecated_config_in_use",
          "configmap": "openshift-monitoring/cluster-monitoring-config",
          "container": "cluster-monitoring-operator",
          "deprecation_version": "4.16",
          "endpoint": "https",
          "field": "k8sPrometheusAdapter",
          "instance": "10.130.0.15:8443",
          "job": "cluster-monitoring-operator",
          "namespace": "openshift-monitoring",
          "pod": "cluster-monitoring-operator-59c7b4845d-q2mrz",
          "prometheus": "openshift-monitoring/k8s",
          "service": "cluster-monitoring-operator"
        },
        "value": [
          1718774171.547,
          "1"
        ]
      }
    ],
    "analysis": {}
  }
}

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?' --data-urlencode 'query=ALERTS{alertname="ClusterMonitoringOperatorDeprecatedConfig"}' | jq
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "ALERTS",
          "alertname": "ClusterMonitoringOperatorDeprecatedConfig",
          "alertstate": "pending",
          "configmap": "openshift-monitoring/cluster-monitoring-config",
          "deprecation_version": "4.16",
          "field": "k8sPrometheusAdapter",
          "prometheus": "openshift-monitoring/k8s",
          "severity": "info"
        },
        "value": [
          1718774477.234,
          "1"
        ]
      }
    ],
    "analysis": {}
  }
}
slashpai commented 4 months ago

tested with PR, still can see UpdatingPrometheusAdapter in the CMO logs(fix for this is not included in this PR), there is not warning message for the deprecated PrometheusAdapter setting in the CMO logs either(do we need to log the warning message?).

$ oc -n openshift-monitoring logs deploy/cluster-monitoring-operator | grep "UpdatingPrometheusAdapter" | head -n6
I0619 04:05:53.203865       1 tasks.go:70] running task 7 of 16: UpdatingPrometheusAdapter
I0619 04:05:53.203870       1 tasks.go:76] ran task 7 of 16: UpdatingPrometheusAdapter
I0619 04:10:59.294312       1 tasks.go:70] running task 7 of 16: UpdatingPrometheusAdapter
I0619 04:10:59.294366       1 tasks.go:76] ran task 7 of 16: UpdatingPrometheusAdapter
I0619 04:12:28.365875       1 tasks.go:70] running task 7 of 16: UpdatingPrometheusAdapter
I0619 04:12:28.365926       1 tasks.go:76] ran task 7 of 16: UpdatingPrometheusAdapter

alertClusterMonitoringOperatorDeprecatedConfig could be fired

$ token=`oc create token prometheus-k8s -n openshift-monitoring`
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?' --data-urlencode 'query=cluster_monitoring_operator_deprecated_config_in_use' | jq
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "cluster_monitoring_operator_deprecated_config_in_use",
          "configmap": "openshift-monitoring/cluster-monitoring-config",
          "container": "cluster-monitoring-operator",
          "deprecation_version": "4.16",
          "endpoint": "https",
          "field": "k8sPrometheusAdapter",
          "instance": "10.130.0.15:8443",
          "job": "cluster-monitoring-operator",
          "namespace": "openshift-monitoring",
          "pod": "cluster-monitoring-operator-59c7b4845d-q2mrz",
          "prometheus": "openshift-monitoring/k8s",
          "service": "cluster-monitoring-operator"
        },
        "value": [
          1718774171.547,
          "1"
        ]
      }
    ],
    "analysis": {}
  }
}

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?' --data-urlencode 'query=ALERTS{alertname="ClusterMonitoringOperatorDeprecatedConfig"}' | jq
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "ALERTS",
          "alertname": "ClusterMonitoringOperatorDeprecatedConfig",
          "alertstate": "pending",
          "configmap": "openshift-monitoring/cluster-monitoring-config",
          "deprecation_version": "4.16",
          "field": "k8sPrometheusAdapter",
          "prometheus": "openshift-monitoring/k8s",
          "severity": "info"
        },
        "value": [
          1718774477.234,
          "1"
        ]
      }
    ],
    "analysis": {}
  }
}

we haven't removed PrometheusAdapter task in this PR. We have added only deprecation metric which will set to 1 if k8sPrometheusAdapter fields are defined in cluster-monitoring-config map

juzhao commented 4 months ago

/label qe-approved

openshift-ci-robot commented 4 months ago

@slashpai: This pull request references Jira Issue OCPBUGS-35480, which is valid.

3 validation(s) were run on this bug * bug is open, matching expected state (open) * bug target version (4.17.0) matches configured target version for branch (4.17.0) * bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)
In response to [this](https://github.com/openshift/cluster-monitoring-operator/pull/2381): > > >* [ ] I added CHANGELOG entry for this change. >* [x] No user facing changes, so no entry in CHANGELOG was needed. > Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=openshift%2Fcluster-monitoring-operator). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.
slashpai commented 4 months ago

@juzhao We can add in cmo logs if any k8sPrometheusAdapter fields defined as well. I will update the PR

simonpasquier commented 4 months ago

/hold

maybe combine with #2386?

openshift-ci-robot commented 4 months ago

@slashpai: This pull request references Jira Issue OCPBUGS-35480, which is valid.

3 validation(s) were run on this bug * bug is open, matching expected state (open) * bug target version (4.17.0) matches configured target version for branch (4.17.0) * bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

The bug has been updated to refer to the pull request using the external bug tracker.

In response to [this](https://github.com/openshift/cluster-monitoring-operator/pull/2381): >combined logic from https://github.com/openshift/cluster-monitoring-operator/pull/2386 also here as suggested by Simon > > >* Add deprecation for prometheus-adapter >* Per https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/replace-prometheus-adapter-with-metrics-server.md#design-details we wanted to use prometheus adapter config for metrics server. This change is to handle that. We would want to backport this to 4.16 and remove it in 4.17 > > > >* [ ] I added CHANGELOG entry for this change. >* [x] No user facing changes, so no entry in CHANGELOG was needed. > Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=openshift%2Fcluster-monitoring-operator). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.
slashpai commented 4 months ago

/hold need to test the configs on a cluster where metrics-server already running

slashpai commented 4 months ago

@simonpasquier I think we need to test the config setting from prometheus-adpater to metrics-server if config exists a bit more as it can be error prone (from a quick test in cluster bot). Shall we have it in separate PR and have only the deprecation part in this PR?

Edit: Discussed offline and decided to keep only deprecation code in this PR

slashpai commented 4 months ago

/cherry-pick release-4.16

openshift-cherrypick-robot commented 4 months ago

@slashpai: once the present PR merges, I will cherry-pick it on top of release-4.16 in a new PR and assign it to you.

In response to [this](https://github.com/openshift/cluster-monitoring-operator/pull/2381#issuecomment-2182456031): >/cherry-pick release-4.16 Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
slashpai commented 4 months ago

@simonpasquier can you review again?

slashpai commented 4 months ago

/retest

slashpai commented 3 months ago

@simonpasquier PTAL when you get a chance :)

machine424 commented 3 months ago

/lgtm Thanks!

openshift-ci[bot] commented 3 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: machine424, simonpasquier, slashpai

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/openshift/cluster-monitoring-operator/blob/master/OWNERS)~~ [machine424,simonpasquier,slashpai] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
openshift-ci-robot commented 3 months ago

/retest-required

Remaining retests: 0 against base HEAD 7f52b1a0f0de84925ab93412a4e5127c000f774c and 2 for PR HEAD 51b540b4c56a037037604ddc7a0675dd741bb02e in total

openshift-ci-robot commented 3 months ago

/retest-required

Remaining retests: 0 against base HEAD 5d1fd1bb52eeb9b2f877c45de0cf93e2f9fffb95 and 1 for PR HEAD 51b540b4c56a037037604ddc7a0675dd741bb02e in total

slashpai commented 3 months ago

/retest-required

slashpai commented 3 months ago

/retest-required

openshift-ci[bot] commented 3 months ago

@slashpai: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository. I understand the commands that are listed [here](https://go.k8s.io/bot-commands).
openshift-ci-robot commented 3 months ago

@slashpai: Jira Issue OCPBUGS-35480: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-35480 has been moved to the MODIFIED state.

In response to [this](https://github.com/openshift/cluster-monitoring-operator/pull/2381): >* Add deprecation for prometheus-adapter > > > >* [ ] I added CHANGELOG entry for this change. >* [ ] No user facing changes, so no entry in CHANGELOG was needed. > Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=openshift%2Fcluster-monitoring-operator). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.
openshift-cherrypick-robot commented 3 months ago

@slashpai: new pull request created: #2396

In response to [this](https://github.com/openshift/cluster-monitoring-operator/pull/2381#issuecomment-2182456031): >/cherry-pick release-4.16 Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.