openshift / cluster-kube-apiserver-operator

The kube-apiserver operator installs and maintains the kube-apiserver on a cluster
Apache License 2.0

OCPEDGE-902: add SNO control plane high cpu usage alert #1676

Closed qJkee closed 5 months ago

qJkee commented 6 months ago

Create a separate SNO alert for high control plane CPU usage. The alert is aware of workload partitioning and adjusts its threshold when workload partitioning is enabled.

This PR contains improvements suggested in https://github.com/openshift/cluster-kube-apiserver-operator/pull/1673
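
As a rough illustration of the threshold adjustment described in this PR, the sketch below assumes that with workload partitioning the control plane is confined to a reserved set of CPUs, so the alert threshold is derived from that reserved count rather than from the node's full capacity. The function names, parameters, and example numbers are hypothetical and are not taken from this PR's alert rule.

```python
from typing import Optional


def cpu_alert_threshold_cores(
    total_cores: int,
    fraction: float,
    reserved_cores: Optional[int] = None,
) -> float:
    """Return the CPU usage, in cores, above which the alert would fire.

    Assumption (not from the PR): when workload partitioning is enabled,
    `reserved_cores` holds the CPUs set aside for the control plane, and
    the threshold is scaled to that smaller capacity.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Without partitioning, the whole node counts as control plane capacity.
    capacity = total_cores if reserved_cores is None else reserved_cores
    if capacity <= 0 or capacity > total_cores:
        raise ValueError("capacity must be positive and at most total_cores")
    return fraction * capacity


def should_alert(
    used_cores: float,
    total_cores: int,
    fraction: float,
    reserved_cores: Optional[int] = None,
) -> bool:
    """Return True when usage exceeds the (possibly adjusted) threshold."""
    return used_cores > cpu_alert_threshold_cores(total_cores, fraction, reserved_cores)
```

With these hypothetical numbers, 3 cores of control plane usage on a 16-core node at a 0.5 fraction would not trigger the alert without partitioning, but would trigger it when only 4 cores are reserved.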

openshift-ci-robot commented 6 months ago

@qJkee: This pull request references OCPEDGE-902 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

qJkee commented 6 months ago

/cc @p0lyn0mial @vrutkovs

qJkee commented 6 months ago

/retest

qJkee commented 6 months ago

/retest

Jobs are failing due to CI issues.

qJkee commented 6 months ago

/retest

p0lyn0mial commented 5 months ago

/lgtm
/hold

@qJkee wants to run an e2e test. Feel free to cancel hold once you are ready. @qJkee please don't forget to merge https://github.com/openshift/cluster-kube-apiserver-operator/pull/1680

wangke19 commented 5 months ago

/lgtm

qJkee commented 5 months ago

/hold

qJkee commented 5 months ago

/retest

wangke19 commented 5 months ago

/label qe-approved

p0lyn0mial commented 5 months ago

/lgtm

openshift-ci[bot] commented 5 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: p0lyn0mial, qJkee, wangke19

Needs approval from an approver in each of these files:
- ~~[OWNERS](https://github.com/openshift/cluster-kube-apiserver-operator/blob/master/OWNERS)~~ [p0lyn0mial]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

openshift-ci[bot] commented 5 months ago

@qJkee: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/e2e-aws-operator-disruptive-single-node | 5b7bf4cbc4d3c69e94e0fe21cc768ccd283f1fe8 | link | false | `/test e2e-aws-operator-disruptive-single-node` |
| ci/prow/e2e-gcp-operator-single-node | 5b7bf4cbc4d3c69e94e0fe21cc768ccd283f1fe8 | link | false | `/test e2e-gcp-operator-single-node` |

qJkee commented 5 months ago

/unhold

Verified that everything works as expected.

qJkee commented 5 months ago

/cherry-pick release-4.14,release-4.15,release-4.16

openshift-cherrypick-robot commented 5 months ago

@qJkee: cannot checkout release-4.14,release-4.15,release-4.16: error checking out "release-4.14,release-4.15,release-4.16": exit status 1 error: pathspec 'release-4.14,release-4.15,release-4.16' did not match any file(s) known to git
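
The error above suggests the comma-separated list was handled as a single branch name. The sketch below is a hypothetical illustration of that behavior, assuming the command arguments are split on whitespace only. It is not the cherry-pick robot's actual code, and `parse_cherry_pick_branches` is a made-up helper.

```python
from typing import List


def parse_cherry_pick_branches(comment: str) -> List[str]:
    """Return the branch arguments of the first /cherry-pick line, if any.

    Assumption: arguments are separated by whitespace only, so commas
    stay inside a single argument.
    """
    prefix = "/cherry-pick"
    for line in comment.splitlines():
        line = line.strip()
        if line == prefix or line.startswith(prefix + " "):
            # str.split() with no separator splits on runs of whitespace.
            return line[len(prefix):].split()
    return []
```

Under that assumption, `release-4.14,release-4.15,release-4.16` comes back as one argument, which git then rejects as an unknown pathspec, while the space-separated form yields three separate branch names.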

openshift-bot commented 5 months ago

[ART PR BUILD NOTIFIER]

This PR has been included in build ose-cluster-kube-apiserver-operator-container-v4.17.0-202406141811.p0.gc3adc9e.assembly.stream.el9 for distgit ose-cluster-kube-apiserver-operator. All builds following this will include this PR.

qJkee commented 5 months ago

/cherry-pick release-4.14 release-4.15 release-4.16

openshift-cherrypick-robot commented 5 months ago

@qJkee: #1676 failed to apply on top of branch "release-4.14":

Applying: add SNO control plane high cpu usage alert
.git/rebase-apply/patch:59: trailing whitespace.
              To manage this alert or modify threshold it in case of false positives see the following link: 
warning: 1 line adds whitespace errors.
Using index info to reconstruct a base tree...
M   pkg/operator/starter.go
M   vendor/modules.txt
Falling back to patching base and 3-way merge...
Auto-merging vendor/modules.txt
CONFLICT (content): Merge conflict in vendor/modules.txt
Auto-merging pkg/operator/starter.go
CONFLICT (content): Merge conflict in pkg/operator/starter.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 add SNO control plane high cpu usage alert
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".