openshift / cluster-etcd-operator

Operator to manage the lifecycle of the etcd members of an OpenShift cluster
Apache License 2.0

Revert #1309 "NO-JIRA: degrade targetconfigcontroller on quorum loss" #1312

Closed neisw closed 1 month ago

neisw commented 1 month ago

Reverts #1309 ; tracked by https://issues.redhat.com/browse/OCPBUGS-37964

Per OpenShift policy, we are reverting this breaking change to get CI and/or nightly payloads flowing again.

Azure etcd should not log excessive took too long failures

To unrevert this, revert this PR, and layer an additional separate commit on top that addresses the problem. Before merging the unrevert, please run these jobs on the PR and check the result of these jobs to confirm the fix has corrected the problem:

/payload-aggregate periodic-ci-openshift-release-master-ci-4.17-e2e-azure-ovn-upgrade 10

CC: @tjungblu

PR created by Revertomatic:tm:
openshift-ci-robot commented 1 month ago

@neisw: This pull request explicitly references no jira issue.

In response to [this](https://github.com/openshift/cluster-etcd-operator/pull/1312):

> Reverts #1309 ; tracked by https://issues.redhat.com/browse/OCPBUGS-37964
>
> Per [OpenShift policy](https://github.com/openshift/enhancements/blob/master/enhancements/release/improving-ci-signal.md#quick-revert), we are reverting this breaking change to get CI and/or nightly payloads flowing again.
>
> Azure etcd should not log excessive took too long failures
>
> To unrevert this, revert this PR, and layer an additional separate commit on top that addresses the problem. Before merging the unrevert, please run these jobs on the PR and check the result of these jobs to confirm the fix has corrected the problem:
>
> ```
> /payload-aggregate periodic-ci-openshift-release-master-ci-4.17-e2e-azure-ovn-upgrade 10
> ```
>
> CC: @tjungblu
>
> PR created by Revertomatic:tm:

Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=openshift%2Fcluster-etcd-operator). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.

openshift-ci[bot] commented 1 month ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: neisw

Once this PR has been reviewed and has the lgtm label, please assign elbehery for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

- **[OWNERS](https://github.com/openshift/cluster-etcd-operator/blob/master/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

neisw commented 1 month ago

/payload-aggregate periodic-ci-openshift-release-master-ci-4.17-e2e-azure-ovn-upgrade 10

openshift-ci[bot] commented 1 month ago

@neisw: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/f13a0ab0-531b-11ef-9435-78bcc86e4ef6-0

neisw commented 1 month ago

/payload-aggregate periodic-ci-openshift-release-master-ci-4.17-upgrade-from-stable-4.16-e2e-azure-ovn-upgrade 10

openshift-ci[bot] commented 1 month ago

@neisw: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/0e25e950-531c-11ef-84af-e9145f2753ce-0

tjungblu commented 1 month ago

I have my doubts that this causes disks (only) on Azure to go slow, but thanks for the revert @neisw :) Let's see the payload results.

hasbro17 commented 1 month ago

This revert doesn't seem to affect the "etcd should not log excessive took too long messages" test failure on the payload runs. It is still failing in most of the runs that made it past the install:

- https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-cluster-etcd-operator-1312-ci-4.17-upgrade-from-stable-4.16-e2e-azure-ovn-upgrade/1820418418164109312
- https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-cluster-etcd-operator-1312-ci-4.17-upgrade-from-stable-4.16-e2e-azure-ovn-upgrade/1820418415244873728
- https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-cluster-etcd-operator-1312-ci-4.17-upgrade-from-stable-4.16-e2e-azure-ovn-upgrade/1820418424887578624
- https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-cluster-etcd-operator-1312-ci-4.17-upgrade-from-stable-4.16-e2e-azure-ovn-upgrade/1820418417304276992

tjungblu commented 1 month ago

/close

openshift-ci[bot] commented 1 month ago

@tjungblu: Closed this PR.

In response to [this](https://github.com/openshift/cluster-etcd-operator/pull/1312#issuecomment-2270894306):

> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.