openshift / cluster-kube-apiserver-operator

The kube-apiserver operator installs and maintains the kube-apiserver on a cluster
Apache License 2.0

NO-ISSUE: Revert "certrotationcontroller: set AutoRegenerateAfterOfflineExpiry for generated certificates" #1661

Closed. p0lyn0mial closed this pull request 7 months ago.

p0lyn0mial commented 7 months ago

This reverts commit 6f3faa4beae08550e95f0fc124f49b9e6baca52c introduced in https://github.com/openshift/cluster-kube-apiserver-operator/pull/1652

We have multiple controllers reconciling the same resource at the same time without any coordination. For example, the `kube-control-plane-signer-ca` configmap is synced by 4 controllers (actually 8, because we run 2 processes).

Since the mentioned PR added a distinct annotation (`AutoRegenerateAfterOfflineExpiry`) for the same configmap, it caused a hot update loop: the controllers disagree on the desired content, so each one keeps rewriting the configmap produced by the others.

The conflict error can be seen [even on a successful run](https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-kube-apiserver-operator/1577/pull-ci-openshift-cluster-kube-apiserver-operator-master-e2e-aws-ovn/1771293936002797568), but usually it causes [the CI jobs to fail](https://prow.ci.openshift.org/job-history/gs/test-platform-results/pr-logs/directory/pull-ci-openshift-cluster-kube-apiserver-operator-master-e2e-aws-ovn).

Since [the issue is common](https://search.dptools.openshift.org/?search=kube-control-plane-signer-ca&maxAge=48h&context=1&type=bug%2Bissue%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job), I decided to revert, as it might even block CI payloads from merging.

It should also unblock https://github.com/openshift/cluster-kube-apiserver-operator/pull/1659

We should stop adding more changes to the cert rotation controllers until we resolve the race.
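
For illustration only (not code from this PR): a minimal Go sketch of the race described above. The controller names, annotation key, and the in-memory `object` type are hypothetical stand-ins for the real cert rotation controllers and the configmap. Two uncoordinated reconcilers enforce different desired annotations on one shared object, so every write by one looks like drift to the other and the object never settles.

```go
// Hypothetical sketch (not from this repo): two uncoordinated reconcilers
// enforce different desired annotations on the same shared object, so every
// successful update by one looks like drift to the other and the object is
// rewritten over and over -- a hot update loop.
package main

import (
	"fmt"
	"reflect"
	"sync"
)

// object stands in for a configmap's metadata.
type object struct {
	annotations     map[string]string
	resourceVersion int
}

var (
	mu     sync.Mutex
	shared = object{annotations: map[string]string{}}
)

// reconcile overwrites the object's annotations with this controller's desired
// set whenever they differ, mimicking an uncoordinated "apply my view" sync.
func reconcile(name string, desired map[string]string, rounds int) {
	for i := 0; i < rounds; i++ {
		mu.Lock()
		if !reflect.DeepEqual(shared.annotations, desired) {
			shared.annotations = copyMap(desired)
			shared.resourceVersion++
			fmt.Printf("%s updated the object (resourceVersion=%d)\n", name, shared.resourceVersion)
		}
		mu.Unlock()
	}
}

func copyMap(in map[string]string) map[string]string {
	out := make(map[string]string, len(in))
	for k, v := range in {
		out[k] = v
	}
	return out
}

func main() {
	var wg sync.WaitGroup
	wg.Add(2)
	// controller-a was taught about the new annotation; controller-b was not,
	// so it keeps stripping it -- the two desired states never agree.
	go func() {
		defer wg.Done()
		reconcile("controller-a", map[string]string{"auto-regenerate-after-offline-expiry": "true"}, 10)
	}()
	go func() {
		defer wg.Done()
		reconcile("controller-b", map[string]string{}, 10)
	}()
	wg.Wait()
	fmt.Println("final resourceVersion:", shared.resourceVersion)
}
```

In the real operator the writes go through the API server, so the losing writer typically surfaces as the resource-version conflict error mentioned above rather than a silent back-and-forth, but the shape of the loop is the same.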

p0lyn0mial commented 7 months ago

/assign @vrutkovs

openshift-ci-robot commented 7 months ago

@p0lyn0mial: This pull request explicitly references no jira issue.

In response to [this](https://github.com/openshift/cluster-kube-apiserver-operator/pull/1661):

> This reverts commit 6f3faa4beae08550e95f0fc124f49b9e6baca52c introduced in https://github.com/openshift/cluster-kube-apiserver-operator/pull/1652
>
> We have multiple controllers reconciling the same resource at the same time without any coordination. For example the `kube-control-plane-signer-ca` configmap is synced by 4 controllers (actually it is 8 because we have 2 processes).
>
> Since the mentioned PR added a distinct annotation (`AutoRegenerateAfterOfflineExpiry`) for the same configmap it caused a hot update loop since the configmap was different.
>
> The conflict error can be seen [even on a successful run](https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-kube-apiserver-operator/1577/pull-ci-openshift-cluster-kube-apiserver-operator-master-e2e-aws-ovn/1771293936002797568) but usually it causes [the CI jobs to fail](https://prow.ci.openshift.org/job-history/gs/test-platform-results/pr-logs/directory/pull-ci-openshift-cluster-kube-apiserver-operator-master-e2e-aws-ovn).
>
> Since [the issue is common](https://search.dptools.openshift.org/?search=kube-control-plane-signer-ca&maxAge=48h&context=1&type=bug%2Bissue%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job) I decided to revert as it might even block CI payloads from merging.
>
> It should also unblock https://github.com/openshift/cluster-kube-apiserver-operator/pull/1659
>
> We should stop adding more changes to the cert rotation controllers until we resolve the race.

Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=openshift%2Fcluster-kube-apiserver-operator). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.
vrutkovs commented 7 months ago

/lgtm

openshift-ci[bot] commented 7 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: p0lyn0mial, vrutkovs

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/openshift/cluster-kube-apiserver-operator/blob/master/OWNERS)~~ [p0lyn0mial,vrutkovs]

Approvers can indicate their approval by writing `/approve` in a comment. Approvers can cancel approval by writing `/approve cancel` in a comment.
p0lyn0mial commented 7 months ago

`ci/prow/k8s-e2e-gcp` and `ci/prow/e2e-gcp-operator` failed on cluster installation; the logs are missing, and it looks like loki wasn't set up either.

p0lyn0mial commented 7 months ago

/retest-required

openshift-ci[bot] commented 7 months ago

@p0lyn0mial: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-gcp-operator-single-node | 5823d3d49da742fc579aaceaaf50418a16e78df9 | link | false | `/test e2e-gcp-operator-single-node` |
| ci/prow/e2e-aws-operator-disruptive-single-node | 5823d3d49da742fc579aaceaaf50418a16e78df9 | link | false | `/test e2e-aws-operator-disruptive-single-node` |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository. I understand the commands that are listed [here](https://go.k8s.io/bot-commands).
openshift-bot commented 7 months ago

[ART PR BUILD NOTIFIER]

This PR has been included in build ose-cluster-kube-apiserver-operator-container-v4.16.0-202404040915.p0.g7ae9875.assembly.stream.el9 for distgit ose-cluster-kube-apiserver-operator. All builds following this will include this PR.