openshift / cluster-kube-apiserver-operator

The kube-apiserver operator installs and maintains the kube-apiserver on a cluster
Apache License 2.0

OCPBUGS-31384: use RotatedSigningCASecret controller in update only mode #1659

Closed tkashem closed 7 months ago

tkashem commented 8 months ago

UseSecretUpdateOnly is intended as a short-term hack for a very specific use case, and it works in tandem with a particular carry patch applied to the openshift kube-apiserver (https://github.com/openshift/kubernetes/pull/1924).

We will remove this when we migrate all of the affected secret objects to their intended type: https://issues.redhat.com/browse/API-1800

In short, the TLS secrets used by this operator are reconciled by multiple controllers at the same time without any coordination. As a result, a secret's crypto material can be regenerated, which has serious consequences for the platform: it can break external clients and the cluster itself.

xref: https://github.com/openshift/library-go/pull/1705
xref: https://github.com/openshift/kubernetes/pull/1924

tkashem commented 8 months ago

/cc @p0lyn0mial

tkashem commented 8 months ago

/hold

(until https://github.com/openshift/kubernetes/pull/1924 merges)

openshift-ci-robot commented 8 months ago

@tkashem: This pull request references Jira Issue OCPBUGS-31384, which is valid.

3 validation(s) were run on this bug
* bug is open, matching expected state (open)
* bug target version (4.16.0) matches configured target version for branch (4.16.0)
* bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact: /cc @wangke19

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=openshift%2Fcluster-kube-apiserver-operator). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.

tkashem commented 8 months ago

/retest-required

p0lyn0mial commented 8 months ago

it also requires https://github.com/openshift/kubernetes/pull/1929

p0lyn0mial commented 8 months ago

/retest-required

p0lyn0mial commented 8 months ago

/lgtm

p0lyn0mial commented 8 months ago

/cc @vrutkovs

p0lyn0mial commented 7 months ago

/retest-required

tkashem commented 7 months ago

/retest-required

tkashem commented 7 months ago

/retest

tkashem commented 7 months ago

/retest-required

tkashem commented 7 months ago

/retest-required

tkashem commented 7 months ago

/retest-required

p0lyn0mial commented 7 months ago

e2e-gcp-operator fails on TestCertRotationStompOnBadType, for which I've opened https://github.com/openshift/kubernetes/pull/1932

In general, given the platform's long history (spanning several years) and the difficulty of ensuring that resources were consistently created with only one type, I think we should relax the restrictions on allowed type mutation transitions.

p0lyn0mial commented 7 months ago

Issues like

`event happened 4009 times, something is wrong: node/ip-10-0-102-140.us-east-2.compute.internal hmsg/e82226461c - reason/ConfigMapUpdated Updated ConfigMap/kube-control-plane-signer-ca -n openshift-kube-apiserver-operator: result=reject`

will be fixed by https://github.com/openshift/cluster-kube-apiserver-operator/pull/1661

p0lyn0mial commented 7 months ago

/retest-required

https://github.com/openshift/cluster-kube-apiserver-operator/pull/1661 merged, which should unblock all jobs except e2e-gcp-operator, which will be unblocked by https://github.com/openshift/kubernetes/pull/1932

p0lyn0mial commented 7 months ago

/lgtm

openshift-ci[bot] commented 7 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: p0lyn0mial, tkashem

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
- ~~[OWNERS](https://github.com/openshift/cluster-kube-apiserver-operator/blob/master/OWNERS)~~ [p0lyn0mial,tkashem]

Approvers can indicate their approval by writing `/approve` in a comment
Approvers can cancel approval by writing `/approve cancel` in a comment

tkashem commented 7 months ago

/hold cancel

openshift-ci[bot] commented 7 months ago

@tkashem: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-operator-disruptive-single-node | 6c448ef5f10f33a987d2460432d47c78926fe69b | link | false | `/test e2e-aws-operator-disruptive-single-node` |
| ci/prow/k8s-e2e-gcp-serial | 6c448ef5f10f33a987d2460432d47c78926fe69b | link | false | `/test k8s-e2e-gcp-serial` |
| ci/prow/e2e-gcp-operator-single-node | 6c448ef5f10f33a987d2460432d47c78926fe69b | link | false | `/test e2e-gcp-operator-single-node` |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository. I understand the commands that are listed [here](https://go.k8s.io/bot-commands).
openshift-ci-robot commented 7 months ago

@tkashem: Jira Issue OCPBUGS-31384: Some pull requests linked via external trackers have merged:

The following pull requests linked via external trackers have not merged:

These pull requests must merge or be unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with `/jira refresh`.

Jira Issue OCPBUGS-31384 has not been moved to the MODIFIED state.

openshift-bot commented 7 months ago

[ART PR BUILD NOTIFIER]

This PR has been included in build ose-cluster-kube-apiserver-operator-container-v4.16.0-202404041616.p0.g7599746.assembly.stream.el9 for distgit ose-cluster-kube-apiserver-operator. All builds following this will include this PR.