tjungblu closed this 9 months ago
/hold
/retest
/payload 4.16 nightly blocking
@tjungblu: trigger 8 job(s) of type blocking for the nightly release of OCP 4.16
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/9c96d870-b457-11ee-991a-701add1801b7-0
/payload 4.16 nightly blocking
/retest
@tjungblu: trigger 8 job(s) of type blocking for the nightly release of OCP 4.16
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/349a8620-b48b-11ee-80d4-c5cc632357a6-0
test cluster seemingly went down, trying again
/payload 4.16 nightly blocking
@tjungblu: trigger 8 job(s) of type blocking for the nightly release of OCP 4.16
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/a0bdbe60-b51e-11ee-8ff5-e60f551db126-0
/retest
It seems that, under some conditions, the events for missing resources get emitted more frequently than before. I'm checking those out in more detail.
/payload 4.16 nightly blocking
@tjungblu: trigger 8 job(s) of type blocking for the nightly release of OCP 4.16
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/47a0e700-b620-11ee-8e3b-cb0f9d6219f6-0
/payload 4.16 nightly blocking
@tjungblu: trigger 8 job(s) of type blocking for the nightly release of OCP 4.16
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/cf0d57f0-b904-11ee-9d81-b3ffd2920b91-0
@tjungblu: This pull request references ETCD-512 which is a valid jira issue.
Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the spike to target the "4.16.0" version, but no target version was set.
/payload 4.16 nightly blocking
@tjungblu: trigger 8 job(s) of type blocking for the nightly release of OCP 4.16
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/123f2240-bb96-11ee-8540-db573853c215-0
I think we also need to increase the flake threshold for the time being: https://github.com/openshift/origin/blob/ec6f7585f45704ccafaaed76772a87d8f96cbcab/pkg/monitortests/etcd/legacyetcdmonitortests/pathological_events.go#L9-L19
The additional static pod rollout (I believe) causes the installer to choke a little more often than before.
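For reference, a minimal sketch of what such a duplicate-event threshold check could look like. The event type, threshold value, and function names here are illustrative assumptions only; the real check lives in origin's pathological_events.go linked above:

```go
package main

import (
	"fmt"
	"strings"
)

// event is a simplified stand-in for a monitored cluster event (assumption).
type event struct {
	Reason  string
	Message string
}

// Hypothetical flake threshold: up to this many duplicate events the test
// only flakes instead of failing outright.
const requiredResourcesMissingThreshold = 20

// checkRequiredInstallerResourcesMissing counts RequiredInstallerResourcesMissing
// events that complain about secrets and classifies the run.
func checkRequiredInstallerResourcesMissing(events []event) string {
	count := 0
	for _, e := range events {
		if e.Reason == "RequiredInstallerResourcesMissing" && strings.Contains(e.Message, "secrets:") {
			count++
		}
	}
	switch {
	case count == 0:
		return "pass"
	case count <= requiredResourcesMissingThreshold:
		return fmt.Sprintf("flake: %d duplicate events (<= threshold %d)", count, requiredResourcesMissingThreshold)
	default:
		return fmt.Sprintf("fail: %d duplicate events (> threshold %d)", count, requiredResourcesMissingThreshold)
	}
}

func main() {
	fmt.Println(checkRequiredInstallerResourcesMissing([]event{
		{Reason: "RequiredInstallerResourcesMissing", Message: "secrets: etcd-client-10"},
	}))
}
```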
@tjungblu: This pull request references ETCD-512 which is a valid jira issue.
Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the spike to target the "4.16.0" version, but no target version was set.
/test ?
@tjungblu: The following commands are available to trigger required jobs:
/test e2e-agnostic-ovn
/test e2e-agnostic-ovn-upgrade
/test e2e-aws-ovn-etcd-scaling
/test e2e-aws-ovn-serial
/test e2e-aws-ovn-single-node
/test e2e-gcp-qe-no-capabilities
/test e2e-metal-assisted
/test e2e-metal-ipi-ovn-ipv6
/test e2e-operator
/test e2e-operator-fips
/test images
/test unit
/test verify
/test verify-deps
The following commands are available to trigger optional jobs:
/test configmap-scale
/test e2e-aws
/test e2e-aws-disruptive
/test e2e-aws-disruptive-ovn
/test e2e-aws-etcd-recovery
/test e2e-azure
/test e2e-azure-ovn-etcd-scaling
/test e2e-gcp
/test e2e-gcp-disruptive
/test e2e-gcp-disruptive-ovn
/test e2e-gcp-ovn-etcd-scaling
/test e2e-metal-ipi
/test e2e-metal-ipi-serial-ipv4
/test e2e-metal-single-node-live-iso
/test e2e-vsphere-ovn-etcd-scaling
Use /test all to run the following jobs that were automatically triggered:
pull-ci-openshift-cluster-etcd-operator-master-e2e-agnostic-ovn
pull-ci-openshift-cluster-etcd-operator-master-e2e-agnostic-ovn-upgrade
pull-ci-openshift-cluster-etcd-operator-master-e2e-aws-etcd-recovery
pull-ci-openshift-cluster-etcd-operator-master-e2e-aws-ovn-etcd-scaling
pull-ci-openshift-cluster-etcd-operator-master-e2e-aws-ovn-serial
pull-ci-openshift-cluster-etcd-operator-master-e2e-aws-ovn-single-node
pull-ci-openshift-cluster-etcd-operator-master-e2e-gcp-qe-no-capabilities
pull-ci-openshift-cluster-etcd-operator-master-e2e-operator
pull-ci-openshift-cluster-etcd-operator-master-e2e-operator-fips
pull-ci-openshift-cluster-etcd-operator-master-images
pull-ci-openshift-cluster-etcd-operator-master-unit
pull-ci-openshift-cluster-etcd-operator-master-verify
pull-ci-openshift-cluster-etcd-operator-master-verify-deps
So this also seems to be a cache sync issue on the apiserver-operator:
2024-01-25T17:36:05.791965265Z I0125 17:36:05.791900 1 event.go:364] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-apiserver-operator", Name:"kube-apiserver-operator", UID:"c8ca232e-a8f7-4567-919b-66c229939652", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'RequiredInstallerResourcesMissing' secrets: aggregator-client,bound-service-account-signing-key,check-endpoints-client-cert-key,control-plane-node-admin-client-cert-key,external-loadbalancer-serving-certkey,internal-loadbalancer-serving-certkey,kubelet-client,localhost-serving-cert-certkey,node-kubeconfigs,service-network-serving-certkey, secrets: etcd-client-10,localhost-recovery-client-token-10,localhost-recovery-serving-certkey-10
2024-01-25T17:36:05.792400417Z E0125 17:36:05.792325 1 base_controller.go:268] InstallerController reconciliation failed: missing required resources: [secrets: aggregator-client,bound-service-account-signing-key,check-endpoints-client-cert-key,control-plane-node-admin-client-cert-key,external-loadbalancer-serving-certkey,internal-loadbalancer-serving-certkey,kubelet-client,localhost-serving-cert-certkey,node-kubeconfigs,service-network-serving-certkey, secrets: etcd-client-10,localhost-recovery-client-token-10,localhost-recovery-serving-certkey-10]
2024-01-25T17:36:06.005475338Z I0125 17:36:06.005418 1 reflector.go:351] Caches populated for *v1.Secret from k8s.io/client-go@v0.29.0/tools/cache/reflector.go:229
2024-01-25T17:36:06.015869416Z I0125 17:36:06.015809 1 base_controller.go:73] Caches are synced for CertRotationController
2024-01-25T17:36:06.015869416Z I0125 17:36:06.015848 1 base_controller.go:110] Starting #1 worker of CertRotationController controller ...
2024-01-25T17:36:06.015921058Z I0125 17:36:06.015876 1 base_controller.go:73] Caches are synced for CertRotationController
...
That silences the event entirely. It looks like a race between the installer controller from library-go and some of the secret informers?
Created https://issues.redhat.com/browse/OCPBUGS-28243 for the apiserver operator. I'll see whether I can fix this race in CEO; otherwise we can simply increase the thresholds with the origin PR.
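For context, a minimal sketch of the usual client-go pattern for avoiding this kind of race: start the secret informer and block on cache.WaitForCacheSync before letting the controller's sync loop run. The namespace and wiring here are illustrative assumptions, not the actual library-go installer controller code:

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	stopCh := make(chan struct{})
	defer close(stopCh)

	// Namespaced informer factory for the operand namespace (name is an assumption).
	factory := informers.NewSharedInformerFactoryWithOptions(
		client, 10*time.Minute, informers.WithNamespace("openshift-kube-apiserver"))
	secretInformer := factory.Core().V1().Secrets().Informer()

	factory.Start(stopCh)

	// Block until the secret cache is populated; running the controller sync
	// before this point is what makes the "missing required resources" events fire.
	if !cache.WaitForCacheSync(stopCh, secretInformer.HasSynced) {
		panic("timed out waiting for secret informer cache to sync")
	}

	fmt.Println("secret cache synced, safe to start the installer controller sync loop")
}
```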
/retest
/payload 4.16 nightly blocking
@tjungblu: trigger 8 job(s) of type blocking for the nightly release of OCP 4.16
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/08916f80-bc3d-11ee-888a-6fe982462ef8-0
/retest
/payload 4.16 nightly blocking
@tjungblu: trigger 8 job(s) of type blocking for the nightly release of OCP 4.16
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/85e912e0-bc64-11ee-84c6-0bcedc6654ce-0
/payload 4.16 nightly blocking
@tjungblu: trigger 8 job(s) of type blocking for the nightly release of OCP 4.16
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/dbb632b0-be86-11ee-85c8-372b57d8451a-0
/payload 4.16 nightly blocking
@tjungblu: An error was encountered. No known errors were detected, please see the full error message for details.
could not create PullRequestPayloadQualificationRun: client rate limiter Wait returned an error: context canceled
Please contact an administrator to resolve this issue.
/payload 4.16 nightly blocking
/payload 4.16 nightly blocking
@tjungblu: trigger 8 job(s) of type blocking for the nightly release of OCP 4.16
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/411b8590-bf47-11ee-9186-572c7d58a7db-0
/retest
/payload 4.16 nightly blocking
@tjungblu: trigger 8 job(s) of type blocking for the nightly release of OCP 4.16
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/49e0f280-c011-11ee-8794-fac7d04776b2-0
/retest-required
/payload 4.16 nightly blocking
@tjungblu: trigger 8 job(s) of type blocking for the nightly release of OCP 4.16
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/85968190-c03f-11ee-8d0c-083ebd0bf674-0
/hold cancel
/payload 4.16 nightly blocking
Triggering another run; the last one seems to have been botched again by the app cluster issues.
@tjungblu: trigger 8 job(s) of type blocking for the nightly release of OCP 4.16
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/80eb2980-c061-11ee-855a-9115017afbc4-0
Running another for good measure
/payload 4.16 nightly blocking
@hasbro17: trigger 8 job(s) of type blocking for the nightly release of OCP 4.16
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/28310470-c0c7-11ee-9ac3-113064ab5974-0
one last run for the squash:
/payload 4.16 nightly blocking