openshift / machine-config-operator

Apache License 2.0

OCPBUGS-38320: Revert "templates/master/cri-o: make crun as the default container runtime" #4523

Closed neisw closed 2 months ago

neisw commented 2 months ago

Reverts openshift/machine-config-operator#4437; tracked by https://issues.redhat.com/browse/OCPBUGS-38320

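Per [OpenShift policy](https://github.com/openshift/enhancements/blob/master/enhancements/release/improving-ci-signal.md#quick-revert), we are reverting this breaking change to get CI and/or nightly payloads flowing again.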

4.18/4.17 Disruption failures

To unrevert this, revert this PR and layer a separate commit on top that addresses the problem. Before merging the unrevert, run these jobs on the PR and check their results to confirm the fix has corrected the problem:

/payload-job periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-upgrade-out-of-change
/payload-job periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-serial

CC: @sohankunkerkar

PR created by Revertomatic:tm:
openshift-ci-robot commented 2 months ago

@neisw: This pull request references OCPNODE-2357 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target either version "4.18." or "openshift-4.18.", but it targets "openshift-4.17" instead.

In response to [this](https://github.com/openshift/machine-config-operator/pull/4523):

>Reverts openshift/machine-config-operator#4437 - Opening just to run payload tests currently
>
>Seeing failures in 4.18 & 4.17 nightly payloads after 4437 && https://github.com/openshift/machine-config-operator/pull/4518 landed for [e2e-aws-ovn-serial](https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-serial/1822233611227631616)
>
>```
>[sig-storage] [Serial] Volume metrics Ephemeral should create volume metrics with the correct BlockMode PVC ref [Suite:openshift/conformance/serial] [Suite:k8s] 5m0s
>{  schedulerName: default-scheduler
>  securityContext: {}
>  serviceAccount: default
>  serviceAccountName: default
>  terminationGracePeriodSeconds: 30
>  tolerations:
>  - effect: NoExecute
>    key: node.kubernetes.io/not-ready
>    operator: Exists
>    tolerationSeconds: 300
>  - effect: NoExecute
>    key: node.kubernetes.io/unreachable
>    operator: Exists
>    tolerationSeconds: 300
>  volumes:
>  - ephemeral:
>      volumeClaimTemplate:
>        metadata:
>          creationTimestamp: null
>        spec:
>          accessModes:
>          - ReadWriteOnce
>          resources:
>            requests:
>              storage: 2Gi
>          volumeMode: Block
>    name: volume1
>  - name: kube-api-access-gzst8
>    projected:
>      defaultMode: 420
>      sources:
>      - serviceAccountToken:
>          expirationSeconds: 3607
>          path: token
>      - configMap:
>          items:
>          - key: ca.crt
>            path: ca.crt
>          name: kube-root-ca.crt
>      - downwardAPI:
>          items:
>          - fieldRef:
>              apiVersion: v1
>              fieldPath: metadata.namespace
>            path: namespace
>      - configMap:
>          items:
>          - key: service-ca.crt
>            path: service-ca.crt
>          name: openshift-service-ca.crt
>status:
>  conditions:
>  - lastProbeTime: null
>    lastTransitionTime: "2024-08-10T13:06:04Z"
>    status: "True"
>    type: PodReadyToStartContainers
>  - lastProbeTime: null
>    lastTransitionTime: "2024-08-10T13:05:54Z"
>    status: "True"
>    type: Initialized
>  - lastProbeTime: null
>    lastTransitionTime: "2024-08-10T13:05:54Z"
>    message: 'containers with unready status: [write-pod]'
>    reason: ContainersNotReady
>    status: "False"
>    type: Ready
>  - lastProbeTime: null
>    lastTransitionTime: "2024-08-10T13:05:54Z"
>    message: 'containers with unready status: [write-pod]'
>    reason: ContainersNotReady
>    status: "False"
>    type: ContainersReady
>  - lastProbeTime: null
>    lastTransitionTime: "2024-08-10T13:05:54Z"
>    status: "True"
>    type: PodScheduled
>  containerStatuses:
>  - image: quay.io/openshift/community-e2e-images:e2e-7-registry-k8s-io-e2e-test-images-busybox-1-36-1-1-n3BezCOfxp98l84K
>    imageID: ""
>    lastState: {}
>    name: write-pod
>    ready: false
>    restartCount: 0
>    started: false
>    state:
>      waiting:
>        message: |
>          container create failed: mknod `/mnt/`: No such file or directory
>        reason: CreateContainerError
>  hostIP: 10.0.109.141
>  hostIPs:
>  - ip: 10.0.109.141
>  phase: Pending
>  podIP: 10.128.3.3
>  podIPs:
>  - ip: 10.128.3.3
>  qosClass: BestEffort
>  startTime: "2024-08-10T13:05:54Z"
>Error: exit with code 1
>Ginkgo exit error 1: exit with code 1}
>```

Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=openshift%2Fmachine-config-operator). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.

openshift-ci[bot] commented 2 months ago

Skipping CI for Draft Pull Request. If you want CI signal for your change, please convert it to an actual PR. You can still manually trigger a test run with /test all

neisw commented 2 months ago

/payload-job periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-serial

openshift-ci[bot] commented 2 months ago

@neisw: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/88d29100-5761-11ef-847b-299df58422e3-0

neisw commented 2 months ago

/payload-job periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-upgrade-out-of-change

openshift-ci[bot] commented 2 months ago

@neisw: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/640c3d00-57dc-11ef-918e-8813d265f0cc-0

neisw commented 2 months ago

/retitle OCPBUGS-38320: Revert "templates/master/cri-o: make crun as the default container runtime"

openshift-ci-robot commented 2 months ago

@neisw: This pull request references Jira Issue OCPBUGS-38320, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug:
* bug is open, matching expected state (open)
* bug target version (4.18.0) matches configured target version for branch (4.18.0)
* bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (schoudha@redhat.com), skipping review request.

The bug has been updated to refer to the pull request using the external bug tracker.
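
The three validations reported above can be pictured with a minimal sketch. This is an illustration only, not the jira-lifecycle-plugin's actual code: the `Bug` class, its field names, and the `validate_bug` helper are hypothetical, and the case-insensitive state comparison is an assumption based on the bot accepting state "New" against the list (NEW, ASSIGNED, POST).

```python
from dataclasses import dataclass

# Hypothetical model of a Jira bug; the field names are illustrative
# assumptions, not the plugin's real data model.
@dataclass
class Bug:
    key: str
    is_open: bool
    target_version: str
    state: str

# Valid states as listed in the bot's comment.
VALID_STATES = ("NEW", "ASSIGNED", "POST")

def validate_bug(bug: Bug, branch_target_version: str) -> list[str]:
    """Return failure messages; an empty list means every check passed."""
    failures = []
    # Check 1: the bug must be open.
    if not bug.is_open:
        failures.append("bug is not open")
    # Check 2: the bug's target version must match the branch's configured one.
    if bug.target_version != branch_target_version:
        failures.append(
            f"bug target version ({bug.target_version}) does not match "
            f"configured target version for branch ({branch_target_version})"
        )
    # Check 3: the bug must be in one of the valid states (compared
    # case-insensitively, which is an assumption of this sketch).
    if bug.state.upper() not in VALID_STATES:
        failures.append(f"bug is in the state {bug.state}, which is not valid")
    return failures

bug = Bug(key="OCPBUGS-38320", is_open=True, target_version="4.18.0", state="New")
print(validate_bug(bug, "4.18.0"))  # prints []
```

With the values from this comment (open, target version 4.18.0, state New) all three checks pass, which matches the bot reporting the issue as valid.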

neisw commented 2 months ago

/payload-job periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-serial

openshift-ci[bot] commented 2 months ago

@neisw: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/bf2a2e50-5808-11ef-91e5-e71150672066-0

openshift-ci-robot commented 2 months ago

@neisw: This pull request references Jira Issue OCPBUGS-38320, which is valid.

3 validation(s) were run on this bug:
* bug is open, matching expected state (open)
* bug target version (4.18.0) matches configured target version for branch (4.18.0)
* bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (schoudha@redhat.com), skipping review request.

neisw commented 2 months ago

The payload jobs confirm that the revert clears the disruption during node updates in the 'out-of-change' job and that the test that was permafailing in the 4.18 aws-ovn-serial job now passes. We see other failures in aws-ovn-serial, but they also occur outside of this PR:

[sig-storage] CSI Mock selinux on mount metrics SELinuxMount metrics [LinuxOnly] [Feature:SELinux] [Serial] error is bumped on two Pods with a different context on RWOP volume [FeatureGate:SELinuxMountReadWriteOncePod] [Beta] [Suite:openshift/conformance/serial] [Suite:k8s] 1m15s
{  fail [k8s.io/kubernetes/test/e2e/storage/csi_mock/csi_selinux_mount.go:497]: waiting for metrics map[volume_manager_selinux_volume_context_mismatch_errors_total:{}] to increase: metric volume_manager_selinux_volume_context_mismatch_errors_total{volume_plugin="kubernetes.io/csi/csi-mock-e2e-csi-mock-volumes-selinux-metrics-409"} unexpectedly increased to 2
Error: exit with code 1
Ginkgo exit error 1: exit with code 1}

[sig-storage] CSI Mock selinux on mount metrics SELinuxMount metrics [LinuxOnly] [Feature:SELinux] [Serial] warning is bumped on two Pods with a different context on RWO volume [FeatureGate:SELinuxMountReadWriteOncePod] [Beta] [Feature:SELinuxMountReadWriteOncePodOnly] [Suite:openshift/conformance/serial] [Suite:k8s] 1m24s
{  fail [k8s.io/kubernetes/test/e2e/storage/csi_mock/csi_selinux_mount.go:497]: waiting for metrics map[volume_manager_selinux_volume_context_mismatch_warnings_total:{}] to increase: metric volume_manager_selinux_volume_context_mismatch_warnings_total{volume_plugin="kubernetes.io/csi/csi-mock-e2e-csi-mock-volumes-selinux-metrics-5001"} unexpectedly increased to 1
Error: exit with code 1
Ginkgo exit error 1: exit with code 1}

sohankunkerkar commented 2 months ago

@giuseppe Do you see any regression here from crun's perspective?

neisw commented 2 months ago

/test unit

openshift-ci[bot] commented 2 months ago

@neisw: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-gcp-op | 19389d0731f4203ec47252018a4d7245031cb269 | link | true | /test e2e-gcp-op |
| ci/prow/e2e-hypershift | 19389d0731f4203ec47252018a4d7245031cb269 | link | true | /test e2e-hypershift |
| ci/prow/e2e-aws-ovn-upgrade | 19389d0731f4203ec47252018a4d7245031cb269 | link | true | /test e2e-aws-ovn-upgrade |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository. I understand the commands that are listed [here](https://go.k8s.io/bot-commands).
xueqzhan commented 2 months ago

/lgtm

xueqzhan commented 2 months ago

/label approved

openshift-ci[bot] commented 2 months ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: neisw, xueqzhan

Once this PR has been reviewed and has the lgtm label, please assign giuseppe for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:
- **[templates/master/01-master-container-runtime/OWNERS](https://github.com/openshift/machine-config-operator/blob/master/templates/master/01-master-container-runtime/OWNERS)**
- **[templates/worker/01-worker-container-runtime/OWNERS](https://github.com/openshift/machine-config-operator/blob/master/templates/worker/01-worker-container-runtime/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment
Approvers can cancel approval by writing `/approve cancel` in a comment

deads2k commented 2 months ago

this is reverting the most recent commit of master, merging to unblock the org.

openshift-ci-robot commented 2 months ago

@neisw: Jira Issue OCPBUGS-38320: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-38320 has been moved to the MODIFIED state.

openshift-bot commented 2 months ago

[ART PR BUILD NOTIFIER]

Distgit: ose-machine-config-operator

This PR has been included in build ose-machine-config-operator-container-v4.18.0-202408121743.p0.g6caab8b.assembly.stream.el9. All builds following this will include this PR.