openshift / machine-config-operator

Apache License 2.0

OCPBUGS-35300: MCD-pull: run after network-online.target again #4422

Closed yuqi-zhang closed 1 week ago

yuqi-zhang commented 1 month ago

The ordering of MCD-pull was originally changed so that ovs-configuration could run when the reboot was skipped. See: https://github.com/openshift/machine-config-operator/pull/3858

That change broke ARO, which requires the network to be ready before the pull can happen (and, in general, it is probably best for the network to be ready before attempting to pull the new OS image).

Since the services have changed since then, ovs-configuration no longer depends on the existence of the firstboot file, so we should be able to untangle this dependency.
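
As a rough illustration of the ordering discussed above, a systemd unit that must wait for the network typically declares both `Wants=` and `After=` on `network-online.target`. The fragment below is only a sketch and not the unit shipped by the MCO: the description and the `ExecStart=` command are placeholders invented for the example.

```ini
# Illustrative sketch only; not the actual MCD-pull unit.
[Unit]
Description=Example pull service ordered after the network (hypothetical)
# Wants= pulls network-online.target into the boot transaction.
Wants=network-online.target
# After= delays this unit until network-online.target has been reached.
After=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
# Placeholder command; the real service would pull the new OS image here.
ExecStart=/usr/bin/true
```

`After=` on its own only orders units that are already part of the transaction, which is why the two directives are normally used together.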

openshift-ci[bot] commented 1 month ago

Skipping CI for Draft Pull Request. If you want CI signal for your change, please convert it to an actual PR. You can still manually trigger a test run with /test all

openshift-ci-robot commented 1 month ago

@yuqi-zhang: This pull request references Jira Issue OCPBUGS-35300, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug:

* bug is open, matching expected state (open)
* bug target version (4.17.0) matches configured target version for branch (4.17.0)
* bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (hlipsig+1@redhat.com), skipping review request.

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=openshift%2Fmachine-config-operator). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.

openshift-ci[bot] commented 1 month ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/openshift/machine-config-operator/blob/master/OWNERS)~~ [yuqi-zhang]

Approvers can indicate their approval by writing `/approve` in a comment.

Approvers can cancel approval by writing `/approve cancel` in a comment.

yuqi-zhang commented 1 month ago

/test e2e-gcp-op
/test e2e-aws-ovn

yuqi-zhang commented 3 weeks ago

Since this might affect the functionality introduced in https://github.com/openshift/machine-config-operator/pull/3858, @sergiordlr would you be able to test whether this breaks it? ARO verified that this works for them on 4.14.

If not, @ori-amizur could you find someone to verify that this wouldn't break bare metal non-reboot installs? Thanks!

ori-amizur commented 2 weeks ago

> If not, @ori-amizur could you find someone to verify that this wouldn't break bare metal non-reboot installs? Thanks!

We are going to test tomorrow, but I ran a quick test today. It seems that one of the masters failed to complete installation (the crio service failed to start).

sdodson commented 2 weeks ago

/test ?

openshift-ci[bot] commented 2 weeks ago

@sdodson: The following commands are available to trigger required jobs:

The following commands are available to trigger optional jobs:

Use /test all to run the following jobs that were automatically triggered:

In response to [this](https://github.com/openshift/machine-config-operator/pull/4422#issuecomment-2200736105):

> /test ?

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.

sdodson commented 2 weeks ago

/test e2e-metal-assisted

openshift-ci[bot] commented 2 weeks ago

@yuqi-zhang: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-ovn-upgrade-out-of-change | 632de7c8c59c7866e6936d5d65813fd82b80ce70 | link | false | `/test e2e-aws-ovn-upgrade-out-of-change` |
| ci/prow/e2e-gcp-op-techpreview | 632de7c8c59c7866e6936d5d65813fd82b80ce70 | link | false | `/test e2e-gcp-op-techpreview` |
| ci/prow/e2e-vsphere-ovn-upi | 632de7c8c59c7866e6936d5d65813fd82b80ce70 | link | false | `/test e2e-vsphere-ovn-upi` |
| ci/prow/e2e-metal-assisted | 632de7c8c59c7866e6936d5d65813fd82b80ce70 | link | false | `/test e2e-metal-assisted` |

Full PR test history. Your PR dashboard.

ori-amizur commented 2 weeks ago

I ran two tests and both failed. This means the change causes assisted installer installations to fail.

yuqi-zhang commented 2 weeks ago

Thanks for testing. For now we can make it Azure-only via https://github.com/openshift/machine-config-operator/pull/4423 instead and circle back on this later, I guess. I don't really see another good option.

@ori-amizur were you able to capture why it failed?

ori-amizur commented 2 weeks ago

> were you able to capture why it failed?

I tried to look briefly at the cluster but I didn't find the reason it failed.

yuqi-zhang commented 1 week ago

This will break other things, so closing in favour of other solutions.