openshift / machine-config-operator

Apache License 2.0

OCPBUGS-36289: e2e test should wait for MCD pod to be running #4444

Closed cheesesashimi closed 2 months ago

cheesesashimi commented 3 months ago

- What I did

The `TestMCDGetsMachineOSConfigSecrets` e2e test was failing frequently during a backport to 4.16 in this PR: https://github.com/openshift/machine-config-operator/pull/4430. `helpers.ExecCmdOnNode()` was unable to get the current credentials on the node because that function uses the MCD pod as a bridge to execute a command on a given node. The MCD pod had restarted to pick up the new secret volumes required by the MachineOSConfig, and its containers were not yet in a running / ready state.

This PR introduces a check to ensure that the MCD pods are in a running / ready state before moving on to the final phase of the test. It also includes minor cleanups, such as fetching the expected registry hostnames from the ControllerConfig only once, before beginning the MCD check.

Note: This change is already included in https://github.com/openshift/machine-config-operator/pull/4430 and was cherry-picked into this PR.

- How to verify it

Run the `e2e-gcp-op-techpreview` job and ensure that the `TestMCDGetsMachineOSConfigSecrets` test passes consistently.

- Description for the changelog

e2e test should ensure MCD pod is ready

openshift-ci-robot commented 3 months ago

@cheesesashimi: This pull request references Jira Issue OCPBUGS-36289, which is invalid:

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

cheesesashimi commented 3 months ago

/jira refresh

openshift-ci-robot commented 3 months ago

@cheesesashimi: This pull request references Jira Issue OCPBUGS-36289, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug:

* bug is open, matching expected state (open)
* bug target version (4.17.0) matches configured target version for branch (4.17.0)
* bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact: /cc @sergiordlr

cheesesashimi commented 3 months ago

@sergiordlr This PR only affects the e2e test suite, so I'm not sure that QE review is required here. I'll let you make that call though.

sergiordlr commented 3 months ago

/test e2e-gcp-op-techpreview

sergiordlr commented 3 months ago

@cheesesashimi I agree. Since it only affects the e2e test suite, I don't think the qe-approved label is needed.

I'll add it anyway, just in case.

/label qe-approved

openshift-ci[bot] commented 2 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cheesesashimi, yuqi-zhang


Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/openshift/machine-config-operator/blob/master/OWNERS)~~ [cheesesashimi,yuqi-zhang]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.
cheesesashimi commented 2 months ago

/test e2e-gcp-op-techpreview

openshift-ci-robot commented 2 months ago

/retest-required

Remaining retests: 0 against base HEAD 5fbf0e517a5bedfffe78247ce5ec65712521a3b4 and 2 for PR HEAD f84e5b59eb5936a78e190b17ad70740075929384 in total

openshift-ci-robot commented 2 months ago

/retest-required

Remaining retests: 0 against base HEAD fe7da36728158a5176acc3b2b8f1a048435c87ed and 1 for PR HEAD f84e5b59eb5936a78e190b17ad70740075929384 in total

openshift-ci-robot commented 2 months ago

/retest-required

Remaining retests: 0 against base HEAD a2f5c748e7f27e5f498febc77088ba85b21f7af2 and 0 for PR HEAD f84e5b59eb5936a78e190b17ad70740075929384 in total

openshift-ci-robot commented 2 months ago

/hold

Revision f84e5b59eb5936a78e190b17ad70740075929384 was retested 3 times: holding

cheesesashimi commented 2 months ago

/test e2e-gcp-op-techpreview

yuqi-zhang commented 2 months ago

/hold cancel

openshift-ci-robot commented 2 months ago

/retest-required

Remaining retests: 0 against base HEAD 24a39dda2601945b76a2ba953bdca056a5467aa1 and 2 for PR HEAD f84e5b59eb5936a78e190b17ad70740075929384 in total

openshift-ci[bot] commented 2 months ago

@cheesesashimi: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/e2e-azure-ovn-upgrade-out-of-change | f84e5b59eb5936a78e190b17ad70740075929384 | link | false | `/test e2e-azure-ovn-upgrade-out-of-change` |
| ci/prow/e2e-gcp-op-techpreview | f84e5b59eb5936a78e190b17ad70740075929384 | link | false | `/test e2e-gcp-op-techpreview` |

openshift-ci-robot commented 2 months ago

@cheesesashimi: Jira Issue OCPBUGS-36289: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-36289 has been moved to the MODIFIED state.

openshift-bot commented 2 months ago

[ART PR BUILD NOTIFIER]

Distgit: ose-machine-config-operator

This PR has been included in build ose-machine-config-operator-container-v4.18.0-202407270713.p0.g40d03dc.assembly.stream.el9. All builds following this will include this PR.