openstack-k8s-operators / edpm-ansible

External Dataplane Management Ansible Playbooks
https://openstack-k8s-operators.github.io/edpm-ansible/
Apache License 2.0
9 stars 66 forks source link

Fix grub2-mkconfig kernel args for --update-bls-cmdline #650

Closed eshulman2 closed 6 months ago

eshulman2 commented 6 months ago

In order to update the kernel args properly in RHEL 9.X the --update-bls-cmdline is required.Without this flag the /boot/loader/entries/ files are not updated which causes the kernal args in the grub.cfg to be ignored. Added the flag for RHEL9 and up to allow proper cmdline update during boot.

This patch should fix this issue: EDPM fails to set kernel args in deployed compute in RHEL 9.4

openshift-ci[bot] commented 6 months ago

Hi @eshulman2. Thanks for your PR.

I'm waiting for a openstack-k8s-operators member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
openshift-ci[bot] commented 6 months ago

@ekuris-redhat: changing LGTM is restricted to collaborators

In response to [this](https://github.com/openstack-k8s-operators/edpm-ansible/pull/650#pullrequestreview-2033414654): >Legit Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
SeanMooney commented 6 months ago

this looks like it was changed in rhel 9.3 https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/9.3_release_notes/new-features#new-features-boot-loader I'm not sure if this behaviour change has also been applied in centos but https://forums.centos.org/viewtopic.php?t=80571 implies it might be.

SeanMooney commented 6 months ago

/ok-to-test

SeanMooney commented 6 months ago

@SeanMooney should I change the condition to support Centos as well?

we should confirm if cifmw-crc-podified-edpm-baremetal or podified-multinode-edpm-deployment-crc exersies this code

idealy i think we would want an enhancement to https://review.rdoproject.org/zuul/build/b407811db8be42328fcac06bff2bf2d8

to assert that the bls is being applied but i think

the current check will skip it on centos so we probably should apply the same logic there also.

softwarefactory-project-zuul[bot] commented 6 months ago

Build failed (check pipeline). Post recheck (without leading slash) to rerun all jobs. Make sure the failure cause has been resolved before you rerun jobs.

https://review.rdoproject.org/zuul/buildset/6f98fb45e5d94bb59f6a67b8e0e65249

:heavy_check_mark: openstack-k8s-operators-content-provider SUCCESS in 2h 04m 18s :heavy_check_mark: podified-multinode-edpm-deployment-crc SUCCESS in 1h 45m 10s :x: cifmw-crc-podified-edpm-baremetal FAILURE in 1h 03m 30s :heavy_check_mark: edpm-ansible-molecule-edpm_bootstrap SUCCESS in 5m 54s :heavy_check_mark: edpm-ansible-molecule-edpm_podman SUCCESS in 5m 02s :heavy_check_mark: edpm-ansible-molecule-edpm_module_load SUCCESS in 4m 29s :heavy_check_mark: edpm-ansible-molecule-edpm_kernel SUCCESS in 8m 54s :heavy_check_mark: edpm-ansible-molecule-edpm_libvirt SUCCESS in 10m 23s :heavy_check_mark: edpm-ansible-molecule-edpm_nova SUCCESS in 10m 36s :heavy_check_mark: edpm-ansible-molecule-edpm_frr SUCCESS in 7m 18s :heavy_check_mark: edpm-ansible-molecule-edpm_iscsid SUCCESS in 5m 07s :heavy_check_mark: edpm-ansible-molecule-edpm_ovn_bgp_agent SUCCESS in 8m 16s :heavy_check_mark: edpm-ansible-molecule-edpm_ovs SUCCESS in 5m 23s

slagle commented 6 months ago

this looks like it was changed in rhel 9.3 https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/9.3_release_notes/new-features#new-features-boot-loader I'm not sure if this behaviour change has also been applied in centos but https://forums.centos.org/viewtopic.php?t=80571 implies it might be.

grun2-mkconfig doesn't have the option in the latest CentOS-Stream-GenericCloud-x86_64-9-latest.x86_64.qcow2

SeanMooney commented 6 months ago

this looks like it was changed in rhel 9.3 https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/9.3_release_notes/new-features#new-features-boot-loader I'm not sure if this behaviour change has also been applied in centos but https://forums.centos.org/viewtopic.php?t=80571 implies it might be.

grun2-mkconfig doesn't have the option in the latest CentOS-Stream-GenericCloud-x86_64-9-latest.x86_64.qcow2

based on that do we want to proceed with this without testing for now or hold till we can configure a rhel 9.3+ job? or test downstream only?

in theory, it should never be the case that Rhel has a feature like this without it first being released in Centos so if it is not in the latest c9s repos then someone didn't follow the correct process here.

slagle commented 6 months ago

this looks like it was changed in rhel 9.3 https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/9.3_release_notes/new-features#new-features-boot-loader I'm not sure if this behaviour change has also been applied in centos but https://forums.centos.org/viewtopic.php?t=80571 implies it might be.

grun2-mkconfig doesn't have the option in the latest CentOS-Stream-GenericCloud-x86_64-9-latest.x86_64.qcow2

based on that do we want to proceed with this without testing for now or hold till we can configure a rhel 9.3+ job? or test downstream only?

in theory, it should never be the case that Rhel has a feature like this without it first being released in Centos so if it is not in the latest c9s repos then someone didn't follow the correct process here.

that's what I was wondering myself. CentOS Stream isn't ahead of RHEL in this case.

eshulman2 commented 6 months ago

recheck

eshulman2 commented 6 months ago

Tested patch in downstream environment using RHEL9.4.

eshulman2 commented 6 months ago

@rabi tested on downstream deployment using RHEL 9.4

SeanMooney commented 6 months ago

i have approved the github workflow to allow the remaining test to run.

the current version should work on any OS where its supported. we are still missing test coverage however. so I'm not going to approve the pr as is until a @slagle or others agree to address this testing gap as a follow up.

i would suggest reaching out on slack or attending the edpm call on Monday and adding this as a topic.

eshulman2 commented 6 months ago

i have approved the github workflow to allow the remaining test to run.

the current version should work on any OS where its supported. we are still missing test coverage however. so I'm not going to approve the pr as is until a @slagle or others agree to address this testing gap as a follow up.

i would suggest reaching out on slack or attending the edpm call on Monday and adding this as a topic.

I will be able to work on tests on Sunday but I have a few questions:

  1. should I test it on localhost as it is now and on another host from the pool you mentioned?
  2. should we reboot to verify or should be using grubby good enough?
  3. are there any additional requirements to test it or should it be sufficient?
Jaganathancse commented 6 months ago

i have approved the github workflow to allow the remaining test to run.

the current version should work on any OS where its supported. we are still missing test coverage however. so I'm not going to approve the pr as is until a @slagle or others agree to address this testing gap as a follow up.

i would suggest reaching out on slack or attending the edpm call on Monday and adding this as a topic.

@SeanMooney LGTM, @eshulman2 or i will work on test coverage part related to this as new PR. Can we merge this PR and we can work top of this PR since blocking RHEL9.4 testing?

openshift-ci[bot] commented 6 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ekuris-redhat, eshulman2, Jaganathancse, slagle

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/openstack-k8s-operators/edpm-ansible/blob/main/OWNERS)~~ [slagle] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment