oamg / leapp-repository

Leapp repositories containing actors for the Leapp framework (https://github.com/oamg/leapp). Currently provides leapp repositories for in-place upgrades of RHEL systems.
Apache License 2.0

LEAPPing machines that need 3rd-party kernel modules (e.g., disk drivers) to boot might fail #705

Open krono opened 3 years ago

krono commented 3 years ago

Actual behavior

Some systems are set up with 3rd-party kernel modules for the boot disk. (Most typically, these come as DUPs at install time.)

When leapping, it seems that these modules are not picked up for the upgrade-initrd.

To Reproduce

Steps to reproduce the behavior:

  1. Find a machine with a FakeRAID
  2. Install RHEL7
  3. Try to leapp
  4. After rebooting into the upgrade initramfs, dracut aborts and complains about missing disks.

Expected behavior

LEAPP should be able to pick up kmods for the initramfs when they are part of the upgrade.

Maybe a switch, similar to `--enablerepo`, could be used to force the "installation" of certain packages prior to the creation of the initramfs?

System information

[System is dead now]

Situation that led to this idea

**Context and Things tried**

Our machine contains an infamous Intel C620/LSI MegaSR2-RAID chip. Vendors provide drivers for these as DUPs, e.g. Dell, HPE, or Fujitsu. This means that installing on a machine with that chip works fine when using DUPs.

Our preupgrade went fine, and in anticipation we even included the online version of the DUPs in the `--enablerepo` step. This resulted in no error, and the _new_ version of the kmod was listed among the packages to be installed. (NOTE: it was marked as a _downgrade_, as the RHEL7.9 version of the driver has a _higher_ version number than the RHEL8.2 version.) We also had to make use of a targeted LEAPP (to 8.2), as the driver is not yet available for 8.4, only up to 8.3.

After reviewing the report, we proceeded with `upgrade` and stopped just before `reboot`. At that time, we grabbed the log files (see [leap-log.zip](https://github.com/oamg/leapp/files/6957880/leap-log.zip)), inspected the initramfs, and compared it to the initramfs of the still-running RHEL7.9 (you will find a few mentions of megasr2 in the logs). We found the kmod missing.

As a workaround, we tried manually including the kmod from the already downloaded rpm into the initramfs:
Initramfs patching steps

These are specific to RHEL8.2 and the Fujitsu variant of the MegaSR2 driver ([which can be found here](http://patches.ts.fujitsu.com/linux/pldp/RHEL8/rhel8-u2/x86_64/)). The following steps make the initramfs similar to the RHEL7.9 one with regard to megasr.

```bash
# upgrade with enabled fujitsu repo
leapp upgrade --enablerepo primergy-kmod-el8.2

# prior to reboot:
# extract initramfs into temporary location
mkdir ~/initramfs-upgrade
cd ~/initramfs-upgrade
/usr/lib/dracut/skipcpio /boot/initramfs-upgrade.x86_64.img | zcat | cpio -idv

# find
#/var/lib/leapp/el8userspace/var/cache/dnf/primergy-kmod-el8.2-7b6ee48acb7dd887/packages/kmod-megasr2-18.02.2020.0827.4fts-2.el8.2.x86_64.rpm
# pour rpm contents into extracted initramfs
rpm2cpio /var/lib/leapp/el8userspace/var/cache/dnf/primergy-kmod-el8.2-7b6ee48acb7dd887/packages/kmod-megasr2-18.02.2020.0827.4fts-2.el8.2.x86_64.rpm | cpio -idv

# create "weak-update" structure normally created when actually installing the rpm and running dracut
mkdir -p usr/lib/modules/4.18.0-193.28.1.el8_2.x86_64/weak-updates/primergy-megasr2
ln -s ../../../4.18.0-193.el8.x86_64/extra/primergy-megasr2/megasr2.ko usr/lib/modules/4.18.0-193.28.1.el8_2.x86_64/weak-updates/primergy-megasr2/megasr2.ko

# update various modules.* files but only in the initramfs-directory
depmod -b "$PWD" 4.18.0-193.28.1.el8_2.x86_64

# backup actual initramfs
mv /boot/initramfs-upgrade.x86_64.img /boot/initramfs-upgrade.x86_64.img.ORIGINAL

# repack initramfs
# NOTE: THIS DROPS AMD MICROCODE UPDATE
find . | cpio -o -c -R root:root | gzip -9 > /boot/initramfs-upgrade.x86_64.img
cd -
reboot
```
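The "weak-update" step is the easiest one to get wrong by hand, because the symlink target is relative to the directory that holds the link. The following is a minimal Python sketch of that one step, not part of leapp or dracut. The function name `link_weak_update` and its parameters are hypothetical; the directory layout is taken from the `mkdir`/`ln -s` commands in the steps.

```python
import os
import tempfile


def link_weak_update(root, built_for, running, subdir, module):
    """Link a module built for kernel `built_for` into the weak-updates
    tree of kernel `running`, below the extracted initramfs at `root`.

    Returns the relative symlink target that was written.
    """
    modules = os.path.join(root, "usr", "lib", "modules")
    # Where the rpm unpacked the module (extra/<subdir>/<module>).
    source = os.path.join(modules, built_for, "extra", subdir, module)
    # Where the upgrade kernel looks for weak updates.
    link_dir = os.path.join(modules, running, "weak-updates", subdir)
    os.makedirs(link_dir, exist_ok=True)
    # Compute the target relative to the link's own directory, so the
    # link still resolves once the tree is packed into the initramfs.
    target = os.path.relpath(source, start=link_dir)
    os.symlink(target, os.path.join(link_dir, module))
    return target


if __name__ == "__main__":
    # Try it in a scratch directory instead of a real initramfs tree.
    with tempfile.TemporaryDirectory() as scratch:
        print(link_weak_update(
            scratch,
            "4.18.0-193.el8.x86_64",
            "4.18.0-193.28.1.el8_2.x86_64",
            "primergy-megasr2",
            "megasr2.ko",
        ))
```

With the kernel versions from the steps, the computed target is the same `../../../4.18.0-193.el8.x86_64/extra/primergy-megasr2/megasr2.ko` path that the `ln -s` command uses. `depmod` would still need to be run afterwards, as in the shell steps.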
After reboot, the system _actually sees_ the disk, which indicates that the driver was found. However, during the brief period we could watch the system:

- The installation of the `primergy-megasr2` package in the new userland seemed to fail
- The kernel did not seem to be properly installed and dracut fails
- Eventually, leapp exited and tried to write a log file "outside" of the container, which failed due to "read-only filesystem"

The system then rebooted, but at grub, only the RHEL7 variants/kernels were available. Trying to boot these hangs the system.

**EDIT**: It turns out the kernel and initramfs _were_ correctly built with the 3rd-party module; however, the respective entries were not written to the grub config. In fact, the old RHEL7.9 `grub.cfg` is in place, and the config-to-be, `grub.cfg.new`, is cut off right after `### begin /etc/grub.d/10_linux`. Manually editing the grub cmdline boots the system, but it seems the root file system was damaged and took the `leapp_resume.service` with it.
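A truncated `grub.cfg.new` like the one described can be spotted mechanically, because a generated GRUB config wraps the output of each `/etc/grub.d` script in `### BEGIN <script> ###` and `### END <script> ###` marker comments. The sketch below is a rough diagnostic under that assumption, not a leapp feature. The function name `unterminated_sections` is hypothetical, and the trailing `###` is treated as optional so that a marker cut off mid-line still counts.

```python
import re

# Matches "### BEGIN /etc/grub.d/10_linux ###" and the matching END line.
# The closing "###" is optional to tolerate a file cut off mid-line.
_MARKER = re.compile(
    r"^###\s+(BEGIN|END)\s+(\S+)(?:\s+###)?\s*$", re.IGNORECASE
)


def unterminated_sections(text):
    """Return the scripts whose BEGIN marker has no matching END marker."""
    open_sections = []
    for line in text.splitlines():
        match = _MARKER.match(line)
        if not match:
            continue
        kind, script = match.group(1).upper(), match.group(2)
        if kind == "BEGIN":
            open_sections.append(script)
        elif script in open_sections:
            open_sections.remove(script)
    return open_sections


if __name__ == "__main__":
    # Assumed path of the half-written config; adjust for the system.
    with open("/boot/grub2/grub.cfg.new") as handle:
        for script in unterminated_sections(handle.read()):
            print("section never closed:", script)
```

An empty result only means every section was closed. It does not prove that the menu entries inside are correct.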

Note: closing this simply because we used LEAPP_UNSUPPORTED machinery would be fair. Nonetheless, a means to include kmods for a leapped upgrade would be nice.

pirat89 commented 3 years ago

Hi @krono. Thanks for the report and for sharing the steps you needed to update the upgrade initramfs. I am busy these days, so I will go through it more carefully later. For now I am answering only what I have read, without looking into the provided logs.

We are aware of this limitation, and we want to deliver a mechanism that lets users create relatively simple custom actors (to customize/extend the default IPU functionality) that take care of various kernel drivers etc. during the upgrade.

We have already delivered upstream a mechanism that makes it possible to affect the dracut modules used in the upgrade and target initramfs, and to specify which files should be added to these initramfs images. What is still missing is a mechanism for kernel drivers specifically, and maybe the possibility to affect the dracut options used. Unfortunately, we have not documented it yet, as the testing is not finished and we would like to implement the support for drivers first. So using the implemented mechanism could be a little bit tricky right now. In short, it should be as easy as producing a couple of messages in an actor. For example, look at the commonleappdracutmodules actor. It's not the best example, as it affects just the upgrade initramfs and you can still see deprecated code there, but in short, something like

   api.produce(UpgradeInitramfsTasks(include_dracut_modules=[DracutModule(name=...,)]))

In the case of drivers, expect something like UpgradeInitramfsTasks(add_drivers=[...]). If you have an RPM providing the dracut modules etc., you could even tell leapp to install it into the environment (container) we use to create the upgrade initramfs. Expect that you will probably need to create a similar message for the target initramfs. More about the difference between the upgrade and target initramfs is described in the models files (below). I expect we will document this much better in the future, once we finish the implementation.
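To make the shape of such a message concrete, here is a self-contained Python sketch that runs without leapp installed. Everything in it is a local stand-in: the `DracutModule` and `UpgradeInitramfsTasks` dataclasses only mimic the names used above, `FakeApi` and `request_initramfs_content` are hypothetical, and `add_drivers` is the speculative field from this comment, not a confirmed part of the real model. The module name `mdraid` is just an example value.

```python
from dataclasses import dataclass, field
from typing import List


# Local stand-ins only. The real models live in leapp-repository and
# may differ in fields and behaviour.
@dataclass
class DracutModule:
    name: str


@dataclass
class UpgradeInitramfsTasks:
    include_dracut_modules: List[DracutModule] = field(default_factory=list)
    # Speculative: taken from "expect something like add_drivers=[...]".
    add_drivers: List[str] = field(default_factory=list)


class FakeApi:
    """Hypothetical stand-in for the api object; it only records messages."""

    def __init__(self):
        self.produced = []

    def produce(self, *messages):
        self.produced.extend(messages)


def request_initramfs_content(api, dracut_modules, drivers):
    """Produce one message listing what the upgrade initramfs should get."""
    api.produce(UpgradeInitramfsTasks(
        include_dracut_modules=[DracutModule(name=n) for n in dracut_modules],
        add_drivers=list(drivers),
    ))


if __name__ == "__main__":
    api = FakeApi()
    request_initramfs_content(api, ["mdraid"], ["megasr2"])
    print(api.produced[0])
```

In a real custom actor, the models and `api` would be imported from leapp, and the call would sit in the actor's processing method. The sketch only illustrates that the actor declares what it wants and leaves building the initramfs to the existing actors.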

We are currently discussing our priorities. Opening a ticket with RH support or a BZ for leapp-repository could help with prioritisation.

Is the proposed solution OK for you? Currently, we do not plan to add a CLI option for that.

Additional notes:

krono commented 3 years ago

Hi @pirat89, thanks for reading through my (admittedly uncoordinated) notes.

> You could even tell leapp to install it to the environment (container) we use to create the upgrade initramfs.

That's what I thought. Something like "early packages", which probably would also solve your mdadm thing…

For me it was a one-time thing. We somehow got the affected machine to work and just hope that we got far enough in the leapping that it's fine now.

Feel free to close; I learned quite a bit, tho, thanks!

pirat89 commented 3 years ago

Hi @krono, thanks for letting us know. I will keep this one open, as we can use it for public tracking of the RAID question and drivers in general. I just realized that for anyone else reading this, it would be helpful to see the script that is executed to create the upgrade initramfs:

In case of mdadm, right now it seems that

could be helpful in case of mdadm. But I haven't tested it yet, and I am not an SME on storage. It's just our idea of where we would like to start experimenting with mdadm in the future.

krono commented 3 years ago

Sounds like a good idea to me. Thanks.

pirat89 commented 1 year ago

It seems we will be working on that in the upcoming months. Pinning the issue.

bessonc commented 9 months ago

Sounds like this is fixed by https://github.com/oamg/leapp-repository/pull/1081

pirat89 commented 8 months ago

To my understanding, it's only partially fixed. More work is still expected regarding mdadm. However,