oracle / oracle-linux

Scripts, examples, and tutorials to get started with Oracle Linux
Universal Permissive License v1.0

leapp upgrade 8.10 -> 9.4, Grub boot failure #147

Open hanschou opened 2 weeks ago

hanschou commented 2 weeks ago

Hi

I tried to upgrade Oracle Linux 8.10 to 9.4 with leapp upgrade --oraclelinux, but the system failed to boot vmlinuz-upgrade and only showed the grub prompt.

I loaded vmlinuz-upgrade and initrd-upgrade at the grub prompt manually and got the upgrade and OS running.

It seems the GRUB menu configuration (grub.cfg) was missing, and regenerating it sorted it out: grub2-mkconfig -o /boot/grub2/grub.cfg
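The recovery step above can be sketched as a small check that reports whether the config file is there before anything is regenerated. This is a minimal sketch, not part of leapp: the function name grub_cfg_status is hypothetical, and the path is passed as an argument so the check can be tried against a scratch directory first.

```shell
#!/bin/sh
# Hypothetical helper: report whether a GRUB config file exists and is non-empty.
# Returns 0 when the file is present, 1 when it is missing or empty.
grub_cfg_status() {
    cfg="$1"
    if [ -s "$cfg" ]; then
        echo "present: $cfg"
    else
        echo "missing: $cfg"
        return 1
    fi
}

# Possible use on the affected system (as root), reusing the command from the report:
#   grub_cfg_status /boot/grub2/grub.cfg || grub2-mkconfig -o /boot/grub2/grub.cfg
```

The block only defines the function; the regeneration command stays commented out so nothing is written unless it is run deliberately.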

grubby --info ALL now shows the menu entries.

I think this is a problem within leapp.

aburmash commented 2 weeks ago

Hello! Just so that I can reproduce the issue correctly: 1) Is this a legacy BIOS system (not UEFI)? 2) Was the upgrade done outside OCI?

I did in fact discover an issue with the latest grub2 update that could have caused /boot/grub2/grub.cfg to go missing, but it should only show up after the first stage of the upgrade is done. Did you see on the serial console whether any upgrade actions were performed before you were dropped to the grub prompt?

hanschou commented 2 weeks ago

Hi

[hasch@fftest ~]$ [[ -d /sys/firmware/efi ]] && echo UEFI || echo BIOS
UEFI

I'm not sure what OCI is. Please explain.

In the VMware serial console I did see that the upgrade was performed. I tried several things, so I could be remembering wrong, but the first boot after leapp upgrade gave the grub prompt, and then I loaded the vmlinuz-upgrade kernel and the packages were installed.

aburmash commented 2 weeks ago

OK, I have a very distinct feeling that you have hit exactly the bug that we have just fixed ;) I'll update this issue when the fix is live, and in the meantime I will also check whether your report is in fact a different issue.

hanschou commented 2 weeks ago

Well, as this is a test server, I could ask the VMware admin to restore the server back to 8.10 and run the upgrade again. Is there anything special I should take note of during the process, before or after? Otherwise I will just run the usual:

  1. yum upgrade
  2. reboot
  3. leapp preupgrade --oraclelinux
  4. leapp upgrade --oraclelinux
  5. reboot

BTW, the only third-party package I installed was PostgreSQL, but I removed it before leapp upgrade.

aburmash commented 2 weeks ago

Very much appreciate your help! The issue is most likely caused by a grub2 update that changed the order of config file processing, so the updated system ends up with no config in /boot/grub2.

If this is a test system you could run:

yum upgrade
reboot
leapp preupgrade --oraclelinux
leapp upgrade --oraclelinux

After that verify:

find /boot |grep grub.cfg
cat /boot/efi/EFI/redhat/grub.cfg |grep blscfg
ls /boot/loader/entries

I need the output of the last 3 commands.

Notice: I did NOT ask you to actually perform the upgrade (no reboot after leapp upgrade); the diagnostic info above should be enough to help me understand which issue this is.
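The three diagnostic commands above could be bundled into one function so the output comes back as a single labelled report. This is only a sketch: the name collect_grub_diag and the section labels are hypothetical, and the boot directory is a parameter (defaulting to /boot) so the function can be exercised on a scratch tree.

```shell
#!/bin/sh
# Hypothetical helper: gather the three pieces of boot-config information
# requested above. It only reads files and changes nothing.
collect_grub_diag() {
    boot="${1:-/boot}"
    echo "== paths containing grub.cfg under $boot =="
    find "$boot" 2>/dev/null | grep 'grub\.cfg'
    echo "== blscfg lines in EFI grub.cfg =="
    grep blscfg "$boot/efi/EFI/redhat/grub.cfg" 2>/dev/null
    echo "== BLS entries =="
    ls "$boot/loader/entries" 2>/dev/null
    return 0
}

# Possible use after "leapp upgrade --oraclelinux" and before any reboot:
#   collect_grub_diag /boot
```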

hanschou commented 1 week ago

find /boot |grep grub.cfg
/boot/efi/EFI/redhat/grub.cfg
cat /boot/efi/EFI/redhat/grub.cfg |grep blscfg
# The blscfg command parses the BootLoaderSpec files stored in /boot/loader/entries and
insmod blscfg
blscfg
ls /boot/loader/entries
706645d081b64f3c8c1bd3ed80b205e6-0-rescue.conf
706645d081b64f3c8c1bd3ed80b205e6-4.18.0-513.18.1.el8_9.x86_64.conf
706645d081b64f3c8c1bd3ed80b205e6-4.18.0-553.5.1.el8_10.x86_64.conf
706645d081b64f3c8c1bd3ed80b205e6-4.18.0-553.el8_10.x86_64.conf
706645d081b64f3c8c1bd3ed80b205e6-5.4.17-2136.329.3.1.el8uek.x86_64.conf
706645d081b64f3c8c1bd3ed80b205e6-5.4.17-2136.331.7.el8uek.x86_64.conf
706645d081b64f3c8c1bd3ed80b205e6-5.4.17-2136.332.5.2.el8uek.x86_64.conf
706645d081b64f3c8c1bd3ed80b205e6-upgrade.x86_64.conf

The server has NOT been rebooted after leapp upgrade.
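The state shown in this output (a grub.cfg only under /boot/efi/EFI/redhat, plus an upgrade entry in /boot/loader/entries) can be summarized with a short read-only check. This is a sketch under assumptions: the function name summarize_boot_state is hypothetical, the "-upgrade." pattern is taken from the entry name in the listing above, and the result does not by itself decide whether a system is affected by the bug.

```shell
#!/bin/sh
# Hypothetical pre-reboot summary; the boot directory is a parameter
# (default /boot) so it can be tried on a scratch tree.
summarize_boot_state() {
    boot="${1:-/boot}"
    # Is there a GRUB config in <boot>/grub2?
    if [ -e "$boot/grub2/grub.cfg" ]; then
        echo "grub2/grub.cfg: present"
    else
        echo "grub2/grub.cfg: absent"
    fi
    # Is there a BLS entry whose file name contains "-upgrade."?
    if ls "$boot/loader/entries" 2>/dev/null | grep -q -- '-upgrade\.'; then
        echo "upgrade entry: present"
    else
        echo "upgrade entry: absent"
    fi
}
```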

aburmash commented 1 week ago

Yeah, you were hitting the grub2 config manipulation issue that left the system unbootable post-upgrade. Note: we have pulled the initial fix from the repos, since it introduced a regression; an updated version is going through testing right now. I'll update this thread when the update is live.

Thanks a lot!

hanschou commented 1 week ago

> updated version is passing testing right now

Sounds good.

But then I will just roll back my test server to the last snapshot and wait for the next try.