Open metal4lyf opened 3 years ago
Does 1:2.02-90.0.1.el8 work? If so, that will at least narrow our focus to the fixes in the .0.2 release.
Also, can you tell us what type of device and controller you're using to boot?
Could you also try running grub2-install <boot device> prior to rebooting to see if that resolves the issue?
Here's the boot device info. I'll try the grub install now.
$ sudo lshw -class disk
*-disk
description: ATA Disk
product: ST2000DM001-1ER1
physical id: 0.0.0
bus info: scsi@0:0.0.0
logical name: /dev/sda
version: CC27
serial: Z4Z703D6
size: 1863GiB (2TB)
capacity: 1863GiB (2TB)
capabilities: 7200rpm gpt-1.00 partitioned partitioned:gpt
configuration: ansiversion=6 guid=9d1775b4-835a-4c76-9e43-5d544b7ec8fc logicalsectorsize=512 sectorsize=4096
Boot params are UEFI/Legacy Boot: OFF/Secure Boot: OFF.
To be on the same page, is the system in UEFI mode or legacy mode? You can check by looking for /sys/firmware/efi on the booted system.
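The /sys/firmware/efi check can be sketched as a small shell function. This is only an illustration: the function name boot_mode and its optional sysfs-root argument are assumptions added so the check can be tried against a scratch directory, not part of any tool mentioned in this thread.

```shell
# Report whether the running system was booted via UEFI or legacy BIOS.
# $1 (optional, hypothetical): sysfs root to inspect, defaults to /sys.
boot_mode() {
    local sysfs_root="${1:-/sys}"
    # The kernel only exposes firmware/efi when it was started by UEFI firmware.
    if [ -d "$sysfs_root/firmware/efi" ]; then
        echo "UEFI"
    else
        echo "legacy"
    fi
}
```

Calling boot_mode with no argument inspects the real /sys.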
UEFI
I can't get grub2-install working. It complains about missing modinfo.sh. No directory under /boot contains this file, so I'm not sure what to pass it.
Trying 90.0.1 now.
Yeah, forget about grub2-install. It is for legacy boot. Please run the following just before the reboot:
efibootmgr -v
find /boot | grep redhat
find /boot | grep centos
rpm -qa | grep shim
90.0.1 doesn't boot either. Reinstalled 90.0.2. Here are the results:
efibootmgr -v:
BootCurrent: 0011
Timeout: 1 seconds
BootOrder: 0001,000C,000D,000E,000F,0010,0006,0011,0008,0009,000A,000B
Boot0000* Windows Boot Manager HD(1,GPT,87d93515-2374-4b87-9701-5a4c527ee83b,0x800,0x145000)/File(\EFI\Microsoft\Boot\bootmgfw.efi)WINDOWS.........x...B.C.D.O.B.J.E.C.T.=.{.9.d.e.a.8.6.2.c.-.5.c.d.d.-.4.e.7.0.-.a.c.c.1.-.f.3.2.b.3.4.4.d.4.7.9.5.}...;................
Boot0001* CentOS Linux HD(1,GPT,b7460ef9-456e-4086-95f9-7dc69e80ddaa,0x800,0x12c000)/File(\EFI\centos\shimx64.efi)
Boot0006* HDD NVMe(0x1,01-00-00-00-00-00-00-00)/HD(1,GPT,54969a86-cdfd-4d17-a677-4063a30945af,0x800,0x12c000)
Boot0008* PXE IP4 Intel(R) Ethernet 10G 2P X550-t Adapter PciRoot(0x2)/Pci(0x0,0x0)/Pci(0x0,0x0)/MAC(b4969130ba1c,1)/IPv4(0.0.0.00.0.0.0,0,0)..BO
Boot0009* PXE IP6 Intel(R) Ethernet 10G 2P X550-t Adapter PciRoot(0x2)/Pci(0x0,0x0)/Pci(0x0,0x0)/MAC(b4969130ba1c,1)/IPv6([::]:<->[::]:,0,0)..BO
Boot000A* PXE IP4 Intel(R) Ethernet 10G 2P X550-t Adapter PciRoot(0x2)/Pci(0x0,0x0)/Pci(0x0,0x1)/MAC(b4969130ba1e,1)/IPv4(0.0.0.00.0.0.0,0,0)..BO
Boot000B* PXE IP6 Intel(R) Ethernet 10G 2P X550-t Adapter PciRoot(0x2)/Pci(0x0,0x0)/Pci(0x0,0x1)/MAC(b4969130ba1e,1)/IPv6([::]:<->[::]:,0,0)..BO
Boot000C* Diskette Drive BBS(Floppy,Diskette Drive,0x0)..BO
Boot000D* Internal HDD BBS(HD,Internal HDD,0x0)..BO
Boot000E* USB Storage Device BBS(USB,SanDisk,0x0)..BO
Boot000F* P7: HL-DT-ST DVD-ROM DH50N BBS(CDROM,P7: HL-DT-ST DVD-ROM DH50N,0x0)..BO
Boot0010 Onboard NIC BBS(Network,IBA CL Slot 00FE v0110,0x0)..BO
Boot0011* UEFI: SanDisk PciRoot(0x0)/Pci(0x14,0x0)/USB(7,0)/USB(1,0)/HD(1,GPT,87182ce7-da3d-414d-9ff3-3182544d7675,0x800,0x1dcf7df)..BO
find /boot | grep redhat:
/boot/efi/EFI/redhat
/boot/efi/EFI/redhat/fonts
/boot/efi/EFI/redhat/grubenv
/boot/efi/EFI/redhat/grubx64.efi
find /boot | grep centos:
/boot/efi/EFI/centos
/boot/efi/EFI/centos/shimx64-centos.efi
/boot/efi/EFI/centos/BOOTX64.CSV
/boot/efi/EFI/centos/mmx64.efi
/boot/efi/EFI/centos/grubenv
/boot/efi/EFI/centos/grub.cfg
/boot/efi/EFI/centos/shimx64.efi
rpm -qa | grep shim:
shim-x64-15-15.el8_2.x86_64
How are you running centos2ol.sh, i.e. what parameters are you using?
With or without -k; it doesn't seem to matter. When we don't pass -k, UEK is installed but not enabled.
Shim does upgrade when we downgrade to BaseOS grub.
I've also verified with BaseOS grub that we can boot uek.
The shim-x64 package should be downgraded as part of the distro-sync that is run by default, i.e. after the switch you should have shim-x64-15-11 installed.
And if you don't pass -k, the UEK should be installed and enabled, again with the downgrade of shim. Something else is happening here. Can you run the switch and pipe the output to a log file so we can see the entire process? If possible, run the script with no parameters, i.e. bash centos2ol.sh | tee -a centos2ol.log
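One caveat with a plain pipe into tee is that error messages written to stderr do not reach the log. A minimal sketch of a wrapper that captures both streams follows; the name run_logged is hypothetical and only for illustration.

```shell
# Run a command, appending both stdout and stderr to a log file while
# still showing the output on the terminal.
# $1: log file; remaining arguments: the command to run.
# Note: the return status is that of tee, not of the command itself.
run_logged() {
    local log="$1"
    shift
    # 2>&1 merges stderr into stdout before the pipe, so tee sees both.
    "$@" 2>&1 | tee -a "$log"
}
```

For the case discussed here, that would be run_logged centos2ol.log bash centos2ol.sh.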
@metal4lyf So you have the CentOS shim and Oracle grub, which explains the problem. I'm pretty sure that if you 1) rpm -e shim-x64 (remove the CentOS shim), 2) yum install shim-x64 (from the Oracle repos), and then reboot, everything will automagically start working.
If NOT, you will still need to replace the CentOS shim with the Oracle shim and run efibootmgr -c -d /dev/sda -p 1 -L "Oracle Linux" -l "\EFI\redhat\shimx64.efi"
where /dev/sda is the ESP disk and 1 is the partition number. You can run mount | grep boot and see which disk is mounted at /boot/efi to determine that. (Please note that I am writing about the ESP disk, not the boot disk.)
EDIT: you may need to do rpm -e --nodeps shim-x64 (rpm does not accept --force when erasing), but be careful(!): make 100% sure you install a new shim after removing the old one.
EDIT2: we still need to figure out why shim was not replaced in your case.
Thanks, I'll wipe this system and stage it for another run tomorrow AM. I will update with the logs and then try your suggestions. (The reason for CentOS shim and Oracle grub is because I downgraded grub to CentOS in recovery mode after the boot failed, which switched to CentOS shim, and thereafter upgraded grub to Oracle, which did not modify shim.)
Thanks @metal4lyf -- we very much appreciate the effort here!
@metal4lyf if the system is not booting with Oracle shim + Oracle grub2, efibootmgr will save you. We should apply a fix on our side for this anyway, so running efibootmgr is an immediate fix for you until it is addressed by the migration script.
Here's the state after a fresh migration with no flags to the script. I may have lost the log but I'll find it or run again and add here.
#!/bin/bash -xv
grubby --info=ALL | grep ^kernel
+ grubby --info=ALL
+ grep '^kernel'
kernel="/boot/vmlinuz-5.4.17-2036.104.4.el8uek.x86_64"
kernel="/boot/vmlinuz-4.18.0-240.15.1.el8_3.x86_64"
kernel="/boot/vmlinuz-4.18.0-147.el8.x86_64"
kernel="/boot/vmlinuz-0-rescue-1e1b6984890346aab6d2b455f4f5af16"
grubby --default-kernel
+ grubby --default-kernel
/boot/vmlinuz-5.4.17-2036.104.4.el8uek.x86_64
efibootmgr -v
+ efibootmgr -v
BootCurrent: 0001
Timeout: 1 seconds
BootOrder: 0001,000C,000D,000E,000F,0010,0006,0011,0008,0009,000A,000B
Boot0000* Windows Boot Manager HD(1,GPT,87d93515-2374-4b87-9701-5a4c527ee83b,0x800,0x145000)/File(\EFI\Microsoft\Boot\bootmgfw.efi)WINDOWS.........x...B.C.D.O.B.J.E.C.T.=.{.9.d.e.a.8.6.2.c.-.5.c.d.d.-.4.e.7.0.-.a.c.c.1.-.f.3.2.b.3.4.4.d.4.7.9.5.}...;................
Boot0001* CentOS Linux HD(1,GPT,1e67f230-95c6-44d2-a9be-0f5cccc00561,0x800,0x12c000)/File(\EFI\centos\shimx64.efi)
Boot0006* HDD NVMe(0x1,01-00-00-00-00-00-00-00)/HD(1,GPT,54969a86-cdfd-4d17-a677-4063a30945af,0x800,0x12c000)
Boot0008* PXE IP4 Intel(R) Ethernet 10G 2P X550-t Adapter PciRoot(0x2)/Pci(0x0,0x0)/Pci(0x0,0x0)/MAC(b4969130ba1c,1)/IPv4(0.0.0.00.0.0.0,0,0)..BO
Boot0009* PXE IP6 Intel(R) Ethernet 10G 2P X550-t Adapter PciRoot(0x2)/Pci(0x0,0x0)/Pci(0x0,0x0)/MAC(b4969130ba1c,1)/IPv6([::]:<->[::]:,0,0)..BO
Boot000A* PXE IP4 Intel(R) Ethernet 10G 2P X550-t Adapter PciRoot(0x2)/Pci(0x0,0x0)/Pci(0x0,0x1)/MAC(b4969130ba1e,1)/IPv4(0.0.0.00.0.0.0,0,0)..BO
Boot000B* PXE IP6 Intel(R) Ethernet 10G 2P X550-t Adapter PciRoot(0x2)/Pci(0x0,0x0)/Pci(0x0,0x1)/MAC(b4969130ba1e,1)/IPv6([::]:<->[::]:,0,0)..BO
Boot000C* Diskette Drive BBS(Floppy,Diskette Drive,0x0)..BO
Boot000D* Internal HDD BBS(HD,Internal HDD,0x0)..BO
Boot000E* USB Storage Device BBS(USB,SanDisk,0x0)..BO
Boot000F* P7: HL-DT-ST DVD-ROM DH50N BBS(CDROM,P7: HL-DT-ST DVD-ROM DH50N,0x0)..BO
Boot0010 Onboard NIC BBS(Network,IBA CL Slot 00FE v0110,0x0)..BO
Boot0011* UEFI: SanDisk PciRoot(0x0)/Pci(0x14,0x0)/USB(7,0)/USB(1,0)/HD(1,GPT,87182ce7-da3d-414d-9ff3-3182544d7675,0x800,0x1dcf7df)..BO
find /boot | grep redhat
+ find /boot
+ grep redhat
/boot/efi/EFI/redhat
/boot/efi/EFI/redhat/fonts
/boot/efi/EFI/redhat/grubenv
/boot/efi/EFI/redhat/grubx64.efi
/boot/efi/EFI/redhat/BOOTX64.CSV
/boot/efi/EFI/redhat/mmx64.efi
/boot/efi/EFI/redhat/shimx64.efi
/boot/efi/EFI/redhat/grub.cfg
find /boot | grep centos
+ find /boot
+ grep centos
/boot/efi/EFI/centos
/boot/efi/EFI/centos/grubenv
/boot/efi/EFI/centos/grub.cfg
rpm -qa | grep shim
+ rpm -qa
+ grep shim
shim-x64-15-11.0.5.x86_64
OK, so here is what is actually happening in your case: since you have migrated from CentOS to Oracle Linux, the CentOS EFI binaries are wiped, and the CentOS UEFI boot entry will be wiped on the next reboot. In that case, normally (on most systems) the /boot/efi/EFI/BOOT/BOOTX64.EFI binary is executed (that is the "default" boot path), and it runs fallback, which creates the UEFI boot entries for Oracle Linux. It looks like that is not happening in your case.
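To see whether the files involved in that default path are even present on the ESP, something like the following sketch could be used. The function name check_fallback_files is hypothetical, and the fallback binary name fbx64.efi is my assumption about how shim packages it; BOOTX64.CSV and shimx64.efi are taken from the listings in this thread. Finding all the files does not guarantee that the firmware will actually use the default path.

```shell
# Report which files involved in the shim fallback path are present.
# $1: mount point of the ESP (default /boot/efi)
# $2: vendor directory under EFI/ (default redhat, as seen in this thread)
# Returns 0 if everything was found, 1 if anything is missing.
check_fallback_files() {
    local esp="${1:-/boot/efi}"
    local vendor="${2:-redhat}"
    local missing=0
    local f
    # fbx64.efi is an assumed name for the fallback binary.
    for f in "EFI/BOOT/BOOTX64.EFI" "EFI/BOOT/fbx64.efi" \
             "EFI/$vendor/BOOTX64.CSV" "EFI/$vendor/shimx64.efi"; do
        if [ -e "$esp/$f" ]; then
            echo "present: $f"
        else
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}
```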
As an immediate measure, you can create a boot entry for Oracle Linux before the reboot by running
efibootmgr -c -d /dev/sda -p 1 -L "Oracle Linux" -l "\EFI\redhat\shimx64.efi"
where /dev/sda is the ESP disk and 1 is the partition number. You can run mount | grep boot and see which disk is mounted at /boot/efi to determine that.
Ping me if you are unsure what to do with efibootmgr, and I will provide more detailed instructions.
Anyway, this case (fallback not happening) should be covered by our migration scripts, and that efibootmgr call should happen automatically.
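Working out the ESP disk and partition number by hand is error-prone, so here is a rough sketch that derives both from lsblk output. The function name esp_disk_and_part is hypothetical; it assumes the ESP is mounted at /boot/efi and that the partition number is the run of digits at the end of the kernel device name.

```shell
# Read `lsblk -o MOUNTPOINT,PKNAME,KNAME` output on stdin and print the
# parent disk and partition number of whatever is mounted at /boot/efi,
# e.g. "/dev/nvme0n1 1".
esp_disk_and_part() {
    awk '$1 == "/boot/efi" {
        part = $3
        sub(/^.*[^0-9]/, "", part)   # keep only the trailing digits
        print "/dev/" $2, part
    }'
}
```

Usage would be lsblk -o MOUNTPOINT,PKNAME,KNAME | esp_disk_and_part, with the two printed values passed to efibootmgr as -d and -p.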
Ran the migration again. Log here: ol8.log
Before reboot I ran efibootmgr as follows:
lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 1 14.9G 0 disk
└─sda1 8:1 1 14.9G 0 part /mnt/sd
sr0 11:0 1 1024M 0 rom
nvme0n1 259:0 0 1.9T 0 disk
├─nvme0n1p1 259:1 0 600M 0 part /boot/efi
├─nvme0n1p2 259:2 0 1G 0 part /boot
└─nvme0n1p3 259:3 0 1.9T 0 part
├─VolGroup00-root 253:0 0 50G 0 lvm /
├─VolGroup00-swap 253:1 0 128G 0 lvm [SWAP]
└─VolGroup00-home 253:2 0 1.7T 0 lvm /home
efibootmgr -c -d /dev/nvme0n1 -p 1 -L "Oracle Linux" -l "\EFI\redhat\shimx64.efi":
BootCurrent: 0001
Timeout: 1 seconds
BootOrder: 0003,0001,0002,000C,000D,000E,000F,0010,0006,0011,0008,0009,000A,000B
Boot0000* Windows Boot Manager
Boot0001* CentOS Linux
Boot0002* Oracle Linux
Boot0006* HDD
Boot0008* PXE IP4 Intel(R) Ethernet 10G 2P X550-t Adapter
Boot0009* PXE IP6 Intel(R) Ethernet 10G 2P X550-t Adapter
Boot000A* PXE IP4 Intel(R) Ethernet 10G 2P X550-t Adapter
Boot000B* PXE IP6 Intel(R) Ethernet 10G 2P X550-t Adapter
Boot000C* Diskette Drive
Boot000D* Internal HDD
Boot000E* USB Storage Device
Boot000F* P7: HL-DT-ST DVD-ROM DH50N
Boot0010 Onboard NIC
Boot0011* UEFI: SanDisk
Boot0003* Oracle Linux
I got a warning about Oracle Linux already being present as Boot0002, but this does appear to have fixed the boot!
Oracle Linux does now show up twice in our UEFI boot menu. Is there a variant of the efibootmgr command that would consolidate/overwrite instead?
EDIT: I may have clobbered the boot menu on the USB drive I've been using to reinstall this system. I wonder if the presence of this disk is related to the boot manager issues too?
Well, if the entry was already present, you do not need to recreate it. Do both entries persist after reboot? If yes, we (and you) will need a simple check to only execute efibootmgr when the Oracle Linux entry is NOT present, something like
if ! efibootmgr -v | grep -q "Oracle Linux"; then
    efibootmgr -c -d blahblah  # i.e. the create command from above
fi
A USB disk can't affect the number of entries since they are stored in NVRAM, not on any plugged-in media. However, for the same reason (NVRAM storage), UEFI boot entries will not be wiped if you reinstall the system and the binaries those boot entries point to are still present.
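A slightly stricter version of the presence check suggested above can be sketched as follows. The function name has_boot_entry is hypothetical; it compares the whole label rather than a substring, and it assumes efibootmgr prints entries as "BootNNNN* Label", as in the listings in this thread.

```shell
# Succeed if the efibootmgr output on stdin contains a boot entry whose
# label is exactly "$1".
has_boot_entry() {
    awk -v label="$1" '
        /^Boot[0-9A-Fa-f]+[* ]/ {
            name = $0
            sub(/^Boot[0-9A-Fa-f]+[* ] */, "", name)  # strip "Boot0002* "
            sub(/\t.*/, "", name)                     # strip -v device path, if tab-separated
            if (name == label) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}
```

It would be used as efibootmgr | has_boot_entry "Oracle Linux" || efibootmgr -c -d /dev/nvme0n1 -p 1 -L "Oracle Linux" -l "\EFI\redhat\shimx64.efi", so the create call only runs when no such entry exists.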
Ran the migration again. Log here: ol8.log
According to this log, the switch installed our shim-x64 package as an upgrade. I also noticed that the script had to upgrade a bunch of packages to get yum-utils to install. Did you perhaps do a dnf update on the CentOS instance before switching last time? Because this run looks pretty flawless from a log perspective (and would explain the duplicate UEFI boot entries).
I did not run dnf update last time. If the server has network access, our installer adds an internal application package post-install and performs a distro sync, so perhaps that explains the difference? Sometimes I unplug the network beforehand to save time. This all happens before running centos2ol.sh (with network).
I've run this many times now, with and without network on the initial install, and the result has always been the same. The logs from centos2ol.sh always look clean despite leaving the system unbootable.
@aburmash knows way more about UEFI than I do, so I'm hoping to see a pull request soon that adds a bit of efibootmgr magic to centos2ol.sh to mitigate this issue.
Well, if entry was already present you do not need to recreate it. Do both entries persist after reboot ?
Looks like that was a fluke, or at any rate there is only one entry after reboot, so we're good there.
Thanks for all the help!
Just to be clear: are you now able to switch your Dell boxes to OL8 and still boot? I'm not sure if there's still an outstanding issue or not, and I wanted to check before I close this.
Yes, it's working now with the efibootmgr fix.
Here's what ultimately works after running centos2ol.sh:
# remove CentOS Linux (it is now unbootable)
# (the loops cope with zero or several matching entries; efibootmgr -b takes one boot number at a time)
for n in $(efibootmgr | grep 'CentOS Linux' | sed -r 's/^Boot([0-9A-F]+).*/\1/'); do efibootmgr -b "$n" -B; done
# remove any Oracle Linux (if it was necessary to convert more than once, existing entries will be unbootable)
for n in $(efibootmgr | grep 'Oracle Linux' | sed -r 's/^Boot([0-9A-F]+).*/\1/'); do efibootmgr -b "$n" -B; done
# add new entry for Oracle Linux
disk=/dev/$(lsblk -o MOUNTPOINT,PKNAME,KNAME | grep /boot/efi | awk '{print $2}')
part=$(lsblk -o MOUNTPOINT,PKNAME,KNAME | grep /boot/efi | awk '{print $3}' | grep -o '[0-9]*$')
efibootmgr -c -d "$disk" -p "$part" -L "Oracle Linux" -l "\EFI\redhat\shimx64.efi"
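Before deleting anything from NVRAM, it may be worth listing which boot numbers a given label would match. A minimal sketch, with the hypothetical function name boot_nums_for_label, reading plain efibootmgr output on stdin:

```shell
# Print the boot number (e.g. 0002) of every efibootmgr entry on stdin
# whose label is exactly "$1", one per line.
boot_nums_for_label() {
    awk -v label="$1" '
        /^Boot[0-9A-Fa-f]+[* ]/ {
            num = $1
            sub(/^Boot/, "", num)   # "Boot0002*" -> "0002*"
            sub(/\*$/, "", num)     # drop the "active" marker
            name = $0
            sub(/^Boot[0-9A-Fa-f]+[* ] */, "", name)
            sub(/\t.*/, "", name)
            if (name == label) print num
        }
    '
}
```

For example, efibootmgr | boot_nums_for_label "Oracle Linux" shows the numbers that could then be passed one at a time to efibootmgr -b <num> -B.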
Thanks. I've updated the issue title so that we can use it as a reference for any submitted pull requests.
Referencing another issue with a similar cause: https://github.com/oracle/centos2ol/issues/73
Thank you team & @aburmash let's go!
Switching default boot kernel to the UEK.
Removing yum cache
Switch complete. Oracle recommends rebooting this system.
Reboot.
Dead machine (>.<)
Cannot find OS.
Bruh.
This was done on CentOS 7. I have to attach some kind of boot device to access the machine before I can investigate.
Here is what I ended up doing.
First, I found another boot device to get into a terminal, then:
fdisk -l
mount /dev/sda1 /mnt/
cd /mnt/EFI (ls shows BOOT centos Dell redhat)
efibootmgr -c -d /dev/sda -p 1 -L "Oracle Linux" -l "\EFI\redhat\shimx64.efi"
BootCurrent: 0001
Boot0001 Oracle Linux
(also still have, among others) Boot0000 CentOS
System now boots into Oracle Linux Server release 7.9 🙌
What should I be doing next? Thank you again 💛
We are trying to migrate CentOS 8 systems to OL8.
The conversion script reports success, but it renders our systems unbootable: After the BIOS splash, we get several
>> Checking media presence .....
messages on the terminal and then the system enters Dell BIOS recovery mode, which performs a memory test and then reports "No bootable devices found! ..."
Boot params are UEFI/Legacy Boot: OFF/Secure Boot: OFF.
I've isolated this issue to OL8 grub. Using a recovery stick, if I re-enable the CentOS BaseOS repo and install the latest version of grub2*, the system will boot to login with expected entries ("Oracle Linux" etc.) in the grub menu.
We're hesitant to proceed with migrations using this workaround because it requires us to continue using a potentially unsupported version of a fundamental component, not to mention we'll have to exclude grub in our dnf config to avoid bricking on dnf upgrades.
We use a stock grub configuration as far as I know.
/etc/default/grub:
/boot/grub2/grubenv:
Kernel:
4.18.0-240.15.1.el8_3.x86_64
Bricking grub2: 1:2.02-90.0.2.el8_3.1 from ol8_baseos_latest
Working grub2: <= 1:2.02-90.el8_3.1 from BaseOS
I've yet to see a useful message from grub despite removing rhgb quiet. Please let me know what other info would help here.