Open robertm98 opened 3 months ago
Hello! Thanks for the report. In fact, the last update issued for the linked issue has zero code changes, though it MIGHT have regenerated the grub config for you, and maybe that is triggering the issue. Are you seeing any errors other than the unknown TPM error? Are you using a BTRFS filesystem and/or BTRFS snapshots?
Never mind, reproduced it. We are going to pull this update and issue a proper one shortly.
Thank you. For info, the filesystem is XFS. A minor change is the name of the LVM volume group, from "ol" to "olb", so as not to clash with the volume group name of the previous installation on the original drive when I copy files across. I wondered if this could be relevant due to the questions about the filesystem, but from your last reply probably not. The installation is on a separate SATA drive and all other drives are disconnected.
@robertm98 once again thank you very much! I see that it is not related to filesystems, just broken grub config.
same issue here, is there any way to fix broken grub / grub.cfg from within UEFI interactive shell?
The only way I think this could be repaired is to do a recovery boot from the installation media. chroot to /mnt/sysroot (I think) then possibly use dnf to do a roll back or edit the config. @aburmash Would it be possible to get the details of the errors in the config and what needs to be done to make things good, please? What needs editing and then running to apply the config changes.
@robertm98 @m45733r I will provide recovery instructions for the UEFI shell shortly.
@m45733r
1) If you have already installed the bad update but have not rebooted yet:
grub2-mkconfig > /boot/grub2/grub.cfg
OR
grub2-mkconfig > /boot/efi/EFI/redhat/grub.cfg
2) If you can only work from the UEFI shell, look at the device mapping table first (this is what the shell's map command prints), for example:
FS0: Alias(s):HD0a1b:;BLK1:
PciRoot(0x0)/Pci(0x4,0x0)/Scsi(0x0,0x1)/HD(1,GPT,3AF7074E-C0BB-400D-8FC7-E9EC738AA53F,0x800,0x32000)
BLK0: Alias(s):
PciRoot(0x0)/Pci(0x4,0x0)/Scsi(0x0,0x1)
BLK2: Alias(s):
PciRoot(0x0)/Pci(0x4,0x0)/Scsi(0x0,0x1)/HD(2,GPT,14BE7023-6C02-4573-8891-9F639B9D936A,0x32800,0x400000)
BLK3: Alias(s):
PciRoot(0x0)/Pci(0x4,0x0)/Scsi(0x0,0x1)/HD(3,GPT,E700F071-90A5-40BB-8132-52AF688193B7,0x432800,0x5900800)
fs0:
ls
if you see EFI dir, you are where you need to be
cd EFI/redhat
rm grub.cfg
grubx64.efi
you will be dropped to grub cmdline
ls
it will display list of disks available, there you need to find a disk that has /boot dir or identify /boot partition
run
ls <disk>/
to see which one is that
for example:
ls (hd0,gpt2)/
when you have found the /boot you will see something like
grub> ls (hd0,gpt2)/
./ ../ efi/ grub2/ loader/ vmlinuz-5.14.0-427.16.1.el9_4.x86_64
System.map-5.14.0-427.16.1.el9_4.x86_64 config-5.14.0-427.16.1.el9_4.x86_64
.vmlinuz-5.14.0-427.16.1.el9_4.x86_64.hmac
symvers-5.14.0-427.16.1.el9_4.x86_64.gz
initramfs-5.14.0-427.16.1.el9_4.x86_64.img
vmlinuz-5.15.0-206.153.7.el9uek.x86_64
System.map-5.15.0-206.153.7.el9uek.x86_64 config-5.15.0-206.153.7.el9uek.x86_64
.vmlinuz-5.15.0-206.153.7.el9uek.x86_64.hmac
symvers-5.15.0-206.153.7.el9uek.x86_64.gz
initramfs-5.15.0-206.153.7.el9uek.x86_64.img
initramfs-0-rescue-36703c3cdc50ff74e863e867384f6a8a.img
vmlinuz-0-rescue-36703c3cdc50ff74e863e867384f6a8a
initramfs-5.15.0-206.153.7.el9uek.x86_64kdump.img
Now you need to check the boot info for your kernel
ls (hd0,gpt2)/loader/entries/
grub> ls (hd0,gpt2)/loader/entries/
./ ../ 8c622b7d13354f7fbe5eee50d3f340bd-5.14.0-427.16.1.el9_4.x86_64.conf
8c622b7d13354f7fbe5eee50d3f340bd-5.15.0-206.153.7.el9uek.x86_64.conf
36703c3cdc50ff74e863e867384f6a8a-0-rescue.conf
cat (hd0,gpt2)/loader/entries/8c622b7d13354f7fbe5eee50d3f340bd-5.15.0-206.153.7.el9uek.x86_64.conf
You will see something like:
title Oracle Linux Server (5.15.0-206.153.7.el9uek.x86_64 with Unbreakable Enterprise Kernel) 9.4
version 5.15.0-206.153.7.el9uek.x86_64
linux /vmlinuz-5.15.0-206.153.7.el9uek.x86_64
initrd /initramfs-5.15.0-206.153.7.el9uek.x86_64.img $tuned_initrd
options root=/dev/mapper/ocivolume-root ro crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M LANG=en_US.UTF-8 console=tty0 console=ttyS0,115200 rd.luks=0 rd.md=0 rd.dm=0 rd.lvm.vg=ocivolume rd.lvm.lv=ocivolume/root rd.net.timeout.dhcp=10 rd.net.timeout.carrier=5 netroot=iscsi:169.254.0.2:::1:iqn.2015-02.oracle.boot:uefi rd.iscsi.param=node.session.timeo.replacement_timeout=6000 net.ifnames=1 nvme_core.shutdown_timeout=10 ipmi_si.tryacpi=0 ipmi_si.trydmi=0 libiscsi.debug_libiscsi_eh=1 loglevel=4 crash_kexec_post_notifiers
grub_users $grub_users
grub_arg --unrestricted
grub_class ol
Now still in grub cmdline run:
linux (hd0,gpt2)/vmlinuz-5.15.0-206.153.7.el9uek.x86_64 root=/dev/mapper/ocivolume-root ro crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M LANG=en_US.UTF-8 console=tty0 console=ttyS0,115200 rd.luks=0 rd.md=0 rd.dm=0 rd.lvm.vg=ocivolume rd.lvm.lv=ocivolume/root rd.net.timeout.dhcp=10 rd.net.timeout.carrier=5 netroot=iscsi:169.254.0.2:::1:iqn.2015-02.oracle.boot:uefi rd.iscsi.param=node.session.timeo.replacement_timeout=6000 net.ifnames=1 nvme_core.shutdown_timeout=10 ipmi_si.tryacpi=0 ipmi_si.trydmi=0 libiscsi.debug_libiscsi_eh=1 loglevel=4 crash_kexec_post_notifiers
initrd (hd0,gpt2)/initramfs-5.15.0-206.153.7.el9uek.x86_64.img
boot
where kernel = kernel from config
options for kernel = options from config
initrd = initrd from config
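If another machine with a copy of the entry file is at hand, the three grub commands can be assembled there instead of by hand. This is only a minimal sketch: bls_to_grub is a made-up helper name (not part of grub2 or grubby), and it assumes the entry uses the plain "key value" layout shown in the cat output above. It drops everything after the first initrd path, such as $tuned_initrd, the same way the manual command above does.

```shell
#!/bin/sh
# bls_to_grub: hypothetical helper, prints the commands to type at the grub prompt.
#   $1 = path to a copy of a /boot/loader/entries/*.conf file
#   $2 = grub device that holds /boot, for example "(hd0,gpt2)"
bls_to_grub() {
    entry=$1
    dev=$2
    # Each line is "key value..."; take the first path after "linux" and "initrd".
    kernel=$(awk '$1 == "linux" { print $2; exit }' "$entry")
    initrd=$(awk '$1 == "initrd" { print $2; exit }' "$entry")
    # Keep the whole "options" value, it is appended to the linux line unchanged.
    options=$(sed -n 's/^options[[:space:]]\{1,\}//p' "$entry" | head -n 1)
    printf 'linux %s%s %s\n' "$dev" "$kernel" "$options"
    printf 'initrd %s%s\n' "$dev" "$initrd"
    printf 'boot\n'
}
```

Example call: bls_to_grub ./entry.conf "(hd0,gpt2)". The device argument is whatever ls showed as the partition containing /boot on your system.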
IMPORTANT: when copy/pasting, VERIFY that
the linux line is a single line. If there are newlines or returns in the buffer, the options will NOT be applied correctly.
So once you have the full linux line copied, paste it into a file to verify that it is a single line.
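One way to do that check without eyeballing it, as a rough sketch: is_single_line is a made-up helper name and the file name in the usage line is just an example. It succeeds only when the file holds exactly one line.

```shell
#!/bin/sh
# is_single_line: hypothetical helper.
# Exit status 0 if the file in $1 contains exactly one line, non-zero otherwise.
is_single_line() {
    # NR in the END block is the total number of lines awk has read.
    awk 'END { exit !(NR == 1) }' "$1"
}
```

Usage: is_single_line /tmp/linux-line.txt && echo "ok to paste". A wrapped or split paste shows up as two or more lines and fails the check.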
Do not forget that the path is relative to the partition that holds /boot (or to the /boot partition itself).
If your /boot is on the root (/) partition, you will need to find the disk with the root partition, and your paths will be something like
(lvm/volume-root)/boot/
When system is booted run:
grub2-mkconfig > /boot/grub2/grub.cfg
grub2-mkconfig > /boot/efi/EFI/redhat/grub.cfg
@robertm98 the problem is that on OL9 the grub2 config was switched to a parent config in /boot/efi/EFI/redhat/grub.cfg, which in turn loads the proper /boot/grub2/grub.cfg config.
For CERTAIN contents of /boot/efi/EFI/redhat/grub.cfg, the fix that was applied for the leapp in-place upgrade, instead of correctly updating the configs (or not touching them), writes /boot/efi/EFI/redhat/grub.cfg into /boot/grub2/grub.cfg, and the system chainloops.
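For anyone who installed the update but has not rebooted, a quick heuristic can tell whether /boot/grub2/grub.cfg was overwritten with the parent config. This is a sketch under two assumptions: a config written by grub2-mkconfig carries "### BEGIN /etc/grub.d/..." section markers, and the parent config hands over to the real one with a configfile line. looks_like_stub is a made-up helper name.

```shell
#!/bin/sh
# looks_like_stub: hypothetical helper.
# Exit status 0 if the file in $1 looks like the small parent config
# rather than a full config generated by grub2-mkconfig.
looks_like_stub() {
    cfg=$1
    # Assumption: generated configs contain "### BEGIN /etc/grub.d/..." markers.
    if grep -q '^### BEGIN /etc/grub.d/' "$cfg"; then
        return 1
    fi
    # Assumption: the parent config loads another file via "configfile".
    grep -Eq '^[[:space:]]*configfile[[:space:]]' "$cfg"
}
```

Usage: looks_like_stub /boot/grub2/grub.cfg && echo "grub.cfg looks like the parent config". A positive result is only a hint; read the file before acting on it.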
Thanks for the instructions. Some remarks from my experience: running grubx64.efi after grub.cfg was deleted did not automatically put me into the grub cmdline; it got stuck and I needed to power-cycle the machine. ls (hd0,gpt1) only shows "Filesystem is fat" or "Filesystem is xfs", not the actual contents. However, ls (hd0,gpt2)/loader/entries would only succeed on the right disk and list its contents, and show not found on all others.
boot was successful, but after login + grub2-mkconfig + reboot it would return to grub cmdline again :/ Reading your latest comment I tried mkconfig to /boot/efi/EFI/redhat/grub.cfg and it seems to work now!
ls (hd0,gpt1)
yeah, you need slash in the end to display content:
ls (hd0,gpt1)/
boot was successful, but after login + grub2-mkconfig + reboot it would return to grub cmdline again :/
OH! yes, that is because /boot/efi/EFI/redhat/grub.cfg was removed from UEFI shell during recovery. I've updated my post to reflect that.
Thank you.
I'm not sure if that is related to the original issue, but the only thing that is a bit weird now is that grubby shows:
[root@ol9-machine ~]# grubby --default-kernel
/boot/vmlinuz-5.15.0-207.156.6.el9uek.x86_64
[root@ol9-machine ~]# grubby --default-index
3
[root@ol9-machine ~]# grubby --info DEFAULT
index=3
kernel="/boot/vmlinuz-5.15.0-207.156.6.el9uek.x86_64"
args="ro rd.lvm.lv=ol/root rhgb quiet crashkernel=1G-64G:448M,64G-:512M $tuned_params"
root="/dev/mapper/ol-root"
initrd="/boot/initramfs-5.15.0-207.156.6.el9uek.x86_64.img $tuned_initrd"
title="Oracle Linux Server (5.15.0-207.156.6.el9uek.x86_64 with Unbreakable Enterprise Kernel) 9.4"
id="bda9a182a36740ada28baaa218d5c09d-5.15.0-207.156.6.el9uek.x86_64"
And yet, when I reboot, it automatically selects index 0 with a kernel that is no longer present in /boot. So the system is usable but wouldn't survive an automated reboot. See screenshot attached.
[root@ol9-machine ~]# uname -r
5.15.0-207.156.6.el9uek.x86_64
[root@ol9-machine ~]# dnf list installed | grep kernel
kernel.x86_64 5.14.0-427.22.1.el9_4 @ol9_baseos_latest
kernel-core.x86_64 5.14.0-427.22.1.el9_4 @ol9_baseos_latest
kernel-modules.x86_64 5.14.0-427.22.1.el9_4 @ol9_baseos_latest
kernel-modules-core.x86_64 5.14.0-427.22.1.el9_4 @ol9_baseos_latest
kernel-tools.x86_64 5.14.0-427.22.1.el9_4 @ol9_baseos_latest
kernel-tools-libs.x86_64 5.14.0-427.22.1.el9_4 @ol9_baseos_latest
kernel-uek.x86_64 5.15.0-207.156.6.el9uek @ol9_UEKR7
kernel-uek-core.x86_64 5.15.0-207.156.6.el9uek @ol9_UEKR7
kernel-uek-modules.x86_64 5.15.0-207.156.6.el9uek @ol9_UEKR7
Any help appreciated.
Can you please show the output of:
for x in $(find /boot |grep grubenv); do echo $x; cat $x; done
cat /boot/efi/EFI/redhat/grub.cfg |grep grubenv
cat /boot/grub2/grub.cfg |grep grubenv
Sure, here you go:
/boot/grub2/grubenv
# GRUB Environment Block
# WARNING: Do not edit this file by tools other than grub-editenv!!!
saved_entry=bda9a182a36740ada28baaa218d5c09d-5.15.0-207.156.6.el9uek.x86_64
boot_success=1
boot_indeterminate=0
##################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################
/boot/efi/EFI/redhat/grub.cfg
if [ -f ${config_directory}/grubenv ]; then
load_env -f ${config_directory}/grubenv
elif [ -s $prefix/grubenv ]; then
# The kernelopts variable should be defined in the grubenv file. But to ensure that menu
# without a grubenv file, define a fallback kernelopts variable if this has not been set.
# The kernelopts variable in the grubenv file can be modified using the grubby tool or by
# the kernelopts variable in the grubenv file and the fallback kernelopts variable.
/boot/grub2/grub.cfg
if [ -f ${config_directory}/grubenv ]; then
load_env -f ${config_directory}/grubenv
elif [ -s $prefix/grubenv ]; then
# The kernelopts variable should be defined in the grubenv file. But to ensure that menu
# without a grubenv file, define a fallback kernelopts variable if this has not been set.
# The kernelopts variable in the grubenv file can be modified using the grubby tool or by
# the kernelopts variable in the grubenv file and the fallback kernelopts variable.
OK, everything above looks correct. Now ls /boot/loader/entries/
It seems you have some redundant entries there.
[root@ol9-machine grub2]# ls -al /boot/loader/entries/
total 28
drwx------. 2 root root 4096 Jun 25 13:34 .
drwxr-xr-x. 3 root root 21 Oct 17 2022 ..
-rw-r--r--. 1 root root 440 May 22 13:59 495620e0609f491080cb4e769e86283d-0-rescue.conf
-rw-r--r--. 1 root root 381 May 22 13:59 495620e0609f491080cb4e769e86283d-5.14.0-284.30.1.el9_2.x86_64.conf
-rw-r--r--. 1 root root 428 May 22 13:59 495620e0609f491080cb4e769e86283d-5.15.0-200.131.27.el9uek.x86_64.conf
-rw-r--r--. 1 root root 405 May 22 13:59 bda9a182a36740ada28baaa218d5c09d-0-rescue.conf
-rw-r--r--. 1 root root 381 Jun 25 10:18 bda9a182a36740ada28baaa218d5c09d-5.14.0-427.22.1.el9_4.x86_64.conf
-rw-r--r--. 1 root root 424 Jun 25 10:19 bda9a182a36740ada28baaa218d5c09d-5.15.0-207.156.6.el9uek.x86_64.conf
Oh, here's the problem. Sorry for bothering you, but thanks for pointing me in the right direction. It looks like some script or person regenerated the machine-id a few weeks ago...
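In case someone else runs into the same thing: the entry file names in /boot/loader/entries start with a machine-id, so entries left over from an old machine-id can be listed like this. A small sketch only; list_foreign_entries is a made-up helper name, and it just prints file names without changing anything.

```shell
#!/bin/sh
# list_foreign_entries: hypothetical helper.
#   $1 = directory with boot loader entries
#   $2 = machine-id to compare against
# Prints every *.conf whose file name does not start with "<machine-id>-".
list_foreign_entries() {
    dir=$1
    mid=$2
    for f in "$dir"/*.conf; do
        [ -e "$f" ] || continue      # no .conf files in the directory
        case "${f##*/}" in
            "$mid"-*) ;;             # prefix matches the given machine-id
            *) printf '%s\n' "$f" ;;
        esac
    done
}
```

Usage: list_foreign_entries /boot/loader/entries "$(cat /etc/machine-id)". What to do with the listed files is a separate decision; check that the kernels they point to are really gone first.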
For everyone tracking this issue: a grub2 update that does NOT contain the scriptlet bug and, at the same time, resolves the issue for people who installed the broken package but did not reboot has been published to the public repositories:
version is 2.06-80.0.3.el9_4
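To check whether a machine already has at least that version, something along these lines may help. It is a sketch with two assumptions: grub2-common is taken as the package to query (pick whichever grub2 package you have installed), and GNU sort -V is used for ordering, which is not the same algorithm rpm uses but agrees with it for version strings of this shape. version_at_least is a made-up helper name.

```shell
#!/bin/sh
# version_at_least: hypothetical helper.
# Exit status 0 when version $1 sorts at or after version $2 (GNU sort -V ordering).
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example use on an installed system (package name is an assumption):
#   installed=$(rpm -q --qf '%{VERSION}-%{RELEASE}' grub2-common)
#   version_at_least "$installed" 2.06-80.0.3.el9_4 && echo "fixed package or newer"
```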
This is a different bug compared to what is described in https://github.com/oracle/oracle-linux/issues/147
When the latest updates are applied and a server is then rebooted GRUB will not start and appears to be stuck in a busy loop displaying the following message. "error: ../../grub-core/commands/efi/tpm.c:150:unknown TPM error"
Secure Boot is disabled and no previous problems.
Steps to reproduce:
1) Download and install OL 9.4 x86_64; the first boot is OK.
2) Apply updates.
3) Reboot; GRUB will then fail to load with the above error message.
As a cross check a fresh install was done and grub updates were excluded with
exclude=grub*
in the /etc/dnf/dnf.conf file. The non-grub updates were installed and the server rebooted OK.