sonic-net / SONiC

Landing page for Software for Open Networking in the Cloud (SONiC) - https://sonic-net.github.io/SONiC/

Kexec hangs after warm-reboot in UEFI systems. #548

Open tzack000 opened 4 years ago

tzack000 commented 4 years ago

The system was running SONiC 201811. After executing "sudo warm-reboot -v", I got the following errors and the system hung.

[ 4078.124242] kexec_core: Starting new kernel
[    0.000000] ACPI BIOS Error (bug): A valid RSDP was not found (20160831/tbxfroot-244)
[    0.117519] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
[    1.439934] ERROR: Unable to locate IOAPIC for GSI 51
[    1.445798] ERROR: Unable to locate IOAPIC for GSI 51
[    1.451590] ERROR: Unable to locate IOAPIC for GSI 58
[    1.457389] ERROR: Unable to locate IOAPIC for GSI 58
[    1.463172] ERROR: Unable to locate IOAPIC for GSI 67
[    1.468971] ERROR: Unable to locate IOAPIC for GSI 67
[    1.474746] ERROR: Unable to locate IOAPIC for GSI 67
[    1.480533] ERROR: Unable to locate IOAPIC for GSI 67
[    2.618976] ERROR: Unable to locate IOAPIC for GSI 67
[    2.625377] ERROR: Unable to locate IOAPIC for GSI 58
[    2.946985] ERROR: Unable to locate IOAPIC for GSI 61
[    2.952638] ERROR: Unable to locate IOAPIC for GSI 61
[   10.050942] xhci_hcd 0000:00:14.0: Error while assigning device slot ID
[   10.058349] xhci_hcd 0000:00:14.0: Max number of devices this xHCI host supports is 32.
[   10.067308] usb usb2-port4: couldn't allocate usb_device
[   19.018937] usb 1-1: device not accepting address 2, error -110
[   35.146934] usb 1-1: device not accepting address 3, error -110
[   46.410933] usb 1-1: device not accepting address 4, error -110
[   57.162939] usb 1-1: device not accepting address 5, error -110
[   57.169593] usb usb1-port1: unable to enumerate USB device

I did some research, and it seems that on UEFI systems we need to pass the acpi_rsdp parameter to the kernel started by kexec (https://github.com/coreos/bugs/issues/167). I'm not sure whether this is a known issue.
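
For reference, on a UEFI boot the RSDP address is exposed in /sys/firmware/efi/systab as lines of the form `ACPI20=0x...` and `ACPI=0x...`. A minimal sketch of reading it follows; the function name `get_acpi_rsdp` and its optional path argument are illustrative assumptions, not part of SONiC.

```shell
#!/bin/sh
# Print the RSDP address found in an EFI systab file, or nothing if the
# file is unavailable.
# NOTE: get_acpi_rsdp and its path argument are hypothetical, added only
# so the lookup can be tried against a sample file; the default is the
# real sysfs location.
get_acpi_rsdp() {
    systab="${1:-/sys/firmware/efi/systab}"
    # Systems booted without UEFI have no systab: return with empty output.
    [ -r "$systab" ] || return 0
    # Take the first line starting with ACPI (ACPI20= is listed before
    # ACPI=) and keep everything after the first '='.
    grep -m1 '^ACPI' "$systab" | cut -d= -f2-
}
```

Calling `get_acpi_rsdp` with no argument reads the live system table; passing a path reads a saved copy instead.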

tzack000 commented 4 years ago

It works now with the change below, but I'm not sure this is the best approach:

diff --git a/scripts/fast-reboot b/scripts/fast-reboot
index 0555121..ec371e0 100755
--- a/scripts/fast-reboot
+++ b/scripts/fast-reboot
@@ -202,7 +202,8 @@ if grep -q aboot_platform= /host/machine.conf; then
 elif grep -q onie_platform= /host/machine.conf; then
     KERNEL_OPTIONS=$(cat /host/grub/grub.cfg | sed "/$NEXT_SONIC_IMAGE'/,/}/"'!'"g" | grep linux)
     KERNEL_IMAGE="/host$(echo $KERNEL_OPTIONS | cut -d ' ' -f 2)"
-    BOOT_OPTIONS="$(echo $KERNEL_OPTIONS | sed -e 's/\s*linux\s*/BOOT_IMAGE=/') SONIC_BOOT_TYPE=${BOOT_TYPE_ARG}"
+    ACPI_RSDP=$(grep -m1 ^ACPI /sys/firmware/efi/systab | cut -f2- -d=)
+    BOOT_OPTIONS="$(echo $KERNEL_OPTIONS | sed -e 's/\s*linux\s*/BOOT_IMAGE=/') acpi_rsdp=${ACPI_RSDP} SONIC_BOOT_TYPE=${BOOT_TYPE_ARG}"
 else
     echo "Unknown bootloader. ${REBOOT_TYPE} is not supported."
     exit 1
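
One possible refinement, offered as a sketch rather than as the eventual fix: on a system booted without UEFI, /sys/firmware/efi/systab does not exist, so ACPI_RSDP in the diff above would be empty and a bare `acpi_rsdp=` would be appended. The hypothetical helper `with_acpi_rsdp` below (not part of scripts/fast-reboot) adds the parameter only when an address is available.

```shell
#!/bin/sh
# Append acpi_rsdp=<addr> to a kernel command line only when an address
# is known; otherwise return the command line unchanged.
# NOTE: with_acpi_rsdp is a hypothetical helper used for illustration.
with_acpi_rsdp() {
    opts="$1"   # existing kernel command line
    rsdp="$2"   # RSDP address, possibly empty
    if [ -n "$rsdp" ]; then
        printf '%s acpi_rsdp=%s\n' "$opts" "$rsdp"
    else
        printf '%s\n' "$opts"
    fi
}
```

It could be used as `BOOT_OPTIONS="$(with_acpi_rsdp "$BOOT_OPTIONS" "$ACPI_RSDP")"`, which places acpi_rsdp at the end of the command line rather than before SONIC_BOOT_TYPE.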

lguohan commented 4 years ago

Looks like a good fix. Can you adapt the PR based on the CoreOS commit? https://github.com/coreos/bootengine/pull/38/files