machyve / xhyve

xhyve, a lightweight OS X virtualization solution

Linux 4.15 kernels not booting in xhyve #144

Open kbarmen opened 6 years ago

kbarmen commented 6 years ago

Starting with version 4.15, the Linux kernel no longer boots on xhyve. The last working kernel I have is 4.14.19.

Decompressing Linux... Parsing ELF... done.
Booting the kernel.
Linux version 4.15.11-gentoo (root@hyvesmurf) (gcc version 7.2.0 (Gentoo 7.2.0-r1 p1.1)) #2 PREEMPT Tue Mar 20 04:57:18 CET 2018
Command line: root=/dev/mapper/vg-root earlyprintk=serial console=ttyS0 acpi=off root=/dev/vg/root ro quiet dolvm
x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
e820: BIOS-provided physical RAM map:
BIOS-e820: [mem 0x0000000000000100-0x000000000009fbff] usable
BIOS-e820: [mem 0x0000000000100000-0x000000003fffffff] usable
bootconsole [earlyser0] enabled
ERROR: earlyprintk= earlyser already used
rdmsr to register 0x140 on vcpu 0
rdmsr to register 0x64e on vcpu 0
rdmsr to register 0x34 on vcpu 0

and here it hangs for a very long while (120 seconds, I guess), and then I get

task swapper:1 blocked for more than 120 seconds.
      Not tainted 4.15.11-gentoo #2
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

and this is repeated every 120 seconds ad infinitum.

Full kernel boot log, debug on, can be seen here: https://paste.pound-python.org/show/q48VFrpw38inIIA1VIca/

I am not sure how to approach this problem, or whether Linux or xhyve is the culprit, but I would certainly appreciate some input on how to move forward. It would be nice if someone could confirm.

Oh, and this is xhyve: 0.2.0 from homebrew.

kbarmen commented 5 years ago

I'm sorry, but I see no change when trying with xhyve built from HEAD (2b093fa4450c050d00027bb1601c0e7ad2f44955). Newer kernels still do not boot.

Owersun commented 5 years ago

Same here.

jeremyhu commented 5 years ago

Sorry, I saw the title and assumed it was the same issue that Peter fixed. I should've checked more closely. Indeed this is different.

John-K commented 5 years ago

Try removing "acpi=off" from your kernel parameters.

In #161 this was shown to be the issue when running kernel 4.18.0-10 on Ubuntu 18.10.
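A minimal sketch of that suggestion, assuming the kernel command line is held as a plain string before it is passed to xhyve. The helper name `drop_kernel_param` is hypothetical, and the sample string is the command line from the boot log at the top of this issue:

```python
def drop_kernel_param(cmdline: str, param: str) -> str:
    """Return cmdline with every exact occurrence of param removed."""
    # Kernel parameters are whitespace-separated tokens, so split,
    # filter out the unwanted token, and re-join with single spaces.
    return " ".join(tok for tok in cmdline.split() if tok != param)


# Command line copied from the boot log in the original report.
cmdline = (
    "root=/dev/mapper/vg-root earlyprintk=serial console=ttyS0 "
    "acpi=off root=/dev/vg/root ro quiet dolvm"
)

print(drop_kernel_param(cmdline, "acpi=off"))
```

Only exact tokens are removed, so a different value such as `acpi=force` would be left untouched.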

Owersun commented 5 years ago

Nope, that doesn't help at all. "acpi=off" is always present in the command line for Ubuntu xhyve runs. The problem with kernels >4.4.0 on xhyve has little to do with ACPI BIOS emulation. The problem presented here is memory corruption and a kernel crash during the boot process, whereas the problem referenced in #161 is a hung task somewhere during OS startup, long after the kernel has finished booting. These are completely different issues.

John-K commented 5 years ago

@Owersun You may want to take another look, I don’t think your analysis is correct.

The detailed crash log linked earlier in this issue is very similar to the crash log in the other issue, and that one does not occur after the kernel has booted: there are some timeouts when the kernel tries to access virtio_blk for the root device.

I have not had a chance to delve into the Linux kernel changes between the two Ubuntu versions, and I don’t know the source of the recommendation to use “acpi=off” with Ubuntu on xhyve, but removing it does allow Ubuntu 18.10 to boot on xhyve. Thankfully this issue is very easy to reproduce and test.

Owersun commented 5 years ago

I took several looks and also tested it. The linked issue has lines:

"Begin: Loading essential drivers ... done. Begin: Running /scripts/init-premount ... done. Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done. Begin: Running /scripts/local-premount ..."

which are part of the initrd script.

The Ubuntu 16.04 and 16.10 kernels are not able to mount the initrd properly under xhyve because of the bug that issue #132 was mostly filed for, and this one partly. There is a line during boot: "[ 0.410681] Initramfs unpacking failed: junk in compressed archive". The inability to mount the initrd causes the kernel to fail to load the virtio_blk driver, which lives in the initrd on Ubuntu 16.04 and 16.10. So this becomes a chicken-and-egg problem: without the initrd the system cannot find a drive to boot from and fails, and if you give it the initrd that contains the driver, it fails to load it properly because of memory corruption. I do have "acpi=off", as I've already said. I have no idea why #132 was closed and this one was not. Maybe this changed in 18.04 (driver built into the kernel), to everyone's happiness, but I, for example, need Ubuntu 16.04 to compile Amlogic buildroots, which are based on 16.04.
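To tell the two symptoms discussed in this thread apart in a captured boot log, something like the following could be used. It is only a sketch: the function name `classify_boot_log` is hypothetical, the patterns are taken from the two messages quoted in this issue, and other kernel versions may word them differently.

```python
import re

# Patterns based on the two log lines quoted in this issue (assumed wording).
HUNG_TASK = re.compile(r"task (\S+) blocked for more than (\d+) seconds")
INITRAMFS = re.compile(r"Initramfs unpacking failed: (.+)")


def classify_boot_log(text: str) -> dict:
    """Collect hung-task reports and initramfs unpacking errors from a log."""
    hung = [(m.group(1), int(m.group(2))) for m in HUNG_TASK.finditer(text)]
    initramfs = [m.group(1).strip() for m in INITRAMFS.finditer(text)]
    return {"hung_tasks": hung, "initramfs_errors": initramfs}


# Sample built from the two messages quoted in this thread.
sample = """\
task swapper:1 blocked for more than 120 seconds.
[ 0.410681] Initramfs unpacking failed: junk in compressed archive
"""

result = classify_boot_log(sample)
print(result)
```

A log that shows only hung-task entries points at the timeout behaviour, while an initramfs error points at the unpacking failure described above.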

frezbo commented 5 years ago

I also seem to be hitting the same issue when trying to boot a RHEL8 ISO.