Closed jboeing closed 8 years ago
For the record, the same setup boots fine if the only change is to use SeaBIOS instead of OVMF.
I forgot to mention that the host in question has a quad-core Intel Penryn CPU (Q9550S).
Thank you for the report, even though it makes me cry. :( :( :(
Can you please grab (or build) the most recent OVMF available to you, and bisect the host kernel under it, using the stable tree? (I realize this will be horribly annoying, so thank you for your cooperation.)
I'm requesting this because to my knowledge things have been working nicely on 4.4+ host kernels. Namely, the edk2 commit you identified is not new (it is dated Oct 19, 2015). On the other hand, the 4.4.6 stable / longterm kernel is new-ish (Mar 16, 2016). I would recommend (more precisely, ask you for) bisecting 4.4.0 through 4.4.6.
Thank you, Laszlo
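For readers who want to follow the bisection request above, here is a minimal sketch of how such a run could be started. The helper name `start_kernel_bisect` and the checkout path in the comments are hypothetical, not something from this thread; `v4.4` is assumed to be the tag for the 4.4.0 release in the stable tree.

```shell
# start_kernel_bisect REPO GOOD BAD
# Hypothetical helper: begin a bisection in REPO between a known-good
# and a known-bad revision. "git bisect start" takes the bad revision
# first, then the good one.
start_kernel_bisect() {
    repo=$1
    good=$2
    bad=$3
    git -C "$repo" bisect start "$bad" "$good"
}

# Example (assumed checkout location of the stable tree):
#   start_kernel_bisect ~/src/linux-stable v4.4 v4.4.6
#
# Then build and boot each kernel that git checks out, start the guest,
# and report the outcome from inside the checkout:
#   git bisect good    # guest booted
#   git bisect bad     # guest hung or QEMU printed "KVM: entry failed"
#   git bisect reset   # when finished, return to the original branch
```

Each good/bad answer roughly halves the remaining range, so even the few hundred commits between two stable point releases need only a handful of kernel builds.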
... This may be wishful thinking on my part, but it looks like 4.4.6 regressed other virt-related stuff as well; see for example this thread: http://thread.gmane.org/gmane.comp.emulators.qemu/406416
$ git log --oneline --reverse v4.4.5..v4.4.6 | grep -i kvm
c9e1bbef7e77 kvm: cap halt polling at exactly halt_poll_ns
0bbe5fa4f79a KVM: VMX: disable PEBS before a guest entry
78939530542f KVM: s390: correct fprs on SIGP (STOP AND) STORE STATUS
1c463a390a89 KVM: PPC: Book3S HV: Sanitize special-purpose register values on guest exit
68ed2ca153c7 KVM: MMU: fix ept=0/pte.u=1/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo
1ebd29d6b940 KVM: MMU: fix reserved bit check for ept=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0
I've also asked @bonzini for his insight wrt. this report. Thanks!
Sure, I'll take a shot at bisecting the host kernel.
In the meantime, I tried using the 94941c8 OVMF build with older kernels I have on the host. The VM booted with 4.1.6 on the host, but failed with 4.2.0.
Please don't bother testing pre-4.4.0 hosts. Please see https://github.com/tianocore/edk2/issues/21: host kernels >= 4.2 and < 4.4.0 are known to break OVMF.
... Well, actually, I find it useful that you tested 4.1.6. Your successful test suggests that your hardware is "appropriate", and that 4.4.6 has reintroduced similar issues that were fixed from 4.2.0 to 4.4.0.
I started bisecting 4.1 to 4.2 before I saw your last comment. It was pointing somewhere in the big KVM merge commit (4e24155) for 4.2-rc1 where I left off.
After looking through #21 and the linked kernel bug (https://bugzilla.kernel.org/show_bug.cgi?id=107561), I tested 4.4.0 (and 4.4.3). I still hit the same failure, so I think I'm seeing something that was missed in the 4.3/4.4 fixes rather than a regression in the 4.4 stable series.
Thanks for the update.
@bonzini said in https://github.com/tianocore/edk2/issues/21#issuecomment-174916085,
Even with 4.4 there are some changes compared to 4.1 that may cause this bug. 4.4 only fixed the bugs, it didn't revert everything.
That seems to be consistent with your results, yes? 4.1.6 is the last one that works for you, and none of 4.4.0, 4.4.3, and 4.4.6 do. Right?
That's right.
Finishing the kernel bisect points to commit d28bc9d ("KVM: x86: INIT and reset sequences are different") as where my failure starts. Commit 5690891 ("kvm: x86: zero EFER on INIT") undid a small part of it, but otherwise it seems to be intact on the latest kernel tip.
So...would you suggest reporting a bug against KVM, or wait for @bonzini to chime in?
Thank you very much for completing the bisection.
I have emailed Paolo directly about this item, but I think he might need two weeks (possibly more) to get to it. Normally I would recommend just waiting until he responds here, but given the time frame (and your precise results with the bisection), I recommend opening a kernel bugzilla.
(In issue #21 too, a kernel BZ got referenced (107561), so this shouldn't count as unusual.)
Once you have a BZ filed in http://bugzilla.kernel.org/, could you please paste the link here? Thank you.
BTW, my colleague Radim has recently become co-maintainer for KVM, so I'm going to ask him to review this item as well. (I'm unsure if Radim has a github ID, which is why the direct email + a kernel BZ might be best.)
Thanks!
@jboeing, Radim looked at your report, and said
He has Penryn, which is one of first CPUs that shipped with VMX ... Have you tried running without kvm_intel.ept and/or other parameters?
Which immediately caused me to slap my forehead -- I have already determined that you need EPT for this stuff to work. Please refer to the following message:
http://thread.gmane.org/gmane.comp.bios.edk2.devel/9268/focus=9406
The original topic of said thread is different / independent, but the symptoms experienced there, and the cause (lack of EPT) should be identical. Can you please verify if you have EPT support in your host CPU?
... It does remain an interesting question why, after KVM commit d28bc9d, EPT became a hard requirement for running OVMF. Anyway, I'll happily defer that question to our kernel developers. @jboeing, once you have filed a kernel BZ and confirmed that your physical machine does lack EPT, I believe I'd like to close this OVMF report. Thanks!
Penryn doesn't have EPT; that was added for Nehalem.
$ cat /proc/cpuinfo
...
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush
dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts
rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm
sse4_1 xsave lahf_lm dtherm tpr_shadow vnmi flexpriority
$ tail /sys/module/kvm_intel/parameters/*
==> /sys/module/kvm_intel/parameters/emulate_invalid_guest_state <==
Y
==> /sys/module/kvm_intel/parameters/enable_apicv <==
N
==> /sys/module/kvm_intel/parameters/enable_shadow_vmcs <==
N
==> /sys/module/kvm_intel/parameters/ept <==
N
==> /sys/module/kvm_intel/parameters/eptad <==
N
==> /sys/module/kvm_intel/parameters/fasteoi <==
Y
==> /sys/module/kvm_intel/parameters/flexpriority <==
Y
==> /sys/module/kvm_intel/parameters/nested <==
N
==> /sys/module/kvm_intel/parameters/ple_gap <==
0
==> /sys/module/kvm_intel/parameters/ple_window <==
4096
==> /sys/module/kvm_intel/parameters/ple_window_grow <==
2
==> /sys/module/kvm_intel/parameters/ple_window_max <==
1073741823
==> /sys/module/kvm_intel/parameters/ple_window_shrink <==
0
==> /sys/module/kvm_intel/parameters/pml <==
N
==> /sys/module/kvm_intel/parameters/unrestricted_guest <==
N
==> /sys/module/kvm_intel/parameters/vmm_exclusive <==
Y
==> /sys/module/kvm_intel/parameters/vpid <==
N
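The manual check above (no `ept` among the CPU flags, and `ept` reported as `N` by kvm_intel) can be scripted. This is only a sketch: the helper name `has_cpu_flag` is made up for illustration, and it assumes the kernel lists the feature as a whole word on a `flags` (or, on newer kernels, `vmx flags`) line of /proc/cpuinfo.

```shell
# has_cpu_flag FLAG [CPUINFO_FILE]
# Hypothetical helper: succeed if FLAG appears as a whole word on a
# "flags" or "vmx flags" line. CPUINFO_FILE defaults to /proc/cpuinfo.
has_cpu_flag() {
    flag=$1
    file=${2:-/proc/cpuinfo}
    grep -E '^(vmx )?flags[[:space:]]*:' "$file" | grep -qw -- "$flag"
}

if has_cpu_flag ept; then
    echo "EPT: advertised by the host CPU"
else
    echo "EPT: not advertised by the host CPU"
fi
```

On the Penryn host in this report the second branch would be taken, matching the `N` shown for /sys/module/kvm_intel/parameters/ept.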
I'm working on the kernel BZ now.
Thank you. I've CC'd @rkrcmar (and myself) on that bug, and I'm closing this one.
This (closed) item has been manually migrated to https://tianocore.acgmultimedia.com/show_bug.cgi?id=83
If I enable SMP in my VM, QEMU either hangs at boot or dies with the error message:
KVM: entry failed, hardware error 0x80000021
I git-bisected the failure to commit 94941c8: UefiCpuPkg: CpuDxe: broadcast MTRR changes to APs
The host is running Linux kernel 4.4.6 and QEMU 2.5.0. The host is using GCC 5.3.0; I get the same failure whether I build OVMF with GCC 4.9.3 or 5.3.0. A minimal QEMU command line to repro is:
qemu-system-x86_64 -enable-kvm -smp cpus=2 -drive if=pflash,format=raw,file=OVMF.fd
Here's the output from the debug build of OVMF:
If booting hangs rather than hitting the KVM error, the debug output is the same as above except it cuts off after the "Does not find any HOB stored CPU BIST information!" message.