google / android-emulator-hypervisor-driver

BSOD SYSTEM_SERVICE_EXCEPTION gvm.sys with qemu #14

Open thesword53 opened 4 years ago

thesword53 commented 4 years ago

The host system gets a BSOD when the guest (Windows 7) also gets a BSOD, or during boot.


Systems tested:

cpu: AMD Ryzen 7 3700X host: Windows 10 Pro guest: Windows 7 Ultimate


cpu: Intel Core i7-4810MQ host: Arch Linux (KVM with nested virtualization) guest1: Windows 7 Ultimate (with gvm installed) guest2: Windows 7 Ultimate

[Image: 20200601_155035]

Taogle2018 commented 4 years ago

What is the hypervisor used in AMD Ryzen Win10 Pro?

Taogle2018 commented 4 years ago

AND for the Intel case, how can you run gvm on Intel with Android Emulator?

thesword53 commented 4 years ago

> What is the hypervisor used in AMD Ryzen Win10 Pro?

gvm

> AND for the Intel case, how can you run gvm on Intel with Android Emulator?

I didn't use Android Emulator; I used qemu with gvm acceleration: https://github.com/qemu-gvm/qemu-gvm #5 (qemu-system-x86_64 -accel gvm ...) with Windows 7 as guest.

Taogle2018 commented 4 years ago

OK. I just realized that you are not using Android Emulator. Thanks for the bug report. I myself have only tried Ubuntu 18.04 when using gvm as a generic solution. I tried to install Windows 10, but the guest hangs. Using this as a generic hypervisor is possible, but I have not had much time to work on that. It is not on the project plan yet. I will still try to see if I can fix this. However, please do not set any expectations on when. :)

thesword53 commented 4 years ago

Windows 7 and Windows 10 don't work with SeaBIOS. I have to use OVMF UEFI. Your GVM hypervisor works better than WHPX on qemu, because I am not able to boot Windows 7 at all with WHPX.

Taogle2018 commented 4 years ago

Thanks for the tips. I tried UEFI and now I can install Win7 and Win10. Your information helped me a lot. Here is my result. My system: Ryzen 2700, Host Win10 2004 Pro, Guest Win7 SP1 Ultimate. I did a fresh install and Win7 booted normally. Any special operations that can trigger the BSOD?

thesword53 commented 4 years ago

> Any special operations that can trigger the BSOD?

Boot a Windows 7 VM and trigger a BSOD on the guest (kill the csrss.exe process, for example). Your host will also get a BSOD.

Taogle2018 commented 4 years ago

I tried but could not reproduce it. When I triggered a crash using NotMyFault from Sysinternals, the guest got a crash dump and rebooted. The host is not impacted. It is weird that the BSOD screen does not show inside the guest so it will look like a hang. Is there a way to share your crash dump with me?

thesword53 commented 4 years ago

Here is the crashdump: https://drive.google.com/file/d/1Rrh4qH_-ki1PGLU-DVvajkUsNADPN4OA/view?usp=sharing

> The host is not impacted. It is weird that the BSOD screen does not show inside the guest so it will look like a hang.

You need to wait a bit and the host will crash.

Taogle2018 commented 4 years ago

Thanks for the crash dump. It does look like a "use-after-free" issue. I will come back when I find out the reason.

thesword53 commented 4 years ago

I am sharing the memory dump (~700MB): https://drive.google.com/file/d/1qTHQy2uQyN1KzqbJ4rutel9R8m8N9uzK/view?usp=sharing

I found the stack trace with WinDbg, but I don't have the symbol names for gvm.

STACK_TEXT:
fffff880052d9520 fffff88003b035da : fffffa80080f5000 0000000000000003 0000000000000000 fffffa8007d14aa0 : gvm+0x11007
fffff880052d9580 fffff88003b09ba3 : fffffa80080f5000 0000000000186a76 0000000000000000 000000000000008e : gvm+0xf5da
fffff880052d9620 fffff88003b0538f : 000000027eeee000 0000000000000000 0000000000000001 0000000000186a76 : gvm+0x15ba3
fffff880052d9680 fffff88003b14804 : 0000000000000000 0000000000000000 0000000000000000 fffffa80080f5000 : gvm+0x1138f
fffff880052d96d0 fffff88003b167a1 : 0000000000000000 0000000000000000 0000000000000081 fffffa80080f5000 : gvm+0x20804
fffff880052d9740 fffff88003b28340 : 0000000000000000 00000000fffffffb 00000000fffffffb 0000000000002c20 : gvm+0x227a1
fffff880052d9770 fffff88003b283f0 : 0000000000000000 fffffa80080f5000 fffff880052d9b60 fffffa800879fc20 : gvm+0x34340
fffff880052d97e0 fffff88003b2433f : fffffa80080f5000 0000000000000000 fffffa80080f5150 0000000000000001 : gvm+0x343f0
fffff880052d9810 fffff88003b2c43c : fffffa80080f5000 fffff880052d9b60 0000000000000000 fffffa80080f5110 : gvm+0x3033f
fffff880052d9840 fffff88003b29171 : fffffa80080f9f20 fffff880052d9918 fffff880052d9968 fffff800028e704a : gvm+0x3843c
fffff880052d9890 fffff80002d092b5 : fffffa8007c5b3d0 fffff88002f1e180 fffffa8007c5b490 0000000000000000 : gvm+0x35171
fffff880052d98c0 fffff80002b9b5d6 : fffff8a000009b80 0000000000000000 0000000000000000 0000000000000000 : nt!IopXxxControlFile+0x6d5
fffff880052d9a00 fffff800028f2bd3 : 0000000000000000 0000000000000000 0000000000000000 0000000008e6fb20 : nt!NtDeviceIoControlFile+0x56
fffff880052d9a70 0000000076fb98fa : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : nt!KiSystemServiceCopyEnd+0x13
0000000008e6fa68 0000000000000000 : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : 0x76fb98fa
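
Since the gvm frames in this trace appear only as module-plus-offset pairs, one way to prepare them for a later symbol lookup is to pull out the offsets and infer where the module was loaded. The following is a minimal Python sketch and not part of the original report. It assumes each frame sits on its own line in WinDbg's usual `Child-SP RetAddr : Args to Child : Call Site` layout (without backtick separators), and the names `parse_frames` and `module_base` are hypothetical. It relies on the return address of one frame being the call-site address of the next frame.

```python
import re

# One STACK_TEXT row: Child-SP, RetAddr, four argument words, call site.
FRAME_RE = re.compile(
    r"^(?P<sp>[0-9a-f]{16})\s+(?P<ret>[0-9a-f]{16})\s+:"
    r"(?:\s+[0-9a-f]{16}){4}\s+:\s+(?P<site>\S+)$"
)
# A call site without symbols, e.g. "gvm+0xf5da".
SITE_RE = re.compile(r"^(?P<module>\w+)\+0x(?P<offset>[0-9a-f]+)$")

# First three frames of the trace in this thread, one frame per line.
SAMPLE = """
fffff880052d9520 fffff88003b035da : fffffa80080f5000 0000000000000003 0000000000000000 fffffa8007d14aa0 : gvm+0x11007
fffff880052d9580 fffff88003b09ba3 : fffffa80080f5000 0000000000186a76 0000000000000000 000000000000008e : gvm+0xf5da
fffff880052d9620 fffff88003b0538f : 000000027eeee000 0000000000000000 0000000000000001 0000000000186a76 : gvm+0x15ba3
"""


def parse_frames(stack_text):
    """Return a list of (return_address, call_site) tuples, top frame first."""
    frames = []
    for line in stack_text.strip().splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((int(match["ret"], 16), match["site"]))
    return frames


def module_base(frames, module):
    """Infer candidate load addresses for `module`.

    The return address stored in frame i points into frame i+1, so
    subtracting frame i+1's module offset yields the module base.
    """
    bases = set()
    for (ret, _), (_, site) in zip(frames, frames[1:]):
        match = SITE_RE.match(site)
        if match and match["module"] == module:
            bases.add(ret - int(match["offset"], 16))
    return bases


if __name__ == "__main__":
    for base in sorted(module_base(parse_frames(SAMPLE), "gvm")):
        print(f"gvm base candidate: {base:#x}")
```

If every pair of adjacent gvm frames yields the same candidate, the offsets are consistent with a single load address and can be fed to a debugger once matching symbols are available.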

Taogle2018 commented 4 years ago

FYI, symbols for 1.5 can be downloaded here: https://1drv.ms/u/s!AljlID0ntVyugehHeyCgYHkiJSUAew?e=JekxoT Thanks for sharing the dump.

Taogle2018 commented 4 years ago

I think https://github.com/google/android-emulator-hypervisor-driver-for-amd-processors/issues/23 is probably the same issue, although I have not gotten that dump yet.

thesword53 commented 4 years ago

It seems to be caused by this instruction: https://github.com/google/android-emulator-hypervisor-driver-for-amd-processors/blob/c772caab541d0a7ede442f32c04b0c95aacba512/arch/x86/kvm/mmu.c#L2097

Taogle2018 commented 4 years ago

https://1drv.ms/u/s!AljlID0ntVyugehxayBpYN3uOnXidw?e=ZQ1cuo Can you try this build and see if it fixes the problem?

thesword53 commented 4 years ago

> https://1drv.ms/u/s!AljlID0ntVyugehxayBpYN3uOnXidw?e=ZQ1cuo Can you try this build and see if it fixes the problem?

I can't boot the Windows 7 guest at all with this build. The "Starting Windows" screen shows up and then the screen becomes black.

Taogle2018 commented 4 years ago

OK. I will do another build for you and will be back later.

Taogle2018 commented 4 years ago

https://1drv.ms/u/s!AljlID0ntVyugehyHXoKYGgtriDJrA?e=1lAXXh Can you try this one? This build is exactly v1.5 + the intended fix, without the other unrelated patches from the former build.

thesword53 commented 4 years ago

> https://1drv.ms/u/s!AljlID0ntVyugehyHXoKYGgtriDJrA?e=1lAXXh Can you try this one? This build is exactly v1.5 + the intended fix, without the other unrelated patches from the former build.

I have the same issue with this build. The guest seems to get a BSOD but the screen is black.

Taogle2018 commented 4 years ago

Thanks. It is hard to guess the reason, as this is actually a one-line change, which should not alter guest behavior. Let me explore more before getting back.

Taogle2018 commented 4 years ago

I've successfully installed and run a 64-bit Windows 7 guest with both builds. The command-line options are "-accel gvm -cpu host -m 8G -smp cores=8 -hda=win7.file -sdl". It is weird that these builds gave you a black screen in the guest. So let me confirm: the guest is OK when using the 1.5 release but turns to a black screen when switching to one of these two testing builds. If that's the case, I can build another one that is exactly the same as v1.5. This will help us identify anything that changed in my local build system. Otherwise, I really cannot think of a reason why.
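
For anyone scripting repeated test boots with these options, the invocation can be assembled as an argument list. This is only a sketch under stated assumptions: the binary name `qemu-system-x86_64` is taken from earlier in this thread, the disk image is passed as a separate argument after `-hda` (an assumption about the intended form of the quoted `-hda=win7.file`), and the optional `-bios` firmware path is a placeholder, since the thread does not say how OVMF was selected. The helper name `build_qemu_args` is hypothetical.

```python
import shlex


def build_qemu_args(disk_image, firmware=None, memory="8G", cores=8):
    """Build an argument list for a gvm-accelerated qemu test boot."""
    args = [
        "qemu-system-x86_64",
        "-accel", "gvm",          # accelerator named in this thread
        "-cpu", "host",
        "-m", memory,
        "-smp", f"cores={cores}",
        "-hda", disk_image,       # assumed space-separated form
        "-sdl",
    ]
    if firmware is not None:
        # Placeholder: path to a UEFI firmware image such as an OVMF build.
        args += ["-bios", firmware]
    return args


if __name__ == "__main__":
    print(shlex.join(build_qemu_args("win7.file")))
```

Keeping the arguments in a list avoids shell-quoting mistakes and makes it easy to vary one option at a time, for example switching firmware between runs.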

thesword53 commented 4 years ago

> I've successfully installed and run a 64-bit Windows 7 guest with both builds. The command-line options are "-accel gvm -cpu host -m 8G -smp cores=8 -hda=win7.file -sdl". It is weird that these builds gave you a black screen in the guest. So let me confirm: the guest is OK when using the 1.5 release but turns to a black screen when switching to one of these two testing builds. If that's the case, I can build another one that is exactly the same as v1.5. This will help us identify anything that changed in my local build system. Otherwise, I really cannot think of a reason why.

This issue only happens when I use OVMF UEFI with Windows 7.

Taogle2018 commented 4 years ago

I also used OVMF UEFI bios. So OVMF UEFI with Windows 7 can work with gvm v1.5, but cannot work with the two builds I sent. Right?

thesword53 commented 4 years ago

> I also used OVMF UEFI bios. So OVMF UEFI with Windows 7 can work with gvm v1.5, but cannot work with the two builds I sent. Right?

Yes

thesword53 commented 4 years ago

I tested GVM 1.6 and I can't boot any Windows OS.

With gvm 1.5 I was able to start Windows 7 (OVMF only) and Windows 10.

I will test Linux guests next.

thesword53 commented 4 years ago

I tested Linux (Ubuntu 16.04 and Ubuntu 19.10) and it works, but I get lots of hardware errors (machine check exceptions) in the guest.

Taogle2018 commented 4 years ago

On your Intel or AMD, btw?

thesword53 commented 4 years ago

> On your Intel or AMD, btw?

Intel

Taogle2018 commented 4 years ago

Perhaps I should find a similar CPU and do a test. Are you still using nested virtualization with Arch Linux?

thesword53 commented 4 years ago

> Perhaps I should find a similar CPU and do a test. Are you still using nested virtualization with Arch Linux?

Yes, I am using nested virtualization with an Intel Core i7-4810MQ (Haswell). I can't test gvm on my AMD computer now because I'm not at home.

thesword53 commented 4 years ago

I tested GVM 1.6 on AMD and it works. On Intel with OVMF, I get a BSOD on guest (system_thread_exception_not_handled) with Windows 7/8/10. I think it's related to https://github.com/google/android-emulator-hypervisor-driver-for-amd-processors/commit/4edc540fa73e9ba817f86457190d1d07f2428674.

Taogle2018 commented 4 years ago

I think so too, as the change may surprise KVM. It is hard to tell whether this exposes a KVM bug, as it does work natively on my Intel machine. But right now, I am too busy to work on this.

thesword53 commented 4 years ago

I tried to compile GVM, and the Intel issue seems to be caused by https://github.com/google/android-emulator-hypervisor-driver-for-amd-processors/commit/c2693c93b4aac6103c931d274c6fc6806a0b6ae0

Taogle2018 commented 4 years ago

If that's the case, I can safely revert the change in the next release. Thanks! Together with a fix for the Windows Insider build, I can release 1.7 very quickly.

On Fri, Sep 18, 2020 at 1:41 AM thesword53 notifications@github.com wrote:

> I tried to compile GVM, and the Intel issue seems to be caused by c2693c9 https://github.com/google/android-emulator-hypervisor-driver-for-amd-processors/commit/c2693c93b4aac6103c931d274c6fc6806a0b6ae0

-- Haitao @Google

Taogle2018 commented 4 years ago

Hi, 1.7 is released and c2693c9 is reverted.

thesword53 commented 2 years ago

Hi Taogle2018,

I was wrong. Reverting c2693c93b4aac6103c931d274c6fc6806a0b6ae0 didn't solve the issue.

I also tested an Arch Linux VM and I got a kernel panic: "MCA architectural violation!" The panic occurs here: https://github.com/torvalds/linux/blob/v5.16/arch/x86/kernel/cpu/mce/core.c#L361 in ex_handler_msr_mce. I think 4edc540fa73e9ba817f86457190d1d07f2428674 does something wrong with MSRs in a nested VM.

I also tested GVM directly on an Intel host PC, and Windows 7/10 and Arch Linux boot as guests.