ionescu007 / SimpleVisor

SimpleVisor is a simple, portable, Intel VT-x hypervisor with two specific goals: using the least amount of assembly code (10 lines), and having the smallest amount of VMX-related code to support dynamic hyperjacking and unhyperjacking (that is, virtualizing the host state from within the host). It works on Windows and UEFI.
http://ionescu007.github.io/SimpleVisor/

SimpleVisor: Booting OS #23

Closed jocmer closed 4 years ago

jocmer commented 6 years ago

Hello, I read that SimpleVisor doesn't support booting operating systems right now. Could you explain what problems could occur? Right now I am experimenting with SimpleVisor. I am loading the hypervisor (as Unrestricted Guest) in UEFI, but as soon as the kernel is running, it crashes.

EDIT: it might be a memory problem, because it crashes randomly. Sometimes right at the start or at login screen.

ionescu007 commented 6 years ago

Hi,

It's basically untested, and guaranteed not to work with multiple processors. But even with one processor, because Windows will be configuring the APIC, potentially sending the SIPI signal, etc, all these are things that SimpleVisor does not currently handle.
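For readers unfamiliar with the SIPI part: a startup IPI carries an 8-bit vector, and the target AP begins executing in real mode at physical address `vector << 12` (CS selector `vector << 8`, IP 0). A hypervisor that intercepts INIT/SIPI has to reproduce that entry state for the guest. A minimal sketch of the translation, assuming hypothetical names (`ap_start_state`, `sipi_vector_to_state`) that are not part of SimpleVisor:

```c
#include <stdint.h>

/* Hypothetical container for the real-mode entry state of an AP. */
typedef struct {
    uint16_t cs_selector; /* real-mode CS selector */
    uint32_t cs_base;     /* CS base, equal to selector << 4 */
    uint16_t ip;          /* instruction pointer, 0 after a SIPI */
} ap_start_state;

/* Translate an 8-bit SIPI vector into the state the AP starts with. */
static ap_start_state sipi_vector_to_state(uint8_t vector)
{
    ap_start_state s;
    s.cs_selector = (uint16_t)((uint16_t)vector << 8);
    s.cs_base     = (uint32_t)vector << 12;
    s.ip          = 0;
    return s;
}
```

For example, vector `0x9A` yields CS selector `0x9A00` and a start address of `0x9A000`.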

rianquinn commented 6 years ago

@ionescu007 we are adding support for this right now in Bareflank (http://bareflank.github.io/hypervisor/). We have UEFI working, and we have multi-core working; the only part that is missing is the SIPI/INIT process, which is well documented here: https://github.com/01org/ikgt-core/blob/master/core/vmexit/vmexit_sipi.c

ionescu007 commented 6 years ago

How do you have UEFI multi-core working? When I tried it, it crashed as soon as the MP threads returned, and I was specifically told that UEFI-MP is not supported other than for one-shot work, because UEFI will immediately turn off the core once the function returns.

Best regards, Alex Ionescu

rianquinn commented 6 years ago

Here is the patch to Bareflank to get UEFI working: https://github.com/Bareflank/bfdriver/pull/3

It's not complete yet, as it doesn't have the SIPI/INIT emulation that is needed to start Windows or Linux, but the hypervisor starts up on all cores without issue. It was a couple of months ago when we wrote this, but IIRC, the two main issues were making sure that VMX was enabled (which requires trapping on mods to CR4), and making sure the TSS was set up on each core, since UEFI doesn't do that for you. Once all of that was handled properly, it worked fine.

Note that this patch only has the driver mods; the mods to the hypervisor to trap on CR4 are not included.

ionescu007 commented 6 years ago

Hey Rian,

Interesting, maybe the functions I used manually disabled the AP. I'll try again based on this patch, and if it works, I'll make sure to credit your work.

Best regards, Alex Ionescu

rianquinn commented 6 years ago

Let me know how it goes. We will be upstreaming our patches to Bareflank over the next several weeks. Once it is completely done, I will let you know, so that if you're still having issues, you can check out all of our changes.

ainfosec-henselb commented 6 years ago

I wanted to verify my result before responding, and I got around to doing so and tested stopping the hypervisor as well to ensure proper operation. It seems to be working without error even after the firmware suspends cores the hypervisor is resident on. (Tested on QEMU/OVMF)

The crux of the problem was that EFI's SwitchBSP function (used here to be compatible with Bareflank's common API) would swap CR4 with the core the BSP was moving to. As the VMXE bit was not set on the core the BSP was moving to, the bit would become unset on the previous (now hypervisor-resident) core during the swap. This would cause an unhandled general protection fault.
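To make the failure mode concrete: CR4.VMXE is bit 13, and a core in VMX operation faults with #GP if a CR4 load tries to clear it. The sketch below, with hypothetical helper names, shows the check and the value that would have to be loaded instead. It only models the bit arithmetic and does not touch the real register.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR4_VMXE (1ULL << 13) /* CR4.VMXE: VMX enable bit */

/* True if loading new_cr4 on a core that is in VMX operation would
 * try to clear CR4.VMXE, which the CPU rejects with #GP. */
static bool cr4_write_clears_vmxe(uint64_t new_cr4)
{
    return (new_cr4 & CR4_VMXE) == 0;
}

/* Value to load instead: everything the caller asked for, with VMXE forced on. */
static uint64_t cr4_with_vmxe(uint64_t new_cr4)
{
    return new_cr4 | CR4_VMXE;
}
```

A CR4 value taken from a core that never enabled VMX (for example `0x668`) fails the first check; `cr4_with_vmxe(0x668)` gives `0x2668`, which passes.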

Not sure if the problem you're encountering is the same, but in any case I hope this helps.

Edit: on second look, the method I used in Driver.c isn't doing the trick (derp - I had to wrap this up too quickly) and instead a modification I made to OVMF while debugging would simply turn on VMXE when swapping. The problem should be the same in nature though.

ionescu007 commented 6 years ago

Aha, that makes a lot of sense. Yes, a GPF after swap is what I encountered, and I did not get a chance to debug this as deeply as you have.

So I can see a few ways to fix this:

0) Hypervise the BSP (AP 0) first, making sure VMXE is on in the VMCS's CR4. With Core 0 under VT, switch to each AP, and manually make sure the VMCS for each AP has VMXE on.
1) Manually turn on VMXE in CR4 on each AP first, then run the normal Hypervisor-loop on each core, which should now have the correct CR4.

Or what did you end up doing? Not sure why a CR4 exit handler is required, unless I didn't fully understand the issue (or maybe I just came up with different solutions in this email..).

I'm guessing for QEMU to support VT you had to run it under KVM, correct? My test environment is VMware on Windows, but that shouldn't change anything.

I also was not building a TSS, so I'm actually surprised that I managed to enter VT mode on all of my cores (it was switching out that failed).

Best regards, Alex Ionescu

ainfosec-henselb commented 6 years ago

On second look, the method I used in Driver.c isn't doing the trick (derp - I had to wrap this up too quickly) and instead a modification I made to OVMF while debugging would simply turn on VMXE when swapping. The problem should be the same in nature though.

You can't use SwitchBSP to effectively change another core's VMXE, as I so naively attempted, because the setting just follows you as you change cores. You'd have to use a different MP function to do so, or perhaps the other method you describe. I do think the cleanest solution would be a CR4 exit handler, as that could handle whatever the system attempts to do to CR4 regardless (but does this have performance implications?).

I am testing with kvm in nested mode.

rianquinn commented 6 years ago

Just depends on how you want to handle CR4. We have not really seen an impact since CR4 doesn't get changed all that often. Also.... all hypervisors really should trap on mods to CR4 and at a minimum, mask off the VMXE bit so that the OS cannot accidentally turn it off.
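One common way to do this on VT-x is to own only the VMXE bit through the CR4 guest/host mask and read shadow, so that a guest write exits only when it would set VMXE to something other than the shadowed value. The sketch below models that bookkeeping with a hypothetical `cr4_vmcs_fields` struct standing in for the real VMCS fields. It is an illustration of the idea, not Bareflank's or SimpleVisor's actual handler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CR4_VMXE (1ULL << 13)

/* Hypothetical stand-in for the three VMCS fields involved. */
typedef struct {
    uint64_t cr4_guest_host_mask; /* bits owned by the hypervisor */
    uint64_t cr4_read_shadow;     /* what the guest reads for owned bits */
    uint64_t guest_cr4;           /* value actually in effect */
} cr4_vmcs_fields;

/* Own only VMXE: a guest write exits only if its VMXE bit differs
 * from the corresponding read-shadow bit. */
static void cr4_setup(cr4_vmcs_fields *v, uint64_t initial_cr4)
{
    assert(v != NULL);
    v->cr4_guest_host_mask = CR4_VMXE;
    v->cr4_read_shadow     = initial_cr4;
    v->guest_cr4           = initial_cr4 | CR4_VMXE;
}

/* Handle a MOV-to-CR4 exit: record what the guest asked for,
 * but keep VMXE set in the value that takes effect. */
static void cr4_handle_write(cr4_vmcs_fields *v, uint64_t requested)
{
    assert(v != NULL);
    v->cr4_read_shadow = requested;
    v->guest_cr4       = requested | CR4_VMXE;
}

/* What the guest observes on MOV-from-CR4: owned bits come from the
 * read shadow, all other bits from the real guest CR4. */
static uint64_t cr4_guest_view(const cr4_vmcs_fields *v)
{
    assert(v != NULL);
    return (v->guest_cr4 & ~v->cr4_guest_host_mask) |
           (v->cr4_read_shadow & v->cr4_guest_host_mask);
}
```

With this arrangement the guest can believe it cleared VMXE while the bit stays on, and writes that leave VMXE matching the shadow never exit, which is consistent with the low overhead described above.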

Just my two cents. Bareflank itself doesn't handle CR4 yet, but it will, both to support UEFI and to make sure that VMXE cannot be disabled. I know that we saw an issue with Linux with new kernels because they added a shadow of CR4 that was disabling VMXE from the shadow (Xen and KVM saw the same problem), but this could be fixed with the same CR4 exit handler to prevent that bit from being flipped.

rianquinn commented 6 years ago

As a side note, we also had to add a shadow of the GDT when stopping the hypervisor because Linux now marks the GDT as read-only, which prevents flipping the TSS busy bit.

ionescu007 commented 6 years ago

Stupid question: why were you flipping the TSS busy bit?

-- Best regards, Alex Ionescu

rianquinn commented 6 years ago

When stopping the hypervisor, you are "promoting" ring 0 to ring -1. Since our VMM has its own GDT, there are two TSSs, one for Windows/Linux and one for the VMM. Both are marked busy because both are being used. The problem is that you cannot load a TSS that is marked busy, so we have to flip that bit to make it work.
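For reference, the busy flag is bit 1 of the 4-bit type field in the TSS descriptor (type `0x9` is an available TSS, `0xB` a busy one), which places it at bit 41 of the descriptor's low 8 bytes. `LTR` raises #GP on a busy descriptor, hence the flip. A minimal sketch with hypothetical helper names, meant to be applied to a writable copy of the descriptor (such as the shadow GDT mentioned earlier in the thread):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DESC_TYPE_SHIFT 40
#define DESC_TYPE_MASK  (0xFULL << DESC_TYPE_SHIFT)
#define TSS_TYPE_AVAIL  0x9ULL        /* available TSS */
#define TSS_TYPE_BUSY   0xBULL        /* busy TSS */
#define TSS_BUSY_BIT    (1ULL << 41)  /* bit 1 of the type field */

/* Extract the 4-bit type field from the low 8 bytes of a descriptor. */
static uint64_t desc_type(uint64_t desc_low)
{
    return (desc_low & DESC_TYPE_MASK) >> DESC_TYPE_SHIFT;
}

/* True if the descriptor is a TSS currently marked busy. */
static bool tss_is_busy(uint64_t desc_low)
{
    return desc_type(desc_low) == TSS_TYPE_BUSY;
}

/* Return the descriptor with the busy flag cleared so LTR accepts it. */
static uint64_t tss_mark_available(uint64_t desc_low)
{
    uint64_t type = desc_type(desc_low);
    assert(type == TSS_TYPE_AVAIL || type == TSS_TYPE_BUSY);
    return desc_low & ~TSS_BUSY_BIT;
}
```

For a descriptor whose low half is `0x00008B0000000067` (present, busy TSS, limit `0x67`), `tss_mark_available` returns `0x0000890000000067`.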

ionescu007 commented 6 years ago

Ah, I'm reusing the host TSS and GDT, so yeah, that makes sense.

-- Best regards, Alex Ionescu

rianquinn commented 6 years ago

Yeah, Bareflank has its own page tables, IDT, GDT, control registers, etc. Version 1.0 was like MoRE, SimpleVisor, etc., which used the same resources for both the VMM and the host OS, but with 1.1 we wanted to move to separate resources, which simplified things like memory mapping and allowed for an easier method of loading via userspace applications.

ionescu007 commented 6 years ago

Looks like I do set up a TSS for each processor already, so I think the VMXE might be the issue. However, it seems the link you gave me no longer works. Where can I see the pull request now, @rianquinn?

rianquinn commented 6 years ago

@ainfosec-henselb Can you point @ionescu007 to what you have so far for UEFI?

ainfosec-henselb commented 6 years ago

@ionescu007 It's a WIP, so message me on Gitter and we can work it out.

ionescu007 commented 6 years ago

Well, this ended up being quite the wild goose chase.

Indeed, when using SwitchBSP (which was not my preferred implementation), I was seeing really weird hangs. I had actually manually verified that CR4 wasn't being messed around with, but I guess by the time I had my debug prints, the bug already happened. Once I finally gave up and gave myself a CR Exit handler, and shoved back the right value, I got SwitchBSP to work... for CPU 1, and back to 0. As soon as I tried 2 and 3, it died on me again.

I got really annoyed and went back to StartupAllAPs. Boom. Everything worked.

Turns out I fixed a pretty silly mistake in my allocator/deallocator functions in the meantime (pages vs bytes). Now I have SimpleVisor loaded on both my Surface Pro 4 as well as my Asus Kaby Lake desktop. All it took was changing/fixing 2 lines of code. No CR handlers, shadow masks or anything of the sort.
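The thread does not show the two changed lines, but the general class of bug is easy to illustrate: the UEFI boot services `AllocatePages` and `FreePages` take a page count, not a byte count, so passing a size in bytes over-allocates or frees the wrong range. A minimal sketch of the conversion, using a hypothetical helper name (EDK2 provides the `EFI_SIZE_TO_PAGES` macro for the same purpose):

```c
#include <stddef.h>

#define PAGE_SIZE_4K 4096u /* UEFI page size */

/* Round a byte count up to whole 4 KiB pages. Division plus remainder
 * check avoids overflow for very large byte counts. */
static size_t bytes_to_pages(size_t bytes)
{
    return bytes / PAGE_SIZE_4K + (bytes % PAGE_SIZE_4K != 0 ? 1 : 0);
}
```

Whatever count is passed at allocation time has to be passed again when freeing, so keeping the page count (rather than the byte size) alongside the pointer avoids mixing the two units.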

Best regards, Alex Ionescu

rianquinn commented 6 years ago

Awesome, did that allow you to boot into Windows? Or does it just get SimpleVisor to run on all of the cores?

ionescu007 commented 6 years ago

The latter. The former would require thousands of lines of code, as SimpleVisor is hyperjacking the UEFI environment and would have to update the host VMCS state as Windows is booting up in order to correctly hyperjack the new environment being brought up.

-- Best regards, Alex Ionescu

rianquinn commented 5 years ago

We have UEFI working completely with Windows and Linux. There are a couple of things that were needed, but in general, I think that SimpleVisor could support this easily with what is already supported. We also hyperjack EFI, and the resources that you are taking EFI reserves, so there should be no additional work needed to safely boot Windows.

ionescu007 commented 4 years ago

Check out my friend @tandasat's UEFI hypervisor which now fulfills this need:

https://github.com/tandasat/MiniVisorPkg