ionescu007 / SimpleVisor

SimpleVisor is a simple, portable, Intel VT-x hypervisor with two specific goals: using the least amount of assembly code (10 lines), and having the smallest amount of VMX-related code to support dynamic hyperjacking and unhyperjacking (that is, virtualizing the host state from within the host). It works on Windows and UEFI.
http://ionescu007.github.io/SimpleVisor/

Enabling interrupts in VMEXIT? #36

Closed Nou4r closed 5 years ago

Nou4r commented 5 years ago

I'm trying to implement a vmcall to read memory from another process, but I get BSOD with DRIVER_IRQL_NOT_LESS_OR_EQUAL.

Arg1: 0000023170e8e050, memory referenced
Arg2: 00000000000000ff, IRQL
Arg3: 0000000000000000, value 0 = read operation, 1 = write operation
Arg4: fffff800f1ef1773, address which referenced memory

The bugcheck says the IRQL is 0xFF, but when I check with KeGetCurrentIrql() it returns 0 (PASSIVE_LEVEL)?

The vmcall is made from the user-mode app, which causes a VM exit, which in turn executes the vmcall handler.

RCX holds the call index (VMCallFuncIndex), RDX holds a user-mode pointer to a structure describing the memory I/O request, and R8 holds the current process ID (GetCurrentProcessId(), for testing only at the moment):

    case vmcall_read_memory:
    {
        /*
        KIRQL irql_lvl = KeGetCurrentIrql();
        DbgPrint("IRQL_LVL = %d", (ULONG)irql_lvl); //PASSIVE_LEVEL
        */
        DbgPrint("Attaching to PID: %d\r\n", VpState->VpRegs->R8);
        PEPROCESS local, remote;
        if (!NT_SUCCESS(PsLookupProcessByProcessId((HANDLE)VpState->VpRegs->R8, &local)))
            local = NULL;
        if (local) {
            sIOReq req;
            KAPC_STATE apc_state;
            KeStackAttachProcess(local, &apc_state);
            RtlCopyMemory(&req, (void*)VpState->VpRegs->Rdx, sizeof(sIOReq));
            KeUnstackDetachProcess(&apc_state);
            VpState->VpRegs->Rax = 0;
            DbgPrint("target pid == %d\r\n", req.remote_pid);
            DbgPrint("success\r\n");
        }
        else
            DbgPrint("local == nullptr\r\n");

The code looks correct to me, so I'm not sure what is wrong.

(1 hour later) I opened the crash dump in WinDbg, and the first thing I noticed was:

    FAILURE_ID_HASH_STRING: km:disabled_interrupt_fault_stackptr_error_hypervisor!vmxhandlevmcall

That makes me wonder: are interrupts disabled? I searched the existing SimpleVisor issues and found this: https://github.com/ionescu007/SimpleVisor/issues/3

So I decided to try it myself:

switch (VMCallFuncIndex) {
    case vmcall_read_memory:
    {
        /*
        KIRQL irql_lvl = KeGetCurrentIrql();
        DbgPrint("IRQL_LVL = %d", (ULONG)irql_lvl); //PASSIVE_LEVEL
        */
        KIRQL old_irql = KeRaiseIrqlToDpcLevel();
        _enable();
        DbgPrint("Attaching to PID: %d\r\n", VpState->VpRegs->R8);
        PEPROCESS local, remote;
        if (!NT_SUCCESS(PsLookupProcessByProcessId((HANDLE)VpState->VpRegs->R8, &local)))
            local = NULL;
        if (local) {
            sIOReq req;
            KAPC_STATE apc_state;
            KeStackAttachProcess(local, &apc_state);
            RtlCopyMemory(&req, (void*)VpState->VpRegs->Rdx, sizeof(sIOReq));
            KeUnstackDetachProcess(&apc_state);
            VpState->VpRegs->Rax = 0;
            DbgPrint("target pid == %d\r\n", req.remote_pid);
            DbgPrint("success\r\n");
        }
        else
            DbgPrint("local == nullptr\r\n");
        _disable();
        KeLowerIrql(old_irql);
        return;

But I still get a BSOD. It is the same DRIVER_IRQL_NOT_LESS_OR_EQUAL as before, except that this time it shows the IRQL as 0x2.

Arg1: 0000026f1c1ee0c0, memory referenced
Arg2: 0000000000000002, IRQL
Arg3: 0000000000000000, value 0 = read operation, 1 = write operation
Arg4: fffff800a6511783, address which referenced memory

It shows the faulting IP as being:

hypervisor!VmxHandleVMCall+a3 [c:\users\yuuar\source\repos\vt-x\hypervisor\source.c @ 519]
fffff800`a6511783 0f1001          movups  xmm0,xmmword ptr [rcx]

which seems to be this line:

RtlCopyMemory(&req, (void*)VpState->VpRegs->Rdx, sizeof(sIOReq));

So I'm not sure what's going on.

rianquinn commented 5 years ago

It sounds like you are corrupting the register state when making the vmcall. I know that SimpleVisor doesn't use a lot of assembly to handle entry into the VMM so I wonder if you are assuming the ABI is being respected here.

ghost commented 5 years ago

Actually, the cause is explained in https://github.com/tandasat/HyperPlatform/issues/3#issuecomment-230494046, which I quote here in case anyone has the same issue:

Hi Satoshi,

Wow -- I cannot believe you were crazy enough to try page-in from VMM context! Let me explain to you why VMM context == HIGH_LEVEL :-)

  1. Is it safe for you to be context switched by the OS while in the middle of VMM mode? Of course not... So you are at least at DISPATCH_LEVEL. Is it safe for you to "wait" on an object while at VMM mode? Of course not -- you would be context switched to another thread/idle thread which would now be running as VMM Host!!!
  2. Is it safe/OK for you to receive DPCs while in the middle of VMM mode? Again, of course not. Another reason why you are at least at DISPATCH_LEVEL. Could you receive a DPC, even if you wanted to? Nope -- receiving a DPC requires an interrupt, and IF is off, so Local APIC will never deliver it
  3. Will you receive any Device Interrupts? Nope, because EFLAGS IF is off. Would you want to be interrupted in the middle of VMM mode? Also nope. So you are at least at MAX_DIRQL.
  4. Will you receive the clock interrupt? Nope (also why you hit a CLOCK WATCHDOG BSOD sometimes)... So you are at least at CLOCK_LEVEL.
  5. Will you receive IPIs? Nope, because IF is off, so LAPIC will never send them. You also probably don't want to be running IPI while inside VMM host... So you are at least at IPI_LEVEL.
  6. Technically because you are not in the middle of handling an IPI, but rather you've disabled interrupts completely, you are at IPI_LEVEL + 1, aka HIGH_LEVEL.

In other words, if you call, for example, ExAllocatePoolWithTag, and this is PAGED POOL, you can get unlucky and this will require page-in which requires blocking your thread, and now, some other thread will run in VMM host mode... Sure, you can get lucky and control will come back to you, but this is insane... If you request NON PAGED POOL, it will "appear to work"... And then in one situation, a TLB flush will be required, which sends an IPI... Which can't be delivered... And so it will hang. Etc., etc...

Hope this makes sense.

Best regards, Alex Ionescu

On Mon, Jul 4, 2016 at 9:21 PM, Satoshi Tanda notifications@github.com wrote:

Thank you for the note, Alex. I do not think I understand why you cannot call API when interrupts are disabled. The eflags.IF is cleared when VM-exit happened but the IF only affects hardware interrupt, and exceptions can still occur. I tested that even page-in was processed fine in the VMM-context if IRQL is PASSIVE_LEVEL. To my knowledge, the IRQL requirement mostly stems from if page-in can be processed--in other words, if the process can enter wait state--, and interrupts are irrelevant. I bet that I am missing something and appreciate if you could explain a bit more about why disabling interrupts is technically the same as being IRQL==HIGH_LEVEL.
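As a rough restatement of the quoted explanation, and of both crashes above: pageable memory (such as a user-mode buffer) can only be touched safely when interrupts are enabled and the IRQL is below DISPATCH_LEVEL. The sketch below is plain C, not driver code. The function names are hypothetical, and the IRQL constants are the x64 Windows values. In a driver the inputs would come from something like __readeflags() and KeGetCurrentIrql().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RFLAGS_IF      (1ull << 9) /* interrupt enable flag, bit 9 of RFLAGS */
#define DISPATCH_LEVEL 2u          /* x64 Windows value */
#define HIGH_LEVEL     15u         /* x64 Windows value */

/* True if maskable interrupts can be delivered with this RFLAGS value. */
static bool interrupts_enabled(uint64_t rflags)
{
    return (rflags & RFLAGS_IF) != 0;
}

/* Hypothetical helper: may the caller take a page fault that needs page-in?
 * Page-in can block the thread, so it needs IRQL < DISPATCH_LEVEL, and it
 * needs interrupts on, because with IF clear the CPU is effectively at
 * HIGH_LEVEL no matter what the software IRQL value says. */
static bool may_touch_pageable_memory(uint64_t rflags, unsigned irql)
{
    assert(irql <= HIGH_LEVEL);
    return interrupts_enabled(rflags) && irql < DISPATCH_LEVEL;
}
```

With the first handler, IF is clear after the VM exit, so the check fails even though KeGetCurrentIrql() returns 0. With the second handler, IF is set again but the IRQL is 2, so it fails as well.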

So what I did to solve it was:

I queue the requests into a static array of five IO_REQUEST records. The vmcall handler iterates the array looking for a record whose state is 0 (free), claims it by setting the state to 1 (InterlockedCompareExchange(&IO_REQUEST[i].state, 1, 0) == 0), fills the record in, and then sets the state to 2 with InterlockedExchange. It sets the guest's RAX to 1 if it found a free record to fill in, or to 0 if it did not. Later on, my system thread (created during DriverEntry) processes every IO_REQUEST whose state is 2 and sets the state back to 0 once it has finished. This of course limits it to 5 outstanding requests at any time, but I only plan to have one single-threaded user-mode application making these requests through vmcall, so for my use case it's a non-issue. It might not be the best solution, but hey, it works.
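For anyone who wants to see the shape of it, here is a minimal sketch of that slot queue. It is not my driver code: it uses portable C11 atomics in place of InterlockedCompareExchange/InterlockedExchange so it can be compiled anywhere, and the payload fields and function names are made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IO_REQUEST_SLOTS 5

enum { SLOT_FREE = 0, SLOT_FILLING = 1, SLOT_READY = 2 };

/* Hypothetical payload: who asked, and where their request structure lives. */
typedef struct {
    uint64_t requester_pid;
    uint64_t user_buffer;
} io_payload_t;

typedef struct {
    atomic_long state;    /* SLOT_FREE -> SLOT_FILLING -> SLOT_READY -> SLOT_FREE */
    io_payload_t payload;
} io_request_t;

/* Static storage, so it is zero-initialized: every slot starts out free. */
static io_request_t g_requests[IO_REQUEST_SLOTS];

/* Producer side (the vmcall handler): never waits and only touches the
 * static array. Returns the value to place in guest RAX: 1 if the request
 * was queued, 0 if all slots were taken. */
static uint64_t queue_request(uint64_t pid, uint64_t user_buffer)
{
    for (int i = 0; i < IO_REQUEST_SLOTS; i++) {
        long expected = SLOT_FREE;
        /* Claim the slot only if it is still free (compare-and-swap 0 -> 1). */
        if (atomic_compare_exchange_strong(&g_requests[i].state,
                                           &expected, SLOT_FILLING)) {
            g_requests[i].payload.requester_pid = pid;
            g_requests[i].payload.user_buffer = user_buffer;
            /* Publish the filled record to the consumer (1 -> 2). */
            atomic_store(&g_requests[i].state, SLOT_READY);
            return 1;
        }
    }
    return 0;
}

/* Consumer side (the system thread): copies out one ready request and frees
 * its slot. Returns false if nothing was ready. */
static bool take_ready_request(io_payload_t *out)
{
    assert(out != NULL);
    for (int i = 0; i < IO_REQUEST_SLOTS; i++) {
        if (atomic_load(&g_requests[i].state) == SLOT_READY) {
            *out = g_requests[i].payload;
            atomic_store(&g_requests[i].state, SLOT_FREE);
            return true;
        }
    }
    return false;
}
```

The intermediate "filling" state is what stops the consumer from reading a half-written record, and the compare-and-swap is what stops two processors that exit at the same time from claiming the same slot. The sketch assumes a single consumer thread, as in my setup.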

(P.S. @rianquinn, I'm not sure what you're saying. It's late here, so I'll re-read your post tomorrow to see if I can wrap my head around it. I've borrowed from SimpleVisor quite heavily, and in fact I have not made a single change to shvvmxhvx64.asm, so I think you're thinking of something else?)