tandasat / MiniVisorPkg

The research UEFI hypervisor that supports booting an operating system.
https://standa-note.blogspot.com/2020/03/introduction-and-design-considerations.html
MIT License

VMware Workstation Windows 11 guest sometimes freezes during shutdown/reboot? #17

Closed by subvert0r 3 months ago

subvert0r commented 3 months ago

I can reproduce it as follows:

In VMware Workstation, load minivisor.efi, then load the Windows Boot Manager (latest Windows 11). After the machine boots, wait 30-60 seconds, then reboot. About 20% of the time the guest freezes 2-3 seconds after I click Reboot, just as the screen goes black. It does not crash, so there is no crash log (neither a VMware crash log nor a Windows crash dump).

Has anyone experienced something like this?

(Yes, I have enabled nested virtualization.)

tandasat commented 3 months ago

To find the cause, I would enable the GDB stub of VMware Workstation to see where the processors are sitting, and log all VM-exits to work out which one went wrong (directly or indirectly).
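A minimal sketch of the VM-exit logging idea, assuming one small in-memory ring buffer per processor that can be inspected from the debugger after the guest freezes. Every name here (`ExitLog`, `exit_log_push`, `exit_log_last`, `EXIT_LOG_CAPACITY`) is hypothetical and not part of MiniVisor; a real handler would pass in the exit reason and guest RIP it reads from the VMCS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity: only the most recent exits are kept. */
#define EXIT_LOG_CAPACITY 8u

typedef struct {
    uint32_t exit_reason;  /* basic exit reason reported for this VM-exit */
    uint64_t guest_rip;    /* guest instruction pointer at the time of exit */
} ExitRecord;

typedef struct {
    ExitRecord records[EXIT_LOG_CAPACITY];
    uint64_t   total;      /* number of exits logged so far */
} ExitLog;

/* Record one VM-exit, overwriting the oldest entry once the buffer is full. */
static void exit_log_push(ExitLog *log, uint32_t exit_reason, uint64_t guest_rip)
{
    assert(log != NULL);
    ExitRecord *slot = &log->records[log->total % EXIT_LOG_CAPACITY];
    slot->exit_reason = exit_reason;
    slot->guest_rip = guest_rip;
    log->total++;
}

/* Return the most recent record, or NULL when nothing has been logged. */
static const ExitRecord *exit_log_last(const ExitLog *log)
{
    assert(log != NULL);
    if (log->total == 0) {
        return NULL;
    }
    return &log->records[(log->total - 1) % EXIT_LOG_CAPACITY];
}
```

Keeping one `ExitLog` per processor avoids locking in the exit path, and the last record on each processor is a reasonable first place to look when the guest stops making progress.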

That said, the symptom resembles an issue I know of. Try zeroing EAX for CPUID leaf 0x4000_0001, like this: https://github.com/tandasat/barevisor/blob/830492a48ac9825b84774b4ac237123a02872d6c/src/hvcore/src/hypervisor/host.rs#L93-L101

This prevents NTOS from using the interface defined by the Hypervisor Top Level Functional Specification (TLFS). On VMware, that interface is known to cause issues with hypervisors that do not implement it. This may not help, but it is worth a try.
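As a rough illustration of why zeroing EAX is enough: the TLFS identifies the Microsoft interface by the signature "Hv#1" (0x31237648) in EAX of leaf 0x4000_0001, so any other value tells the guest the interface is absent. The sketch below assumes the guest makes exactly that comparison; the function and macro names are hypothetical and come from neither Windows nor barevisor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* "Hv#1" as it appears in EAX: 'H' (0x48) in the low byte, then 'v', '#', '1'. */
#define HV_INTERFACE_SIGNATURE_HV1 0x31237648u

/* Hypothetical guest-side check on the EAX value of CPUID leaf 0x40000001. */
static bool guest_sees_hv1_interface(uint32_t interface_eax)
{
    return interface_eax == HV_INTERFACE_SIGNATURE_HV1;
}

/* Hypothetical stand-in for the suggested fix: whatever the outer platform
 * reported, the guest is shown zero. */
static uint32_t filter_hv_interface_eax(uint32_t eax_from_outer_platform)
{
    (void)eax_from_outer_platform;  /* deliberately discarded */
    return 0;
}

/* Usage: the passed-through signature is accepted, the filtered one is not. */
static void example(void)
{
    uint32_t passthrough = HV_INTERFACE_SIGNATURE_HV1;
    assert(guest_sees_hv1_interface(passthrough));
    assert(!guest_sees_hv1_interface(filter_hv_interface_eax(passthrough)));
}
```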

subvert0r commented 3 months ago

Is the following change correct (added the CPUID_HV_INTERFACE case)?

static
VOID
HandleCpuid (
    _Inout_ GUEST_CONTEXT* GuestContext
    )
{
    int registers[4];
    int leaf, subLeaf;

    //
    // Execute the same instruction on behalf of the guest.
    //
    leaf = (int)GuestContext->StackBasedRegisters->Rax;
    subLeaf = (int)GuestContext->StackBasedRegisters->Rcx;
    __cpuidex(registers, leaf, subLeaf);

    //
    // Then, modify results when necessary.
    //
    switch (leaf)
    {
        case CPUID_VERSION_INFORMATION:
            //
            // Do not indicate that the VMX feature is available on this
            // processor, so that other hypervisors do not try to use it;
            // MiniVisor does not support nesting a hypervisor.
            //
            ClearFlag(registers[2], CPUID_FEATURE_INFORMATION_ECX_VIRTUAL_MACHINE_EXTENSIONS_FLAG);
            break;

        case CPUID_HV_VENDOR_AND_MAX_FUNCTIONS:
            //
            // Return a maximum supported hypervisor CPUID leaf range and a vendor
            // ID signature as required by the spec.
            //
            registers[0] = CPUID_HV_MAX;
            registers[1] = 'iniM';  // "MiniVisor   "
            registers[2] = 'osiV';
            registers[3] = '   r';
            break;

        // Added this
        case CPUID_HV_INTERFACE:
            //
            // Return a value other than "Hv#1" in EAX. This indicates that our
            // hypervisor does NOT conform to the Microsoft hypervisor interface.
            // This prevents the guest from using the interface for optimum
            // performance, but simplifies the implementation of our hypervisor.
            // This is required only when testing on a virtualization platform
            // that supports the Microsoft hypervisor interface, such as VMware,
            // and is not required on bare metal.
            // See: Hypervisor Top Level Functional Specification
            //
            registers[0] = 0;
            break;

        default:
            break;
    }

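    //
    // Update guest's GPRs with results. Cast through UINT32 so that a result
    // with bit 31 set is zero-extended rather than sign-extended, matching how
    // CPUID clears the upper 32 bits of each register.
    //
    GuestContext->StackBasedRegisters->Rax = (UINT64)(UINT32)registers[0];
    GuestContext->StackBasedRegisters->Rbx = (UINT64)(UINT32)registers[1];
    GuestContext->StackBasedRegisters->Rcx = (UINT64)(UINT32)registers[2];
    GuestContext->StackBasedRegisters->Rdx = (UINT64)(UINT32)registers[3];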

    AdvanceGuestInstructionPointer(GuestContext);
}
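A side note on the vendor ID literals in the handler: the value of a multi-character constant such as 'iniM' is implementation-defined in C, and the reversed spelling relies on the compiler placing the first character in the most significant byte. The sketch below shows one portable way to build the same three register values from the 12-character string; `pack_vendor_id` is a hypothetical helper, not something MiniVisor provides.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: pack a 12-character vendor ID into three 32-bit values,
 * in the order they are returned in EBX, ECX, and EDX. Within each register
 * the first character lands in the least significant byte. */
static void pack_vendor_id(const char *vendor, uint32_t out[3])
{
    assert(vendor != NULL && strlen(vendor) == 12);
    for (int reg = 0; reg < 3; reg++) {
        out[reg] = 0;
        for (int byte = 0; byte < 4; byte++) {
            uint32_t ch = (unsigned char)vendor[reg * 4 + byte];
            out[reg] |= ch << (8 * byte);
        }
    }
}
```

For "MiniVisor   " (padded with three spaces) this yields 0x696E694D, 0x6F736956, and 0x20202072, which a guest reading EBX, ECX, and EDX back as bytes sees as the original string.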

Also, what effect does this have on the OS? What happens when NTOS does not use the TLFS interface?

tandasat commented 3 months ago

That looks correct. TLFS is a performance optimization, so technically the system will run less efficiently, but you will never notice it. I have not seen or heard of any issues with suppressing TLFS.

subvert0r commented 3 months ago

Thank you. So far this seems to have solved the problem, so I am closing this issue.

As a side question (unrelated to this issue), do you have any plans to add AMD support to this project?

tandasat commented 3 months ago

Good to hear the issue is gone. I have pushed the change to main: https://github.com/tandasat/MiniVisorPkg/commit/4ec21d6efae00e82e70b5acfa8dcdaa94f06a63c

I do not plan to add AMD support to this project. To me, it is really not worth supporting both processors in the same code base. You can still find comparable projects for AMD in my repositories:
https://github.com/tandasat/barevisor (UEFI and Windows)
https://github.com/tandasat/HelloAmdHvPkg (UEFI)
https://github.com/tandasat/SimpleSvm (Windows)