Closed subvert0r closed 3 months ago
To find out the cause, I would enable the GDB stub of VMware Workstation to see where the processors are sitting, and log all VM-exits to work out which VM-exit went wrong (directly or indirectly).
That said, the symptom resembles an issue I know of. Try zero-clearing EAX on CPUID 0x4000_0001, like this: https://github.com/tandasat/barevisor/blob/830492a48ac9825b84774b4ac237123a02872d6c/src/hvcore/src/hypervisor/host.rs#L93-L101
This prevents NTOS from using the interface defined in the Hypervisor Top Level Functional Specification (TLFS). On VMware, this interface is known to cause issues with hypervisors that do not implement it. This may not help, but it is worth a try.
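To illustrate why clearing EAX has this effect, here is a minimal sketch in C of the discovery sequence the TLFS describes: a guest checks the hypervisor-present bit (CPUID leaf 1, ECX bit 31) and then compares EAX of leaf 0x4000_0001 against the "Hv#1" signature. The helper names `guest_sees_tlfs` and `filter_hv_interface` are hypothetical, and this is a simplified model of the check, not the actual NTOS logic.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID_HV_INTERFACE_LEAF 0x40000001u
#define HV_INTERFACE_SIGNATURE  0x31237648u /* "Hv#1" as a little-endian 32-bit value */
#define CPUID_1_ECX_HV_PRESENT  (1u << 31)  /* hypervisor-present bit */

/* Hypothetical model of the guest-side check: the interface is considered
 * available only if a hypervisor is present AND it reports "Hv#1". */
static bool guest_sees_tlfs(uint32_t leaf1_ecx, uint32_t hv_interface_eax)
{
    if ((leaf1_ecx & CPUID_1_ECX_HV_PRESENT) == 0) {
        return false;
    }
    return hv_interface_eax == HV_INTERFACE_SIGNATURE;
}

/* Hypothetical host-side filter: zero EAX for the interface leaf so the
 * signature comparison above fails. Other leaves are left untouched. */
static void filter_hv_interface(uint32_t leaf, uint32_t regs[4])
{
    if (leaf == CPUID_HV_INTERFACE_LEAF) {
        regs[0] = 0;
    }
}
```

With the filter applied to the values returned by the outer platform, the signature check fails and a guest following this sequence would fall back to not using the interface.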
Is the following change correct (added the CPUID_HV_INTERFACE case)?
static
VOID
HandleCpuid (
    _Inout_ GUEST_CONTEXT* GuestContext
    )
{
    int registers[4];
    int leaf, subLeaf;

    //
    // Execute the same instruction on behalf of the guest.
    //
    leaf = (int)GuestContext->StackBasedRegisters->Rax;
    subLeaf = (int)GuestContext->StackBasedRegisters->Rcx;
    __cpuidex(registers, leaf, subLeaf);

    //
    // Then, modify results when necessary.
    //
    switch (leaf)
    {
    case CPUID_VERSION_INFORMATION:
        //
        // Do not indicate that the VMX feature is available on this processor,
        // to prevent other hypervisors from trying to use it, as MiniVisor
        // does not support nesting the hypervisor.
        //
        ClearFlag(registers[2], CPUID_FEATURE_INFORMATION_ECX_VIRTUAL_MACHINE_EXTENSIONS_FLAG);
        break;

    case CPUID_HV_VENDOR_AND_MAX_FUNCTIONS:
        //
        // Return a maximum supported hypervisor CPUID leaf range and a vendor
        // ID signature as required by the spec.
        //
        registers[0] = CPUID_HV_MAX;
        registers[1] = 'iniM';  // "MiniVisor   "
        registers[2] = 'osiV';
        registers[3] = '   r';
        break;

    //Added this
    case CPUID_HV_INTERFACE:
        //
        // Return a value other than "Hv#1" in EAX. This indicates that our
        // hypervisor does NOT conform to the Microsoft hypervisor interface.
        // This prevents the guest from using the interface for optimum
        // performance, but simplifies the implementation of our hypervisor.
        // This is required only when testing on a virtualization platform
        // that supports the Microsoft hypervisor interface, such as VMware,
        // and is not required on bare metal.
        // See: Hypervisor Top Level Functional Specification
        //
        registers[0] = 0;
        break;

    default:
        break;
    }

    //
    // Update guest's GPRs with results.
    //
    GuestContext->StackBasedRegisters->Rax = (UINT64)registers[0];
    GuestContext->StackBasedRegisters->Rbx = (UINT64)registers[1];
    GuestContext->StackBasedRegisters->Rcx = (UINT64)registers[2];
    GuestContext->StackBasedRegisters->Rdx = (UINT64)registers[3];
    AdvanceGuestInstructionPointer(GuestContext);
}
Also, what is the effect of this on the OS? What happens when NTOS doesn't use the TLFS interface?
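As an aside on the vendor ID values in the handler above: the multi-character constants rely on compiler-specific behavior, so a byte-by-byte packing may be easier to verify. The sketch below is a hypothetical alternative, assuming the ID is the 12-byte string "MiniVisor" padded with three spaces and returned in EBX, ECX, EDX in that order. The names `pack4` and `pack_hv_vendor_id` are made up for illustration.

```c
#include <stdint.h>

/* Pack four characters into a 32-bit value, first character in the lowest
 * byte, which is how a guest reading the register as bytes would see them. */
static uint32_t pack4(const char *s)
{
    return (uint32_t)(uint8_t)s[0]
         | ((uint32_t)(uint8_t)s[1] << 8)
         | ((uint32_t)(uint8_t)s[2] << 16)
         | ((uint32_t)(uint8_t)s[3] << 24);
}

/* Fill regs[1..3] (EBX, ECX, EDX) from a 12-byte vendor ID.
 * regs[0] (EAX) is left for the maximum hypervisor leaf. */
static void pack_hv_vendor_id(const char vendor[12], uint32_t regs[4])
{
    regs[1] = pack4(vendor);
    regs[2] = pack4(vendor + 4);
    regs[3] = pack4(vendor + 8);
}
```

Calling `pack_hv_vendor_id("MiniVisor   ", regs)` fills the three registers so that a guest concatenating EBX, ECX, and EDX reads back the same 12 characters.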
That looks correct. The TLFS interface is a performance optimization, so technically the system will run less efficiently, but you will never notice it. I have not seen or heard of any issues with suppressing TLFS.
Thank you. Doing this seems to have solved the problem so far, so I am closing this issue.
As a side question (not related to this issue at all), do you have any plans to add AMD support to this project?
Good to hear the issue is gone. I have pushed the change to main: https://github.com/tandasat/MiniVisorPkg/commit/4ec21d6efae00e82e70b5acfa8dcdaa94f06a63c
I do not plan to add AMD support to this project. To me, it is really not worth supporting both processors in the same code base. You can still find comparable projects for AMD on my repo:
https://github.com/tandasat/barevisor (UEFI and Windows)
https://github.com/tandasat/HelloAmdHvPkg (UEFI)
https://github.com/tandasat/SimpleSvm (Windows)
I replicate it by doing this:
In VMware Workstation, load minivisor.efi, then load the Windows boot manager (latest Windows 11). After the machine boots, wait 30-60 seconds, then try to reboot. About 20% of the time the VMware guest just freezes. It does not crash, so there is no crash log either (neither a VMware crash log nor a Windows crash dump). It freezes when the screen goes black, 2-3 seconds after you click reboot.
Has anyone experienced something like this?
(Yes, I have enabled nested virtualization.)
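A freeze that leaves no crash dump gives little to inspect, so one way to act on the suggestion of logging VM-exits is to keep the most recent exits in a small in-memory ring buffer that can be read from a debugger once the guest hangs. The sketch below is a minimal, hypothetical example in C: the `ExitLog` type, its field names, and the capacity are assumptions and not part of MiniVisor. It assumes one log per processor, so it does no locking.

```c
#include <stddef.h>
#include <stdint.h>

#define EXIT_LOG_CAPACITY 8u /* small on purpose; only recent history matters */

typedef struct {
    uint32_t exit_reason; /* VM-exit reason as read by the exit handler */
    uint64_t guest_rip;   /* guest instruction pointer at the time of the exit */
} ExitRecord;

typedef struct {
    ExitRecord records[EXIT_LOG_CAPACITY];
    uint64_t count;       /* total number of exits recorded so far */
} ExitLog;

/* Record one VM-exit, overwriting the oldest entry once the buffer is full. */
static void exit_log_push(ExitLog *log, uint32_t reason, uint64_t rip)
{
    ExitRecord *slot = &log->records[log->count % EXIT_LOG_CAPACITY];
    slot->exit_reason = reason;
    slot->guest_rip = rip;
    log->count++;
}

/* Return the most recently recorded exit, or NULL if nothing was recorded. */
static const ExitRecord *exit_log_latest(const ExitLog *log)
{
    if (log->count == 0) {
        return NULL;
    }
    return &log->records[(log->count - 1) % EXIT_LOG_CAPACITY];
}
```

In a hypervisor, the exit handler would call `exit_log_push` first thing on every VM-exit. After a hang, the last record on each processor shows which exit was handled last before the guest stopped making progress.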