jonomango / hv

Lightweight Intel VT-x Hypervisor.
MIT License

CLOCK_WATCHDOG_TIMEOUT #25

Closed Shadowairing closed 1 year ago

Shadowairing commented 1 year ago

I did not make any code changes except for setting ept_pd_count = 512.

The code works fine in a virtual machine and on one physical machine.

But on another physical machine, calling "hv::for_each_cpu([]() {hv::test();});" from um.exe causes a BSOD.

Both machines run the same Windows version, 20H2.

I think there might be an issue with my CPU, because the same bugcheck (CLOCK_WATCHDOG_TIMEOUT) has happened twice when hv was not loaded.

Could hv be executing some instruction that makes the problem recur, or is there a bug in hv? I have no idea.

jonomango commented 1 year ago

Hard to say. What's your CPU? How many cores? One way of debugging would be to load hv.sys with something like OSR Loader (not manually mapped), wait for the blue screen, then send the MEMORY.DMP file that Windows spits out.

Shadowairing commented 1 year ago

My CPU is an i9-11980HK with 16 logical processors. I signed and loaded the driver normally.

*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

CLOCK_WATCHDOG_TIMEOUT (101)
An expected clock interrupt was not received on a secondary processor in an
MP system within the allocated interval. This indicates that the specified
processor is hung and not processing interrupts.
Arguments:
Arg1: 000000000000000c, Clock interrupt time out interval in nominal clock ticks.
Arg2: 0000000000000000, 0.
Arg3: fffff800529bd180, The PRCB address of the hung processor.
Arg4: 0000000000000000, The index of the hung processor.

Debugging Details:
------------------

KEY_VALUES_STRING: 1

    Key  : Analysis.CPU.mSec
    Value: 3452

    Key  : Analysis.DebugAnalysisManager
    Value: Create

    Key  : Analysis.Elapsed.mSec
    Value: 18590

    Key  : Analysis.Init.CPU.mSec
    Value: 2140

    Key  : Analysis.Init.Elapsed.mSec
    Value: 27430

    Key  : Analysis.Memory.CommitPeak.Mb
    Value: 115

    Key  : WER.OS.Branch
    Value: vb_release

    Key  : WER.OS.Timestamp
    Value: 2019-12-06T14:06:00Z

    Key  : WER.OS.Version
    Value: 10.0.19041.1

FILE_IN_CAB:  061723-8765-01.dmp

BUGCHECK_CODE:  101

BUGCHECK_P1: c

BUGCHECK_P2: 0

BUGCHECK_P3: fffff800529bd180

BUGCHECK_P4: 0

FAULTING_PROCESSOR: 0

BLACKBOXBSD: 1 (!blackboxbsd)

BLACKBOXNTFS: 1 (!blackboxntfs)

BLACKBOXPNP: 1 (!blackboxpnp)

BLACKBOXWINLOGON: 1

CUSTOMER_CRASH_COUNT:  1

PROCESS_NAME:  svchost.exe

STACK_TEXT:  
ffffad81`afc85c88 fffff800`56c3ad32     : 00000000`00000101 00000000`0000000c 00000000`00000000 fffff800`529bd180 : nt!KeBugCheckEx
ffffad81`afc85c90 fffff800`56a7541d     : 00000000`00000000 ffffad81`afc33180 00000000`00000246 00000000`00001329 : nt!KeAccumulateTicks+0x1c8b32
ffffad81`afc85cf0 fffff800`56a759c1     : 00000000`00001100 00000000`00000b6b ffffad81`afc33180 00000000`00000001 : nt!KiUpdateRunTime+0x5d
ffffad81`afc85d40 fffff800`56a6f833     : ffffad81`afc33180 00000000`00000000 fffff800`574319d8 00000000`00000000 : nt!KiUpdateTime+0x4a1
ffffad81`afc85e80 fffff800`56a781f2     : ffffac85`499ae7f0 ffffac85`499ae870 ffffac85`499ae800 00000000`0000000c : nt!KeClockInterruptNotify+0x2e3
ffffad81`afc85f30 fffff800`56b27f55     : 00000000`2dc56ad8 ffffd803`159235a0 ffffd803`15923650 00000000`00000000 : nt!HalpTimerClockInterrupt+0xe2
ffffad81`afc85f60 fffff800`56bf78ea     : ffffac85`499ae870 ffffd803`159235a0 00000000`00000001 00000000`00000000 : nt!KiCallInterruptServiceRoutine+0xa5
ffffad81`afc85fb0 fffff800`56bf7e57     : 00000000`0b47773d ffffad81`afc33180 00000000`00000002 ffffad81`afc36130 : nt!KiInterruptSubDispatchNoLockNoEtw+0xfa
ffffac85`499ae7f0 fffff800`56a93680     : 00000000`00000000 00000000`00000000 00000000`00000002 ffffd803`1b810000 : nt!KiInterruptDispatchNoLockNoEtw+0x37
ffffac85`499ae980 fffff800`56a93498     : 00000000`00000000 fffff97c`00000000 ffffd803`4690d400 fffff800`56a2f08a : nt!KeFlushMultipleRangeTb+0x160
ffffac85`499aea20 fffff800`56abc25e     : ffffa800`2fa79e00 8100000f`e28a0921 00000000`00000000 00000000`00000004 : nt!MiFlushTbList+0x88
ffffac85`499aea50 00000000`00000000     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!MmSetAddressRangeModifiedEx+0x2ae

SYMBOL_NAME:  nt!KeAccumulateTicks+1c8b32

MODULE_NAME: nt

IMAGE_NAME:  ntkrnlmp.exe

IMAGE_VERSION:  10.0.19041.928

STACK_COMMAND:  .cxr; .ecxr ; kb

BUCKET_ID_FUNC_OFFSET:  1c8b32

FAILURE_BUCKET_ID:  CLOCK_WATCHDOG_TIMEOUT_INVALID_CONTEXT_nt!KeAccumulateTicks

OS_VERSION:  10.0.19041.1

BUILDLAB_STR:  vb_release

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

FAILURE_ID_HASH:  {95498f51-33a9-903b-59e5-d236937d8ecf}

Followup:     MachineOwner

jonomango commented 1 year ago

I'm really not sure... The fact that it bluescreens occasionally even without the hypervisor loaded, and the fact that it works fine on your other physical machine, lead me to believe that something else is probably the cause. Maybe a faulty driver? Again, I'm not sure.

Although, if we do assume that hv is causing the blue screen, then I would probably think that it's something to do with the TSC hiding code.

Shadowairing commented 1 year ago

I fixed it. After I reduced the RAM, it worked fine.

The computer needs to have less than 64GB of RAM.

When I increased the RAM of the previously working physical machine to 64GB, the same BSOD (CLOCK_WATCHDOG_TIMEOUT) happened.
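
One possible reading of the 64GB boundary, sketched under assumptions rather than taken from the hv source: if a paging structure is built from page directories whose entries each map a 2 MiB large page, one directory covers 1 GiB, so a compile-time count of 64 directories would cover only physical addresses below 64 GiB. A machine with 64GB installed usually has some RAM remapped above that line because of the MMIO hole below 4 GiB, which would match "less than 64GB works". The helper `pd_count_for` below is hypothetical and only illustrates the arithmetic.

```cpp
#include <cstdint>

// Assumption: each page directory (PD) holds 512 entries and every entry
// maps one 2 MiB large page, so a single PD spans 1 GiB.
constexpr std::uint64_t bytes_per_pd = 512ull * (2ull << 20);

// Hypothetical helper (not part of hv): number of PDs needed so that every
// physical address below `highest_phys_addr` (exclusive) is covered.
constexpr std::uint64_t pd_count_for(std::uint64_t highest_phys_addr) {
  return (highest_phys_addr + bytes_per_pd - 1) / bytes_per_pd;
}

static_assert(bytes_per_pd == (1ull << 30), "one PD spans 1 GiB");
static_assert(pd_count_for(64ull << 30) == 64, "exactly 64 GiB fits in 64 PDs");
static_assert(pd_count_for((64ull << 30) + 1) == 65, "one byte past 64 GiB needs a 65th PD");
static_assert(pd_count_for(512ull << 30) == 512, "512 PDs span 512 GiB");
```

Under these assumptions, a count such as ept_pd_count = 512 spans 512 GiB, while any similar constant left at 64 would stop short on a 64GB machine.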

jonomango commented 1 year ago

Ah okay. Maybe increase this?

Shadowairing commented 1 year ago

I increased that and the problem is solved. Thank you so much.