Closed shpalani closed 1 year ago
I have 2 VMs. And both crashed at the trigger.
Memory.dump and PDB files:
Dump from eBPF-VM: "\skyshare\scratch\Users\shpalan\ebpf\ebpf-vm-1"
Dump from eBPF-VM-2: "\skyshare\scratch\Users\shpalan\ebpf\ebpf-vm-2"
Here is the debugging output from Windbgx:
There is one allocation (stated in Arg4) that is not freed.
20: kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
DRIVER_VERIFIER_DETECTED_VIOLATION (c4)
A device driver attempting to corrupt the system has been caught. This is
because the driver was specified in the registry as being suspect (by the
administrator) and the kernel has enabled substantial checking of this driver.
If the driver attempts to corrupt the system, BugChecks 0xC4, 0xC1 and 0xA will
be among the most commonly seen crashes.
Arguments:
Arg1: 0000000000000062, A driver has forgotten to free its pool allocations prior to unloading.
Arg2: ffffd6851fbf7188, name of the driver having the issue.
Arg3: ffffd68517af2670, verifier internal structure with driver information.
Arg4: 0000000000000001, total # of (paged+nonpaged) allocations that weren't freed.
Type !verifier 3 drivername.sys for info on the allocations
that were leaked that caused the bugcheck.
Debugging Details:
------------------
Page 13f9cd not present in the dump file. Type ".hh dbgerr004" for details
KEY_VALUES_STRING: 1
Key : ATC.Annotation
Value:
Key : ATC.AnnotationLegacy
Value:
Key : Analysis.CPU.mSec
Value: 1702
Key : Analysis.Elapsed.mSec
Value: 3363
Key : Analysis.IO.Other.Mb
Value: 0
Key : Analysis.IO.Read.Mb
Value: 167
Key : Analysis.IO.Write.Mb
Value: 199
Key : Analysis.Init.CPU.mSec
Value: 3155
Key : Analysis.Init.Elapsed.mSec
Value: 628619
Key : Analysis.Memory.CommitPeak.Mb
Value: 229
Key : Bugcheck.Code.KiBugCheckData
Value: 0xc4
Key : Bugcheck.Code.LegacyAPI
Value: 0xc4
Key : Example.CSharp.LastEvent.Exception.Code
Value: 0xc4
Key : Example.CSharp.LastEvent.Exception.Parameter[0]
Value: 0x62
Key : Example.CSharp.LastEvent.Exception.Parameter[1]
Value: 0xffffd6851fbf7188
Key : Example.CSharp.LastEvent.Exception.Parameter[2]
Value: 0xffffd68517af2670
Key : Example.CSharp.LastEvent.Exception.Parameter[3]
Value: 0x1
Key : Hypervisor.Enlightenments.Value
Value: 1113060
Key : Hypervisor.Enlightenments.ValueHex
Value: 10fbe4
Key : Hypervisor.Flags.AnyHypervisorPresent
Value: 1
Key : Hypervisor.Flags.ApicEnlightened
Value: 0
Key : Hypervisor.Flags.AsyncMemoryHint
Value: 0
Key : Hypervisor.Flags.CpuManager
Value: 0
Key : Hypervisor.Flags.DeprecateAutoEoi
Value: 1
Key : Hypervisor.Flags.DynamicCpuDisabled
Value: 1
Key : Hypervisor.Flags.Epf
Value: 0
Key : Hypervisor.Flags.ExtendedProcessorMasks
Value: 1
Key : Hypervisor.Flags.HardwareMbecAvailable
Value: 0
Key : Hypervisor.Flags.MaxBankNumber
Value: 0
Key : Hypervisor.Flags.MemoryZeroingControl
Value: 0
Key : Hypervisor.Flags.NoExtendedRangeFlush
Value: 0
Key : Hypervisor.Flags.NoNonArchCoreSharing
Value: 1
Key : Hypervisor.Flags.Phase0InitDone
Value: 1
Key : Hypervisor.Flags.PowerSchedulerQos
Value: 0
Key : Hypervisor.Flags.RootScheduler
Value: 0
Key : Hypervisor.Flags.SynicAvailable
Value: 1
Key : Hypervisor.Flags.UseQpcBias
Value: 0
Key : Hypervisor.Flags.Value
Value: 4722876
Key : Hypervisor.Flags.ValueHex
Value: 4810bc
Key : Hypervisor.Flags.VpAssistPage
Value: 1
Key : Hypervisor.Flags.VsmAvailable
Value: 0
Key : Hypervisor.RootFlags.Value
Value: 0
Key : Hypervisor.RootFlags.ValueHex
Value: 0
Key : SMBIOS.BIOS.ReleaseDate
Value: 12/07/2018
Key : SMBIOS.BIOS.Vendor
Value: American Megatrends Inc.
Key : SMBIOS.MemoryDevices.BitWidths
Value: 65535/65535
Key : SMBIOS.MemoryDevices.FormFactors
Value: Unknown
Key : SMBIOS.MemoryDevices.Manufacturers
Value: Microsoft
Key : SMBIOS.MemoryDevices.MemoryTypes
Value: Other
Key : SMBIOS.PhysicalMemoryArray.MemoryErrorCorrection
Value: None
Key : SMBIOS.System.SecurityStatus
Value: Other
Key : SMBIOS.System.ThermalState
Value: Other
Key : SMBIOS.System.WakeUpType
Value: Power Switch
Key : Stack.Best.Hash
Value: 15f549471b5cdcd2077480fda3aaa3677c958bbd
Key : Statistics.Blackboxes.TDR.Sections
Value: 11
Key : Statistics.Firmware.Type
Value: BIOS
Key : Statistics.LastEvent.Exception.Code
Value: 0xC4
Key : Statistics.LastEvent.Process.Image
Value: services.exe
Key : Statistics.LastEvent.Process.Modules.Count
Value: 196
Key : Statistics.LastEvent.Process.Threads.Count
Value: 9
Key : Statistics.LastEvent.Thread.APC.KernelDisabled
Value: 1
Key : Statistics.LastEvent.Thread.IRQL
Value: 2
Key : Statistics.Processes.Count
Value: 106
Key : Statistics.Processes.Max.MemoryUsage
Value: 272936960
Key : Statistics.Processes.Max.MemoryUsageImage
Value: MsMpEng.exe
Key : Statistics.Processes.Max.ProcessCount
Value: 67
Key : Statistics.Processes.Max.ProcessCountImage
Value: svchost.exe
Key : Statistics.Processors.Count
Value: 40
Key : Statistics.Processors.Id.ArchRev
Value: 0
Key : Statistics.Processors.Id.Architecture
Value: X64
Key : Statistics.Processors.Id.FMS
Value: 6,79,1
Key : Statistics.Processors.Id.PartNumber
Value:
Key : Statistics.Processors.Id.Revision
Value: 0
Key : Statistics.Processors.Id.Vendor
Value: GenuineIntel
Key : Statistics.Threads.Count
Value: 1701
Key : Timeline.OS.Boot.DeltaSec
Value: 542752
Key : Timeline.Zulu
Value: 2023-03-15T01:17:46.246Z
Key : WER.OS.Branch
Value: rs5_release_svc_prod1
Key : WER.OS.Platform
Value: Windows
Key : WER.OS.Version
Value: 10.0.17763.3650
BUGCHECK_CODE: c4
BUGCHECK_P1: 62
BUGCHECK_P2: ffffd6851fbf7188
BUGCHECK_P3: ffffd68517af2670
BUGCHECK_P4: 1
FILE_IN_CAB: memory.dmp
VIRTUAL_MACHINE: HyperV
IMAGE_NAME: netebpfext.sys
MODULE_NAME: netebpfext
FAULTING_MODULE: fffff80366350000 netebpfext
VERIFIER_DRIVER_ENTRY: dt nt!_MI_VERIFIER_DRIVER_ENTRY ffffd68517af2670
Symbol nt!_MI_VERIFIER_DRIVER_ENTRY not found.
BLACKBOXBSD: 1 (!blackboxbsd)
PROCESS_NAME: services.exe
STACK_TEXT:
fffffa0b`17901228 fffff807`4c7dfe63 : 00000000`000000c4 00000000`00000062 ffffd685`1fbf7188 ffffd685`17af2670 : nt!KeBugCheckEx [minkernel\ntos\ke\amd64\procstat.asm @ 140]
fffffa0b`17901230 fffff807`4c7e7f7e : ffffd685`17af2670 fffffa0b`17901320 ffffd685`1fbf7050 ffffd685`1e7f3550 : nt!VerifierBugCheckIfAppropriate+0xdf [minkernel\ntos\verifier\vfbugcheck.c @ 322]
fffffa0b`17901270 fffff807`4c10703e : ffffd685`17af2670 00000000`00100000 00000000`00000001 fffff803`66351000 : nt!VfPoolCheckForLeaks+0x46 [minkernel\ntos\verifier\vfpool.c @ 895]
fffffa0b`179012b0 fffff807`4c7d141a : 00000000`00023000 ffffd685`1fbf7050 fffff807`4c2bc360 fffff807`4c2bc360 : nt!VfTargetDriversRemove+0xec6ae [minkernel\ntos\verifier\vftarget.c @ 359]
fffffa0b`17901330 fffff807`4c45a720 : ffffd685`1fbf7050 fffffa0b`17901460 00000000`ffffffff 00000000`00000001 : nt!VfDriverUnloadImage+0x3e [minkernel\ntos\verifier\vfdriver.c @ 1685]
fffffa0b`17901360 fffff807`4c5bb2c0 : 00000000`0000000c 00000000`ffffffff ffff9f47`00000001 ffffc27c`01b31a80 : nt!MiUnloadSystemImage+0x3ac [minkernel\ntos\mm\sysload.c @ 6790]
fffffa0b`17901580 fffff807`4c5bb200 : 00000000`00000000 00000000`00000000 ffffd685`0000067d fffffa0b`76697244 : nt!MmUnloadSystemImage+0x20 [minkernel\ntos\mm\sysload.c @ 6211]
fffffa0b`179015b0 fffff807`4c48d250 : ffffd685`1edb94d0 00000000`00000000 fffff807`4c40c200 fffffa0b`17901620 : nt!IopDeleteDriver+0x40 [minkernel\ntos\io\iomgr\objsup.c @ 1071]
fffffa0b`17901600 fffff807`4bed77a4 : 00000000`00000000 00000000`00000000 fffffa0b`17901880 ffffd685`1edb9500 : nt!ObpRemoveObjectRoutine+0x80 [minkernel\ntos\ob\obref.c @ 3287]
fffffa0b`17901660 fffff807`4c5b5047 : ffffd685`1edb9500 fffffa0b`179019e0 fffffa0b`179019e0 00000000`c0000001 : nt!ObfDereferenceObject+0xa4 [minkernel\ntos\ob\obref.c @ 673]
fffffa0b`179016a0 fffff807`4c5b342b : ffffd685`1bce3080 00000000`0000007c 00000072`8007e701 ffffd685`1edb9500 : nt!IopUnloadDriver+0x28f [minkernel\ntos\io\iomgr\loadunld.c @ 613]
fffffa0b`179017d0 fffff807`4c07c0f5 : ffffd685`1bce3080 fffff807`4c078631 ffffffff`fffffffe fffffa0b`17901b80 : nt!NtUnloadDriver+0xb [minkernel\ntos\io\iomgr\loadunld.c @ 315]
fffffa0b`17901800 fffff807`4c06da20 : fffff807`4c5b5157 ffffd685`1bce3080 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x25 [minkernel\ntos\ke\amd64\trap.asm @ 3320]
fffffa0b`17901998 fffff807`4c5b5157 : ffffd685`1bce3080 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiServiceLinkage [minkernel\ntos\ke\amd64\trap.asm @ 3862]
fffffa0b`179019a0 fffff807`4c5b342b : ffffd685`1bce3080 00000000`00010246 fffffa0b`17901af8 00000000`00000018 : nt!IopUnloadDriver+0x39f [minkernel\ntos\io\iomgr\loadunld.c @ 426]
fffffa0b`17901ad0 fffff807`4c07c0f5 : ffffd685`1bce3080 0000023b`860b1640 fffffa0b`17901b80 ffffe68a`00000000 : nt!NtUnloadDriver+0xb [minkernel\ntos\io\iomgr\loadunld.c @ 315]
fffffa0b`17901b00 00007ffa`787b3fc4 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x25 [minkernel\ntos\ke\amd64\trap.asm @ 3320]
00000072`8007e7a8 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x00007ffa`787b3fc4
DXGANALYZE_ANALYSIS_TAG_PORT_GLOBAL_INFO_STR: Hybrid_FALSE
DXGANALYZE_ANALYSIS_TAG_ADAPTER_INFO_STR: GPU0_VenId0x1414_DevId0x8d_WDDM1.3_NotActive;GPU2_VenId0x0_DevId0x0_WDDM1.3_Active_Post;
STACK_COMMAND: .cxr; .ecxr ; kb
FAILURE_BUCKET_ID: 0xc4_62_VRF_LEAKED_POOL_IMAGE_netebpfext.sys
OS_VERSION: 10.0.17763.3650
BUILDLAB_STR: rs5_release_svc_prod1
OSPLATFORM_TYPE: x64
OSNAME: Windows 10
FAILURE_ID_HASH: {05111805-0c5c-c2df-b4a0-fd87e518b717}
FAILURE_ID_REPORT_LINK: https://go.microsoft.com/fwlink/?linkid=2150386&FailureSearchText=05111805-0c5c-c2df-b4a0-fd87e518b717
Followup: MachineOwner
---------
Analysis from eBPF-VM-1 memory.dmp Non-paged pool memory leak: (As per docs, The nonpaged pool consists of virtual memory addresses that are guaranteed to reside in physical memory as long as the corresponding kernel objects are allocated.) Current nonpaged pool allocations 0x32e for 0004BEA1 bytes (814 bytes leaked)
20: kd> dt fffff803`663562f6 _ebpf_ext_attach_init_rundown long netebpfext!_ebpf_ext_attach_init_rundown+0( _net_ebpf_extension_hook_client*)
20: kd> !Verifier 3 netebpfext.sys
Verify Flags Level 0x001209bb
STANDARD FLAGS:
[X] (0x00000000) Automatic Checks
[X] (0x00000001) Special pool
[X] (0x00000002) Force IRQL checking
[X] (0x00000008) Pool tracking
[X] (0x00000010) I/O verification
[X] (0x00000020) Deadlock detection
[X] (0x00000080) DMA checking
[X] (0x00000100) Security checks
[X] (0x00000800) Miscellaneous checks
[X] (0x00020000) DDI compliance checking
ADDITIONAL FLAGS:
[ ] (0x00000004) Randomized low resources simulation
[ ] (0x00000200) Force pending I/O requests
[ ] (0x00000400) IRP logging
[ ] (0x00002000) Invariant MDL checking for stack
[ ] (0x00004000) Invariant MDL checking for driver
[ ] (0x00008000) Power framework delay fuzzing
[ ] (0x00010000) Port/miniport interface checking
[ ] (0x00040000) Systematic low resources simulation
[ ] (0x00080000) DDI compliance checking (additional)
[ ] (0x00200000) NDIS/WIFI verification
[ ] (0x00800000) Kernel synchronization delay fuzzing
[ ] (0x01000000) VM switch verification
[ ] (0x02000000) Code integrity checks
RESERVED FLAGS (use of these flags is unsupported):
[X] (0x00100000) Unused or reserved flag
[X] Indicates flag is enabled
Summary of All Verifier Statistics
RaiseIrqls 0x0
AcquireSpinLocks 0x0
Synch Executions 0x0
Trims 0x247
Pool Allocations Attempted 0x119353b
Pool Allocations Succeeded 0x119353b
Pool Allocations Succeeded SpecialPool 0x119353b
Pool Allocations With NO TAG 0x0
Pool Allocations Failed 0x0
Current paged pool allocations 0x0 for 00000000 bytes
Peak paged pool allocations 0x1 for 0000004E bytes
Current nonpaged pool allocations 0x32e for 0004BEA1 bytes
Peak nonpaged pool allocations 0x34a for 0004C909 bytes
Driver Verification List
------------------------
nt!_VF_TARGET_DRIVER 0xffffd6851e7f3550: NetEbpfExt.sys (Loaded and Unloaded)
Pool Allocation Statistics: ( NonPagedPool / PagedPool )
Current Pool Allocations: ( 0x00000001 / 0x00000000 )
Current Pool Bytes: ( 0x00000058 / 0x00000000 )
Peak Pool Allocations: ( 0x0000001d / 0x00000001 )
Peak Pool Bytes: ( 0x0000091c / 0x00000046 )
Contiguous Memory Bytes: 0x00000000
Peak Contiguous Memory Bytes: 0x00000000
Pool Allocations:
Address Length Tag Caller Address
------------------ ---------- ---- ------------------
0xffffd6851d5dafa0 0x00000058 Vfwi Unable to load image \??\C:\Windows\system32\drivers\netebpfext.sys, Win32 error 0n2
0xfffff803663562fc netebpfext!_ebpf_ext_attach_init_rundown+0x2c
From the source code,
|-- _net_ebpf_extension_hook_provider_attach_client
|--- _ebpf_ext_attach_init_rundown(hook_client);
| -- IoAllocateWorkItem(_net_ebpf_ext_driver_device_object);
Deallocation: // libs\platform\user\kernel_um.cpp void _net_ebpf_extension_detach_client_completion(In DEVICE_OBJECT device_object, _Inopt void context) { ... IoFreeWorkItem(work_item); }
// libs\platform\kernel\ebpf_platform_kernel.c void ebpf_free_preemptible_work_item { ... IoFreeWorkItem(work_item->io_work_item);
}
For eBPF-VM-2 memory.dmp:
There are 3 allocations (stated in Arg4) that are not freed.
Current nonpaged pool allocations 0x228 for 000C5E99 bytes
kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
DRIVER_VERIFIER_DETECTED_VIOLATION (c4)
A device driver attempting to corrupt the system has been caught. This is
because the driver was specified in the registry as being suspect (by the
administrator) and the kernel has enabled substantial checking of this driver.
If the driver attempts to corrupt the system, BugChecks 0xC4, 0xC1 and 0xA will
be among the most commonly seen crashes.
Arguments:
Arg1: 0000000000000062, A driver has forgotten to free its pool allocations prior to unloading.
Arg2: ffffe708861aeea8, name of the driver having the issue.
Arg3: ffffe7088602d8a0, verifier internal structure with driver information.
Arg4: 0000000000000003, total # of (paged+nonpaged) allocations that weren't freed.
Type !verifier 3 drivername.sys for info on the allocations
that were leaked that caused the bugcheck.
kd> !Verifier 3 netebpfext.sys
Verify Flags Level 0x001209bb
STANDARD FLAGS:
[X] (0x00000000) Automatic Checks
[X] (0x00000001) Special pool
[X] (0x00000002) Force IRQL checking
[X] (0x00000008) Pool tracking
[X] (0x00000010) I/O verification
[X] (0x00000020) Deadlock detection
[X] (0x00000080) DMA checking
[X] (0x00000100) Security checks
[X] (0x00000800) Miscellaneous checks
[X] (0x00020000) DDI compliance checking
ADDITIONAL FLAGS:
[ ] (0x00000004) Randomized low resources simulation
[ ] (0x00000200) Force pending I/O requests
[ ] (0x00000400) IRP logging
[ ] (0x00002000) Invariant MDL checking for stack
[ ] (0x00004000) Invariant MDL checking for driver
[ ] (0x00008000) Power framework delay fuzzing
[ ] (0x00010000) Port/miniport interface checking
[ ] (0x00040000) Systematic low resources simulation
[ ] (0x00080000) DDI compliance checking (additional)
[ ] (0x00200000) NDIS/WIFI verification
[ ] (0x00800000) Kernel synchronization delay fuzzing
[ ] (0x01000000) VM switch verification
[ ] (0x02000000) Code integrity checks
RESERVED FLAGS (use of these flags is unsupported):
[X] (0x00100000) Unused or reserved flag
[X] Indicates flag is enabled
Summary of All Verifier Statistics
RaiseIrqls 0x0
AcquireSpinLocks 0x0
Synch Executions 0x0
Trims 0x1cc
Pool Allocations Attempted 0xddba333
Pool Allocations Succeeded 0xddba333
Pool Allocations Succeeded SpecialPool 0xddba333
Pool Allocations With NO TAG 0x0
Pool Allocations Failed 0x0
Current paged pool allocations 0x0 for 00000000 bytes
Peak paged pool allocations 0x1 for 00000046 bytes
Current nonpaged pool allocations 0x228 for 000C5E99 bytes
Peak nonpaged pool allocations 0x24d for 000C8149 bytes
Driver Verification List
------------------------
nt!_VF_TARGET_DRIVER 0xffffe70886af3130: netebpfext.sys (Loaded)
Pool Allocation Statistics: ( NonPagedPool / PagedPool )
Current Pool Allocations: ( 0x00000003 / 0x00000000 )
Current Pool Bytes: ( 0x00000108 / 0x00000000 )
Peak Pool Allocations: ( 0x0000001f / 0x00000001 )
Peak Pool Bytes: ( 0x000009cc / 0x00000046 )
Contiguous Memory Bytes: 0x00000000
Peak Contiguous Memory Bytes: 0x00000000
Pool Allocations:
Address Length Tag Caller Address
------------------ ---------- ---- ------------------
0xffffe70887ebafa0 0x00000058 Vfwi 0xfffff806556b62fc netebpfext!_ebpf_ext_attach_init_rundown+0x2c
0xffffe70887a42fa0 0x00000058 Vfwi 0xfffff806556b62fc netebpfext!_ebpf_ext_attach_init_rundown+0x2c
0xffffe70888044fa0 0x00000058 Vfwi 0xfffff806556b62fc netebpfext!_ebpf_ext_attach_init_rundown+0x2c
To understand,
0xffffe70887ebafa0 0x00000058 Vfwi 0xfffff806556b62fc netebpfext!_ebpf_ext_attach_init_rundown+0x2c
Each listing shows the
0xffffe70887ebafa0 --- address of the current allocation,
0x00000058 --- the size,
Vfwi --- the pool tag used,
0xfffff806556b62fc --- the address in the driver code where the request for an allocation was made.
netebpfext!_ebpf_ext_attach_init_rundown+0x2c --- name of the function
The tag **Vfwi** is used whenever a driver being verified by Driver Verifier makes a request to allocate a work item using [IoAllocateWorkItem](https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-ioallocateworkitem).
// This can give all the 1000 entries of allocation and deallocation performed on this thread. kd> !verifier 0xffffe70887ebafa0
Further Analysis, It can be that the asynchronous completion status of callback to complete is pending and blocked in _ebpf_ext_attach_wait_for_rundown(),
_net_ebpf_extension_detach_client_completion() | --- _ebpf_ext_attach_wait_for_rundown(&hook_client->rundown); | --- IoFreeWorkItem(work_item);
kd> ln 0xfffff806556b62fc Browse module Set bu breakpoint
[C:\ebpf\ebpf-for-windows\netebpfext\net_ebpf_ext_hook_provider.c @ 96] (fffff806556b62d0) netebpfext!_ebpf_ext_attach_init_rundown+0x2c | (fffff806
556b6360) netebpfext!_ebpf_ext_attach_wait_for_rundown
static NTSTATUS
_ebpf_ext_attach_init_rundown(net_ebpf_extension_hook_client_t* hook_client)
{
NTSTATUS status = STATUS_SUCCESS;
net_ebpf_ext_hook_client_rundown_t* rundown = &hook_client->rundown;
//
// Allocate work item for client detach processing.
//
hook_client->detach_work_item = IoAllocateWorkItem(_net_ebpf_ext_driver_device_object);
...
}
void
_net_ebpf_extension_detach_client_completion(_In_ DEVICE_OBJECT* device_object, _In_opt_ void* context)
{
net_ebpf_extension_hook_client_t* hook_client = (net_ebpf_extension_hook_client_t*)context;
PIO_WORKITEM work_item;
PAGED_CODE();
NET_EBPF_EXT_LOG_ENTRY();
UNREFERENCED_PARAMETER(device_object);
ASSERT(hook_client != NULL);
_Analysis_assume_(hook_client != NULL);
work_item = hook_client->detach_work_item;
// The NMR model is async, but the only Windows run-down protection API available is a blocking API, so the
// following call will block until all using threads are complete. This should be fixed in the future.
// Issue: https://github.com/microsoft/ebpf-for-windows/issues/1854
// Wait for any in progress callbacks to complete.
_ebpf_ext_attach_wait_for_rundown(&hook_client->rundown); <<<<<< This is an asynchronous call,
IoFreeWorkItem(work_item);
// Note: This frees the provider binding context (hook_client).
NmrProviderDetachClientComplete(hook_client->nmr_binding_handle); <<< when it asynchronously completes detaching from a client module
...
}
Current progress, With latest git pull (03/20/2023),
$count = 0
$logFile = 'C:\Temp\Test.txt'
while ($count -lt 100) { echo $count sc.exe start netebpfext
netsh ebpf add program .\cgroup_sock_addr.sys
sc.exe stop netebpfext
netsh ebpf show programs 2>&1 > $logFile
$line = Get-Content $logFile -tail 1
$words = $line.split(' ')
$programId = $words[0]
echo 'Delete program id'
netsh ebpf delete program $programId
$count = $count + 1
}
2. Next step is to leave the test machine untouched (longevity), and then check.
The problem is reproduced 2 days later.
kd> !analyze -v
DRIVER_VERIFIER_DETECTED_VIOLATION (c4) A device driver attempting to corrupt the system has been caught. This is because the driver was specified in the registry as being suspect (by the administrator) and the kernel has enabled substantial checking of this driver. If the driver attempts to corrupt the system, BugChecks 0xC4, 0xC1 and 0xA will be among the most commonly seen crashes. Arguments: Arg1: 0000000000000062, A driver has forgotten to free its pool allocations prior to unloading. Arg2: ffffd78320c981c8, name of the driver having the issue. Arg3: ffffd7831d8ea6e0, verifier internal structure with driver information. Arg4: 0000000000000001, total # of (paged+nonpaged) allocations that weren't freed. Type !verifier 3 drivername.sys for info on the allocations that were leaked that caused the bugcheck.
Unable to load image \??\C:\Windows\system32\drivers\netebpfext.sys, Win32 error 0n2
KEY_VALUES_STRING: 1
Key : Analysis.CPU.mSec
Value: 1015
Key : Analysis.Elapsed.mSec
Value: 19243
Key : Analysis.IO.Other.Mb
Value: 0
Key : Analysis.IO.Read.Mb
Value: 2
Key : Analysis.IO.Write.Mb
Value: 0
Key : Analysis.Init.CPU.mSec
Value: 890
Key : Analysis.Init.Elapsed.mSec
Value: 354511
Key : Analysis.Memory.CommitPeak.Mb
Value: 175
Key : Bugcheck.Code.KiBugCheckData
Value: 0xc4
Key : Bugcheck.Code.LegacyAPI
Value: 0xc4
Key : Failure.Bucket
Value: 0xc4_62_VRF_LEAKED_POOL_IMAGE_netebpfext.sys
Key : Failure.Hash
Value: {05111805-0c5c-c2df-b4a0-fd87e518b717}
Key : Hypervisor.Enlightenments.Value
Value: 1113060
Key : Hypervisor.Enlightenments.ValueHex
Value: 10fbe4
Key : Hypervisor.Flags.AnyHypervisorPresent
Value: 1
Key : Hypervisor.Flags.ApicEnlightened
Value: 0
Key : Hypervisor.Flags.AsyncMemoryHint
Value: 0
Key : Hypervisor.Flags.CpuManager
Value: 0
Key : Hypervisor.Flags.DeprecateAutoEoi
Value: 1
Key : Hypervisor.Flags.DynamicCpuDisabled
Value: 1
Key : Hypervisor.Flags.Epf
Value: 0
Key : Hypervisor.Flags.ExtendedProcessorMasks
Value: 1
Key : Hypervisor.Flags.HardwareMbecAvailable
Value: 0
Key : Hypervisor.Flags.MaxBankNumber
Value: 0
Key : Hypervisor.Flags.MemoryZeroingControl
Value: 0
Key : Hypervisor.Flags.NoExtendedRangeFlush
Value: 0
Key : Hypervisor.Flags.NoNonArchCoreSharing
Value: 1
Key : Hypervisor.Flags.Phase0InitDone
Value: 1
Key : Hypervisor.Flags.PowerSchedulerQos
Value: 0
Key : Hypervisor.Flags.RootScheduler
Value: 0
Key : Hypervisor.Flags.SynicAvailable
Value: 1
Key : Hypervisor.Flags.UseQpcBias
Value: 0
Key : Hypervisor.Flags.Value
Value: 4722876
Key : Hypervisor.Flags.ValueHex
Value: 4810bc
Key : Hypervisor.Flags.VpAssistPage
Value: 1
Key : Hypervisor.Flags.VsmAvailable
Value: 0
Key : Hypervisor.RootFlags.Value
Value: 0
Key : Hypervisor.RootFlags.ValueHex
Value: 0
Key : Stack.Best.Hash
Value: 15f549471b5cdcd2077480fda3aaa3677c958bbd
Key : WER.OS.Branch
Value: rs5_release_svc_prod1
Key : WER.OS.Version
Value: 10.0.17763.3650
BUGCHECK_CODE: c4
BUGCHECK_P1: 62
BUGCHECK_P2: ffffd78320c981c8
BUGCHECK_P3: ffffd7831d8ea6e0
BUGCHECK_P4: 1
FILE_IN_CAB: memory.dmp
VIRTUAL_MACHINE: HyperV
IMAGE_NAME: netebpfext.sys
MODULE_NAME: netebpfext
FAULTING_MODULE: fffff8022fda0000 netebpfext
VERIFIER_DRIVER_ENTRY: dt nt!_MI_VERIFIER_DRIVER_ENTRY ffffd7831d8ea6e0 Symbol nt!_MI_VERIFIER_DRIVER_ENTRY not found.
BLACKBOXBSD: 1 (!blackboxbsd)
PROCESS_NAME: services.exe
STACK_TEXT:
ffffcc0e1c6ee228 fffff802
2cbe4e63 : 00000000000000c4 00000000
00000062 ffffd78320c981c8 ffffd783
1d8ea6e0 : nt!KeBugCheckEx [minkernel\ntos\ke\amd64\procstat.asm @ 140]
ffffcc0e1c6ee230 fffff802
2cbecf7e : ffffd7831d8ea6e0 ffffcc0e
1c6ee320 ffffd78320c98090 ffffd783
17267590 : nt!VerifierBugCheckIfAppropriate+0xdf [minkernel\ntos\verifier\vfbugcheck.c @ 322]
ffffcc0e1c6ee270 fffff802
2c50c03e : ffffd7831d8ea6e0 00000000
00000285 0000000000000001 fffff802
2fda1000 : nt!VfPoolCheckForLeaks+0x46 [minkernel\ntos\verifier\vfpool.c @ 895]
ffffcc0e1c6ee2b0 fffff802
2cbd641a : 0000000000023000 ffffd783
20c98090 fffff8022c6c1360 fffff802
2c6c1360 : nt!VfTargetDriversRemove+0xec6ae [minkernel\ntos\verifier\vftarget.c @ 359]
ffffcc0e1c6ee330 fffff802
2c85f720 : ffffd78320c98090 ffffcc0e
1c6ee460 00000000ffffffff 00000000
00000001 : nt!VfDriverUnloadImage+0x3e [minkernel\ntos\verifier\vfdriver.c @ 1685]
ffffcc0e1c6ee360 fffff802
2c9c02c0 : 000000000000000c 00000000
ffffffff ffffd02e00000001 ffff967c
0117ed00 : nt!MiUnloadSystemImage+0x3ac [minkernel\ntos\mm\sysload.c @ 6790]
ffffcc0e1c6ee580 fffff802
2c9c0200 : 0000000000000000 00000000
00000000 000000000000067d fffff802
76697244 : nt!MmUnloadSystemImage+0x20 [minkernel\ntos\mm\sysload.c @ 6211]
ffffcc0e1c6ee5b0 fffff802
2c892250 : ffffd7831f5da950 00000000
00000000 fffff8022c811200 ffffcc0e
1c6ee620 : nt!IopDeleteDriver+0x40 [minkernel\ntos\io\iomgr\objsup.c @ 1071]
ffffcc0e1c6ee600 fffff802
2c2dc7a4 : 0000000000000000 00000000
00000000 ffffcc0e1c6ee880 ffffd783
1f5da980 : nt!ObpRemoveObjectRoutine+0x80 [minkernel\ntos\ob\obref.c @ 3287]
ffffcc0e1c6ee660 fffff802
2c9ba047 : ffffd7831f5da980 ffffcc0e
1c6ee9e0 ffffcc0e1c6ee9e0 00000000
c0000001 : nt!ObfDereferenceObject+0xa4 [minkernel\ntos\ob\obref.c @ 673]
ffffcc0e1c6ee6a0 fffff802
2c9b842b : ffffd7831d3d1080 00000000
0000007c 000000945217eb01 ffffd783
1f5da980 : nt!IopUnloadDriver+0x28f [minkernel\ntos\io\iomgr\loadunld.c @ 613]
ffffcc0e1c6ee7d0 fffff802
2c4810f5 : ffffd7831d3d1080 fffff802
2c47d631 fffffffffffffffe ffffcc0e
1c6eeb80 : nt!NtUnloadDriver+0xb [minkernel\ntos\io\iomgr\loadunld.c @ 315]
ffffcc0e1c6ee800 fffff802
2c472a20 : fffff8022c9ba157 ffffd783
1d3d1080 0000000000000000 00000000
00000000 : nt!KiSystemServiceCopyEnd+0x25 [minkernel\ntos\ke\amd64\trap.asm @ 3320]
ffffcc0e1c6ee998 fffff802
2c9ba157 : ffffd7831d3d1080 00000000
00000000 0000000000000000 00000000
00000000 : nt!KiServiceLinkage [minkernel\ntos\ke\amd64\trap.asm @ 3862]
ffffcc0e1c6ee9a0 fffff802
2c9b842b : ffffd7831d3d1080 00000000
00010246 ffffcc0e1c6eeaf8 00000000
00000018 : nt!IopUnloadDriver+0x39f [minkernel\ntos\io\iomgr\loadunld.c @ 426]
ffffcc0e1c6eead0 fffff802
2c4810f5 : ffffd7831d3d1080 0000024e
79063aa0 ffffcc0e1c6eeb80 ffffd783
00000000 : nt!NtUnloadDriver+0xb [minkernel\ntos\io\iomgr\loadunld.c @ 315]
ffffcc0e1c6eeb00 00007ffe
530e3fc4 : 0000000000000000 00000000
00000000 0000000000000000 00000000
00000000 : nt!KiSystemServiceCopyEnd+0x25 [minkernel\ntos\ke\amd64\trap.asm @ 3320]
000000945217eb48 00000000
00000000 : 0000000000000000 00000000
00000000 0000000000000000 00000000
00000000 : 0x00007ffe`530e3fc4
DXGANALYZE_ANALYSIS_TAG_PORT_GLOBAL_INFO_STR: Hybrid_FALSE
DXGANALYZE_ANALYSIS_TAG_ADAPTER_INFO_STR: GPU0_VenId0x1414_DevId0x8d_WDDM1.3_NotActive;GPU2_VenId0x0_DevId0x0_WDDM1.3_Active_Post;
STACK_COMMAND: .cxr; .ecxr ; kb
FAILURE_BUCKET_ID: 0xc4_62_VRF_LEAKED_POOL_IMAGE_netebpfext.sys
OS_VERSION: 10.0.17763.3650
BUILDLAB_STR: rs5_release_svc_prod1
OSPLATFORM_TYPE: x64
OSNAME: Windows 10
FAILURE_ID_HASH: {05111805-0c5c-c2df-b4a0-fd87e518b717}
FAILURE_ID_REPORT_LINK: https://go.microsoft.com/fwlink/?linkid=2150386&FailureSearchText=05111805-0c5c-c2df-b4a0-fd87e518b717
kd> !verifier 3 netebpfext.sys
Verify Flags Level 0x001209bb
STANDARD FLAGS: [X] (0x00000000) Automatic Checks [X] (0x00000001) Special pool [X] (0x00000002) Force IRQL checking [X] (0x00000008) Pool tracking [X] (0x00000010) I/O verification [X] (0x00000020) Deadlock detection [X] (0x00000080) DMA checking [X] (0x00000100) Security checks [X] (0x00000800) Miscellaneous checks [X] (0x00020000) DDI compliance checking
ADDITIONAL FLAGS: [ ] (0x00000004) Randomized low resources simulation [ ] (0x00000200) Force pending I/O requests [ ] (0x00000400) IRP logging [ ] (0x00002000) Invariant MDL checking for stack [ ] (0x00004000) Invariant MDL checking for driver [ ] (0x00008000) Power framework delay fuzzing [ ] (0x00010000) Port/miniport interface checking [ ] (0x00040000) Systematic low resources simulation [ ] (0x00080000) DDI compliance checking (additional) [ ] (0x00200000) NDIS/WIFI verification [ ] (0x00800000) Kernel synchronization delay fuzzing [ ] (0x01000000) VM switch verification [ ] (0x02000000) Code integrity checks
RESERVED FLAGS (use of these flags is unsupported): [X] (0x00100000) Unused or reserved flag
[X] Indicates flag is enabled
Summary of All Verifier Statistics
RaiseIrqls 0x0 AcquireSpinLocks 0x0 Synch Executions 0x0 Trims 0x1494
Pool Allocations Attempted 0x202c9794 Pool Allocations Succeeded 0x202c9794 Pool Allocations Succeeded SpecialPool 0x202c9794 Pool Allocations With NO TAG 0x0 Pool Allocations Failed 0x0
Current paged pool allocations 0x0 for 00000000 bytes Peak paged pool allocations 0x1 for 00000046 bytes Current nonpaged pool allocations 0x1c0 for 000956BD bytes Peak nonpaged pool allocations 0x1db for 000971C9 bytes
nt!_VF_TARGET_DRIVER 0xffffd78317267590: netebpfext.sys (Loaded and Unloaded)
Pool Allocation Statistics: ( NonPagedPool / PagedPool )
Current Pool Allocations: ( 0x00000001 / 0x00000000 )
Current Pool Bytes: ( 0x00000058 / 0x00000000 )
Peak Pool Allocations: ( 0x0000001a / 0x00000001 )
Peak Pool Bytes: ( 0x000007f4 / 0x00000046 )
Contiguous Memory Bytes: 0x00000000
Peak Contiguous Memory Bytes: 0x00000000
Pool Allocations:
Address Length Tag Caller Address
------------------ ---------- ---- ------------------
0xffffd783208d0fa0 0x00000058 Vfwi 0xfffff8022fda62fc netebpfext!_ebpf_ext_attach_init_rundown+0x2c
kd> ln 0xfffff8022fda62fc Browse module Set bu breakpoint
[C:\ebpf\ebpf-for-windows\netebpfext\net_ebpf_ext_hook_provider.c @ 96] (fffff8022fda62d0) netebpfext!_ebpf_ext_attach_init_rundown+0x2c | (fffff802
2fda6360) netebpfext!_ebpf_ext_attach_wait_for_rundown
kd> ln 0xfffff8022fda62fc Browse module Set bu breakpoint
[C:\ebpf\ebpf-for-windows\netebpfext\net_ebpf_ext_hook_provider.c @ 96] (fffff8022fda62d0) netebpfext!_ebpf_ext_attach_init_rundown+0x2c | (fffff802
2fda6360) netebpfext!_ebpf_ext_attach_wait_for_rundown
kd>
Memory leak: 0x00000058 = 88 bytes
The address in the driver code where the request for an allocation was made:
[C:\ebpf\ebpf-for-windows\netebpfext\net_ebpf_ext_hook_provider.c @ 96] (fffff8022fda62d0) **netebpfext!_ebpf_ext_attach_init_rundown**+0x2c | (fffff802
2fda6360) netebpfext!_ebpf_ext_attach_wait_for_rundown
kd> ln fffff802`2fda6360
Browse module
Set bu breakpoint
[C:\ebpf\ebpf-for-windows\netebpfext\net_ebpf_ext_hook_provider.c @ 133] (fffff802`2fda6360) netebpfext!_ebpf_ext_attach_wait_for_rundown | (fffff802`2fda6390) netebpfext!_net_ebpf_extension_hook_provider_attach_client
Exact matches:
netebpfext!_ebpf_ext_attach_wait_for_rundown (struct _net_ebpf_ext_hook_client_rundown *)
kd> ln fffff802`2fda62d0
Browse module
Set bu breakpoint
[C:\ebpf\ebpf-for-windows\netebpfext\net_ebpf_ext_hook_provider.c @ 89] (fffff802`2fda62d0) netebpfext!_ebpf_ext_attach_init_rundown | (fffff802`2fda6360) netebpfext!_ebpf_ext_attach_wait_for_rundown
Exact matches:
netebpfext!_ebpf_ext_attach_init_rundown (struct _net_ebpf_extension_hook_client *)
Allocator source code path:
static NTSTATUS _ebpf_ext_attach_init_rundown(net_ebpf_extension_hook_client_t hook_client) { NTSTATUS status = STATUS_SUCCESS; net_ebpf_ext_hook_client_rundown_t rundown = &hook_client->rundown;
//
// Allocate work item for client detach processing.
//
hook_client->detach_work_item = **IoAllocateWorkItem(**_net_ebpf_ext_driver_device_object);
if (hook_client->detach_work_item == NULL) {
status = STATUS_INSUFFICIENT_RESOURCES;
goto Exit;
}
// Initialize the rundown and disable new references.
ExInitializeRundownProtection(&rundown->protection);
ExWaitForRundownProtectionRelease(&rundown->protection);
rundown->rundown_reinitialized = FALSE;
rundown->rundown_occurred = FALSE;
Exit: return status; }
static void _ebpf_ext_attach_wait_for_rundown(Inout net_ebpf_ext_hook_client_rundown_t* rundown) { ExWaitForRundownProtectionRelease(&rundown->protection); rundown->rundown_occurred = TRUE; }
And as explained earlier, there is only 1 call path which calls the IoFreeWorkItem:
void
_net_ebpf_extension_detach_client_completion(_In_ DEVICE_OBJECT* device_object, _In_opt_ void* context)
Read about this call https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-exwaitforrundownprotectionrelease
Location of latest memory.dmp : \skyshare\scratch\Users\shpalan\ebpf\ebpf-VM-2-03242023
After leaving the system for 2 days, the driver crash is not reproducible with the fix from https://github.com/microsoft/ebpf-for-windows/pull/2190/files.
Describe the bug
Problem: The VM crashes when 'sc.exe stop netebpfext' is triggered in the test VM. netebpfext driver: A driver has forgotten to free its pool allocations prior to unloading. Loaded driver: cgroup_sock_addr.sys
Error message on the bluescreen: DRIVER_VERIFIER_DETECTED_VIOLATION (c4) A device driver attempting to corrupt the system has been caught. This is because the driver was specified in the registry as being suspect (by the administrator) and the kernel has enabled substantial checking of this driver. If the driver attempts to corrupt the system, BugChecks 0xC4, 0xC1 and 0xA will be among the most commonly seen crashes. Arguments: Arg1: 0000000000000062, A driver has forgotten to free its pool allocations prior to unloading. Arg2: ffffd6851fbf7188, name of the driver having the issue. Arg3: ffffd68517af2670, verifier internal structure with driver information. Arg4: 0000000000000001, total # of (paged+nonpaged) allocations that weren't freed. Type !verifier 3 drivername.sys for info on the allocations that were leaked that caused the bugcheck.
OS information
Key : WER.OS.Branch Value: rs5_release_svc_prod1 Key : WER.OS.Platform Value: Windows Key : WER.OS.Version Value: 10.0.17763.3650
Steps taken to reproduce bug
netsh ebpf add program cgroup_sock_addr.sys After day(s) of inactivity in the test VM, sc.exe stop netebpfext
Expected behavior
Should not crash, and the driver should be stopped.
Actual outcome
Bluescreen with
DRIVER_VERIFIER_DETECTED_VIOLATION (c4) A device driver attempting to corrupt the system has been caught. This is because the driver was specified in the registry as being suspect (by the administrator) and the kernel has enabled substantial checking of this driver. If the driver attempts to corrupt the system, BugChecks 0xC4, 0xC1 and 0xA will be among the most commonly seen crashes. Arguments: Arg1: 0000000000000062, A driver has forgotten to free its pool allocations prior to unloading. Arg2: ffffd6851fbf7188, name of the driver having the issue. Arg3: ffffd68517af2670, verifier internal structure with driver information. Arg4: 0000000000000001, total # of (paged+nonpaged) allocations that weren't freed. Type !verifier 3 drivername.sys for info on the allocations that were leaked that caused the bugcheck.
Additional details
No response