microsoft / ebpf-for-windows

eBPF implementation that runs on top of Windows
MIT License
2.95k stars 240 forks source link

BugCheck netebpfext driver: DRIVER_VERIFIER_DETECTED_VIOLATION (c4) #2197

Closed shpalani closed 1 year ago

shpalani commented 1 year ago

Describe the bug

Problem: The VM crashes when 'sc.exe stop netebpfext' is triggered in the test VM. netebpfext driver: A driver has forgotten to free its pool allocations prior to unloading. Loaded driver: cgroup_sock_addr.sys

Error message on the bluescreen: DRIVER_VERIFIER_DETECTED_VIOLATION (c4) A device driver attempting to corrupt the system has been caught. This is because the driver was specified in the registry as being suspect (by the administrator) and the kernel has enabled substantial checking of this driver. If the driver attempts to corrupt the system, BugChecks 0xC4, 0xC1 and 0xA will be among the most commonly seen crashes. Arguments: Arg1: 0000000000000062, A driver has forgotten to free its pool allocations prior to unloading. Arg2: ffffd6851fbf7188, name of the driver having the issue. Arg3: ffffd68517af2670, verifier internal structure with driver information. Arg4: 0000000000000001, total # of (paged+nonpaged) allocations that weren't freed. Type !verifier 3 drivername.sys for info on the allocations that were leaked that caused the bugcheck.

OS information

Key : WER.OS.Branch Value: rs5_release_svc_prod1 Key : WER.OS.Platform Value: Windows Key : WER.OS.Version Value: 10.0.17763.3650

Steps taken to reproduce bug

netsh ebpf add program cgroup_sock_addr.sys After day(s) of inactivity in the test VM, sc.exe stop netebpfext

Expected behavior

Should not crash, and the driver should be stopped.

Actual outcome

Bluescreen with

DRIVER_VERIFIER_DETECTED_VIOLATION (c4) A device driver attempting to corrupt the system has been caught. This is because the driver was specified in the registry as being suspect (by the administrator) and the kernel has enabled substantial checking of this driver. If the driver attempts to corrupt the system, BugChecks 0xC4, 0xC1 and 0xA will be among the most commonly seen crashes. Arguments: Arg1: 0000000000000062, A driver has forgotten to free its pool allocations prior to unloading. Arg2: ffffd6851fbf7188, name of the driver having the issue. Arg3: ffffd68517af2670, verifier internal structure with driver information. Arg4: 0000000000000001, total # of (paged+nonpaged) allocations that weren't freed. Type !verifier 3 drivername.sys for info on the allocations that were leaked that caused the bugcheck.

Additional details

No response

shpalani commented 1 year ago

I have 2 VMs. And both crashed at the trigger.

Memory.dump and PDB files:

Dump from eBPF-VM: "\skyshare\scratch\Users\shpalan\ebpf\ebpf-vm-1"

Dump from eBPF-VM-2: "\skyshare\scratch\Users\shpalan\ebpf\ebpf-vm-2"

shpalani commented 1 year ago

Here is the debugging output from Windbgx:

There is one allocation (stated in Arg4) that is not freed.

20: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************
DRIVER_VERIFIER_DETECTED_VIOLATION (c4)
A device driver attempting to corrupt the system has been caught.  This is
because the driver was specified in the registry as being suspect (by the
administrator) and the kernel has enabled substantial checking of this driver.
If the driver attempts to corrupt the system, BugChecks 0xC4, 0xC1 and 0xA will
be among the most commonly seen crashes.
Arguments:
Arg1: 0000000000000062, A driver has forgotten to free its pool allocations prior to unloading.
Arg2: ffffd6851fbf7188, name of the driver having the issue.
Arg3: ffffd68517af2670, verifier internal structure with driver information.
Arg4: 0000000000000001, total # of (paged+nonpaged) allocations that weren't freed.
    Type !verifier 3 drivername.sys for info on the allocations
    that were leaked that caused the bugcheck.
Debugging Details:
------------------
Page 13f9cd not present in the dump file. Type ".hh dbgerr004" for details
KEY_VALUES_STRING: 1
    Key  : ATC.Annotation
    Value: 
    Key  : ATC.AnnotationLegacy
    Value: 
    Key  : Analysis.CPU.mSec
    Value: 1702
    Key  : Analysis.Elapsed.mSec
    Value: 3363
    Key  : Analysis.IO.Other.Mb
    Value: 0
    Key  : Analysis.IO.Read.Mb
    Value: 167
    Key  : Analysis.IO.Write.Mb
    Value: 199
    Key  : Analysis.Init.CPU.mSec
    Value: 3155
    Key  : Analysis.Init.Elapsed.mSec
    Value: 628619
    Key  : Analysis.Memory.CommitPeak.Mb
    Value: 229
    Key  : Bugcheck.Code.KiBugCheckData
    Value: 0xc4
    Key  : Bugcheck.Code.LegacyAPI
    Value: 0xc4
    Key  : Example.CSharp.LastEvent.Exception.Code
    Value: 0xc4
    Key  : Example.CSharp.LastEvent.Exception.Parameter[0]
    Value: 0x62
    Key  : Example.CSharp.LastEvent.Exception.Parameter[1]
    Value: 0xffffd6851fbf7188
    Key  : Example.CSharp.LastEvent.Exception.Parameter[2]
    Value: 0xffffd68517af2670
    Key  : Example.CSharp.LastEvent.Exception.Parameter[3]
    Value: 0x1
    Key  : Hypervisor.Enlightenments.Value
    Value: 1113060
    Key  : Hypervisor.Enlightenments.ValueHex
    Value: 10fbe4
    Key  : Hypervisor.Flags.AnyHypervisorPresent
    Value: 1
    Key  : Hypervisor.Flags.ApicEnlightened
    Value: 0
    Key  : Hypervisor.Flags.AsyncMemoryHint
    Value: 0
    Key  : Hypervisor.Flags.CpuManager
    Value: 0
    Key  : Hypervisor.Flags.DeprecateAutoEoi
    Value: 1
    Key  : Hypervisor.Flags.DynamicCpuDisabled
    Value: 1
    Key  : Hypervisor.Flags.Epf
    Value: 0
    Key  : Hypervisor.Flags.ExtendedProcessorMasks
    Value: 1
    Key  : Hypervisor.Flags.HardwareMbecAvailable
    Value: 0
    Key  : Hypervisor.Flags.MaxBankNumber
    Value: 0
    Key  : Hypervisor.Flags.MemoryZeroingControl
    Value: 0
    Key  : Hypervisor.Flags.NoExtendedRangeFlush
    Value: 0
    Key  : Hypervisor.Flags.NoNonArchCoreSharing
    Value: 1
    Key  : Hypervisor.Flags.Phase0InitDone
    Value: 1
    Key  : Hypervisor.Flags.PowerSchedulerQos
    Value: 0
    Key  : Hypervisor.Flags.RootScheduler
    Value: 0
    Key  : Hypervisor.Flags.SynicAvailable
    Value: 1
    Key  : Hypervisor.Flags.UseQpcBias
    Value: 0
    Key  : Hypervisor.Flags.Value
    Value: 4722876
    Key  : Hypervisor.Flags.ValueHex
    Value: 4810bc
    Key  : Hypervisor.Flags.VpAssistPage
    Value: 1
    Key  : Hypervisor.Flags.VsmAvailable
    Value: 0
    Key  : Hypervisor.RootFlags.Value
    Value: 0
    Key  : Hypervisor.RootFlags.ValueHex
    Value: 0
    Key  : SMBIOS.BIOS.ReleaseDate
    Value: 12/07/2018
    Key  : SMBIOS.BIOS.Vendor
    Value: American Megatrends Inc.
    Key  : SMBIOS.MemoryDevices.BitWidths
    Value: 65535/65535
    Key  : SMBIOS.MemoryDevices.FormFactors
    Value: Unknown
    Key  : SMBIOS.MemoryDevices.Manufacturers
    Value: Microsoft
    Key  : SMBIOS.MemoryDevices.MemoryTypes
    Value: Other
    Key  : SMBIOS.PhysicalMemoryArray.MemoryErrorCorrection
    Value: None
    Key  : SMBIOS.System.SecurityStatus
    Value: Other
    Key  : SMBIOS.System.ThermalState
    Value: Other
    Key  : SMBIOS.System.WakeUpType
    Value: Power Switch
    Key  : Stack.Best.Hash
    Value: 15f549471b5cdcd2077480fda3aaa3677c958bbd
    Key  : Statistics.Blackboxes.TDR.Sections
    Value: 11
    Key  : Statistics.Firmware.Type
    Value: BIOS
    Key  : Statistics.LastEvent.Exception.Code
    Value: 0xC4
    Key  : Statistics.LastEvent.Process.Image
    Value: services.exe
    Key  : Statistics.LastEvent.Process.Modules.Count
    Value: 196
    Key  : Statistics.LastEvent.Process.Threads.Count
    Value: 9
    Key  : Statistics.LastEvent.Thread.APC.KernelDisabled
    Value: 1
    Key  : Statistics.LastEvent.Thread.IRQL
    Value: 2
    Key  : Statistics.Processes.Count
    Value: 106
    Key  : Statistics.Processes.Max.MemoryUsage
    Value: 272936960
    Key  : Statistics.Processes.Max.MemoryUsageImage
    Value: MsMpEng.exe
    Key  : Statistics.Processes.Max.ProcessCount
    Value: 67
    Key  : Statistics.Processes.Max.ProcessCountImage
    Value: svchost.exe
    Key  : Statistics.Processors.Count
    Value: 40
    Key  : Statistics.Processors.Id.ArchRev
    Value: 0
    Key  : Statistics.Processors.Id.Architecture
    Value: X64
    Key  : Statistics.Processors.Id.FMS
    Value: 6,79,1
    Key  : Statistics.Processors.Id.PartNumber
    Value:     
    Key  : Statistics.Processors.Id.Revision
    Value: 0
    Key  : Statistics.Processors.Id.Vendor
    Value: GenuineIntel
    Key  : Statistics.Threads.Count
    Value: 1701
    Key  : Timeline.OS.Boot.DeltaSec
    Value: 542752
    Key  : Timeline.Zulu
    Value: 2023-03-15T01:17:46.246Z
    Key  : WER.OS.Branch
    Value: rs5_release_svc_prod1
    Key  : WER.OS.Platform
    Value: Windows
    Key  : WER.OS.Version
    Value: 10.0.17763.3650

BUGCHECK_CODE:  c4
BUGCHECK_P1: 62
BUGCHECK_P2: ffffd6851fbf7188
BUGCHECK_P3: ffffd68517af2670
BUGCHECK_P4: 1
FILE_IN_CAB:  memory.dmp
VIRTUAL_MACHINE:  HyperV
IMAGE_NAME:  netebpfext.sys
MODULE_NAME: netebpfext
FAULTING_MODULE: fffff80366350000 netebpfext
VERIFIER_DRIVER_ENTRY: dt nt!_MI_VERIFIER_DRIVER_ENTRY ffffd68517af2670
Symbol nt!_MI_VERIFIER_DRIVER_ENTRY not found.
BLACKBOXBSD: 1 (!blackboxbsd)

PROCESS_NAME:  services.exe
STACK_TEXT:  
fffffa0b`17901228 fffff807`4c7dfe63     : 00000000`000000c4 00000000`00000062 ffffd685`1fbf7188 ffffd685`17af2670 : nt!KeBugCheckEx [minkernel\ntos\ke\amd64\procstat.asm @ 140] 
fffffa0b`17901230 fffff807`4c7e7f7e     : ffffd685`17af2670 fffffa0b`17901320 ffffd685`1fbf7050 ffffd685`1e7f3550 : nt!VerifierBugCheckIfAppropriate+0xdf [minkernel\ntos\verifier\vfbugcheck.c @ 322] 
fffffa0b`17901270 fffff807`4c10703e     : ffffd685`17af2670 00000000`00100000 00000000`00000001 fffff803`66351000 : nt!VfPoolCheckForLeaks+0x46 [minkernel\ntos\verifier\vfpool.c @ 895] 
fffffa0b`179012b0 fffff807`4c7d141a     : 00000000`00023000 ffffd685`1fbf7050 fffff807`4c2bc360 fffff807`4c2bc360 : nt!VfTargetDriversRemove+0xec6ae [minkernel\ntos\verifier\vftarget.c @ 359] 
fffffa0b`17901330 fffff807`4c45a720     : ffffd685`1fbf7050 fffffa0b`17901460 00000000`ffffffff 00000000`00000001 : nt!VfDriverUnloadImage+0x3e [minkernel\ntos\verifier\vfdriver.c @ 1685] 
fffffa0b`17901360 fffff807`4c5bb2c0     : 00000000`0000000c 00000000`ffffffff ffff9f47`00000001 ffffc27c`01b31a80 : nt!MiUnloadSystemImage+0x3ac [minkernel\ntos\mm\sysload.c @ 6790] 
fffffa0b`17901580 fffff807`4c5bb200     : 00000000`00000000 00000000`00000000 ffffd685`0000067d fffffa0b`76697244 : nt!MmUnloadSystemImage+0x20 [minkernel\ntos\mm\sysload.c @ 6211] 
fffffa0b`179015b0 fffff807`4c48d250     : ffffd685`1edb94d0 00000000`00000000 fffff807`4c40c200 fffffa0b`17901620 : nt!IopDeleteDriver+0x40 [minkernel\ntos\io\iomgr\objsup.c @ 1071] 
fffffa0b`17901600 fffff807`4bed77a4     : 00000000`00000000 00000000`00000000 fffffa0b`17901880 ffffd685`1edb9500 : nt!ObpRemoveObjectRoutine+0x80 [minkernel\ntos\ob\obref.c @ 3287] 
fffffa0b`17901660 fffff807`4c5b5047     : ffffd685`1edb9500 fffffa0b`179019e0 fffffa0b`179019e0 00000000`c0000001 : nt!ObfDereferenceObject+0xa4 [minkernel\ntos\ob\obref.c @ 673] 
fffffa0b`179016a0 fffff807`4c5b342b     : ffffd685`1bce3080 00000000`0000007c 00000072`8007e701 ffffd685`1edb9500 : nt!IopUnloadDriver+0x28f [minkernel\ntos\io\iomgr\loadunld.c @ 613] 
fffffa0b`179017d0 fffff807`4c07c0f5     : ffffd685`1bce3080 fffff807`4c078631 ffffffff`fffffffe fffffa0b`17901b80 : nt!NtUnloadDriver+0xb [minkernel\ntos\io\iomgr\loadunld.c @ 315] 
fffffa0b`17901800 fffff807`4c06da20     : fffff807`4c5b5157 ffffd685`1bce3080 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x25 [minkernel\ntos\ke\amd64\trap.asm @ 3320] 
fffffa0b`17901998 fffff807`4c5b5157     : ffffd685`1bce3080 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiServiceLinkage [minkernel\ntos\ke\amd64\trap.asm @ 3862] 
fffffa0b`179019a0 fffff807`4c5b342b     : ffffd685`1bce3080 00000000`00010246 fffffa0b`17901af8 00000000`00000018 : nt!IopUnloadDriver+0x39f [minkernel\ntos\io\iomgr\loadunld.c @ 426] 
fffffa0b`17901ad0 fffff807`4c07c0f5     : ffffd685`1bce3080 0000023b`860b1640 fffffa0b`17901b80 ffffe68a`00000000 : nt!NtUnloadDriver+0xb [minkernel\ntos\io\iomgr\loadunld.c @ 315] 
fffffa0b`17901b00 00007ffa`787b3fc4     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x25 [minkernel\ntos\ke\amd64\trap.asm @ 3320] 
00000072`8007e7a8 00000000`00000000     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x00007ffa`787b3fc4

DXGANALYZE_ANALYSIS_TAG_PORT_GLOBAL_INFO_STR:  Hybrid_FALSE
DXGANALYZE_ANALYSIS_TAG_ADAPTER_INFO_STR:  GPU0_VenId0x1414_DevId0x8d_WDDM1.3_NotActive;GPU2_VenId0x0_DevId0x0_WDDM1.3_Active_Post;
STACK_COMMAND:  .cxr; .ecxr ; kb
FAILURE_BUCKET_ID:  0xc4_62_VRF_LEAKED_POOL_IMAGE_netebpfext.sys
OS_VERSION:  10.0.17763.3650
BUILDLAB_STR:  rs5_release_svc_prod1
OSPLATFORM_TYPE:  x64
OSNAME:  Windows 10
FAILURE_ID_HASH:  {05111805-0c5c-c2df-b4a0-fd87e518b717}
FAILURE_ID_REPORT_LINK: https://go.microsoft.com/fwlink/?linkid=2150386&FailureSearchText=05111805-0c5c-c2df-b4a0-fd87e518b717
Followup:     MachineOwner
---------
shpalani commented 1 year ago

Analysis from eBPF-VM-1 memory.dmp Non-paged pool memory leak: (As per docs, The nonpaged pool consists of virtual memory addresses that are guaranteed to reside in physical memory as long as the corresponding kernel objects are allocated.) Current nonpaged pool allocations 0x32e for 0004BEA1 bytes (814 bytes leaked)

20: kd> dt fffff803`663562f6 _ebpf_ext_attach_init_rundown long netebpfext!_ebpf_ext_attach_init_rundown+0( _net_ebpf_extension_hook_client*)

20: kd> !Verifier 3 netebpfext.sys

Verify Flags Level 0x001209bb

  STANDARD FLAGS:
    [X] (0x00000000) Automatic Checks
    [X] (0x00000001) Special pool
    [X] (0x00000002) Force IRQL checking
    [X] (0x00000008) Pool tracking
    [X] (0x00000010) I/O verification
    [X] (0x00000020) Deadlock detection
    [X] (0x00000080) DMA checking
    [X] (0x00000100) Security checks
    [X] (0x00000800) Miscellaneous checks
    [X] (0x00020000) DDI compliance checking

  ADDITIONAL FLAGS:
    [ ] (0x00000004) Randomized low resources simulation
    [ ] (0x00000200) Force pending I/O requests
    [ ] (0x00000400) IRP logging
    [ ] (0x00002000) Invariant MDL checking for stack
    [ ] (0x00004000) Invariant MDL checking for driver
    [ ] (0x00008000) Power framework delay fuzzing
    [ ] (0x00010000) Port/miniport interface checking
    [ ] (0x00040000) Systematic low resources simulation
    [ ] (0x00080000) DDI compliance checking (additional)
    [ ] (0x00200000) NDIS/WIFI verification
    [ ] (0x00800000) Kernel synchronization delay fuzzing
    [ ] (0x01000000) VM switch verification
    [ ] (0x02000000) Code integrity checks

  RESERVED FLAGS (use of these flags is unsupported):
    [X] (0x00100000) Unused or reserved flag

    [X] Indicates flag is enabled

Summary of All Verifier Statistics

  RaiseIrqls           0x0
  AcquireSpinLocks     0x0
  Synch Executions     0x0
  Trims                0x247

  Pool Allocations Attempted             0x119353b
  Pool Allocations Succeeded             0x119353b
  Pool Allocations Succeeded SpecialPool 0x119353b
  Pool Allocations With NO TAG           0x0
  Pool Allocations Failed                0x0

  Current paged pool allocations         0x0 for 00000000 bytes
  Peak paged pool allocations            0x1 for 0000004E bytes
  Current nonpaged pool allocations      0x32e for 0004BEA1 bytes
  Peak nonpaged pool allocations         0x34a for 0004C909 bytes

Driver Verification List
------------------------

nt!_VF_TARGET_DRIVER 0xffffd6851e7f3550: NetEbpfExt.sys (Loaded and Unloaded)

    Pool Allocation Statistics: ( NonPagedPool / PagedPool )

      Current Pool Allocations: ( 0x00000001 / 0x00000000 )
      Current Pool Bytes:       ( 0x00000058 / 0x00000000 )
      Peak Pool Allocations:    ( 0x0000001d / 0x00000001 )
      Peak Pool Bytes:          ( 0x0000091c / 0x00000046 )
      Contiguous Memory Bytes:       0x00000000
      Peak Contiguous Memory Bytes:  0x00000000

    Pool Allocations:

      Address             Length      Tag   Caller Address    
      ------------------  ----------  ----  ------------------
      0xffffd6851d5dafa0  0x00000058  Vfwi  Unable to load image \??\C:\Windows\system32\drivers\netebpfext.sys, Win32 error 0n2
0xfffff803663562fc  netebpfext!_ebpf_ext_attach_init_rundown+0x2c
shpalani commented 1 year ago

From the source code,

|-- _net_ebpf_extension_hook_provider_attach_client
  |--- _ebpf_ext_attach_init_rundown(hook_client);
       | -- IoAllocateWorkItem(_net_ebpf_ext_driver_device_object);

Deallocation: // libs\platform\user\kernel_um.cpp void _net_ebpf_extension_detach_client_completion(In DEVICE_OBJECT device_object, _Inopt void context) { ... IoFreeWorkItem(work_item); }

// libs\platform\kernel\ebpf_platform_kernel.c void ebpf_free_preemptible_work_item { ... IoFreeWorkItem(work_item->io_work_item);

}

shpalani commented 1 year ago

For eBPF-VM-2 memory.dmp:

There are 3 allocations (stated in Arg4) that are not freed.

Current nonpaged pool allocations 0x228 for 000C5E99 bytes

kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

DRIVER_VERIFIER_DETECTED_VIOLATION (c4)
A device driver attempting to corrupt the system has been caught.  This is
because the driver was specified in the registry as being suspect (by the
administrator) and the kernel has enabled substantial checking of this driver.
If the driver attempts to corrupt the system, BugChecks 0xC4, 0xC1 and 0xA will
be among the most commonly seen crashes.
Arguments:
Arg1: 0000000000000062, A driver has forgotten to free its pool allocations prior to unloading.
Arg2: ffffe708861aeea8, name of the driver having the issue.
Arg3: ffffe7088602d8a0, verifier internal structure with driver information.
Arg4: 0000000000000003, total # of (paged+nonpaged) allocations that weren't freed.
    Type !verifier 3 drivername.sys for info on the allocations
    that were leaked that caused the bugcheck.

kd> !Verifier 3 netebpfext.sys

Verify Flags Level 0x001209bb

  STANDARD FLAGS:
    [X] (0x00000000) Automatic Checks
    [X] (0x00000001) Special pool
    [X] (0x00000002) Force IRQL checking
    [X] (0x00000008) Pool tracking
    [X] (0x00000010) I/O verification
    [X] (0x00000020) Deadlock detection
    [X] (0x00000080) DMA checking
    [X] (0x00000100) Security checks
    [X] (0x00000800) Miscellaneous checks
    [X] (0x00020000) DDI compliance checking

  ADDITIONAL FLAGS:
    [ ] (0x00000004) Randomized low resources simulation
    [ ] (0x00000200) Force pending I/O requests
    [ ] (0x00000400) IRP logging
    [ ] (0x00002000) Invariant MDL checking for stack
    [ ] (0x00004000) Invariant MDL checking for driver
    [ ] (0x00008000) Power framework delay fuzzing
    [ ] (0x00010000) Port/miniport interface checking
    [ ] (0x00040000) Systematic low resources simulation
    [ ] (0x00080000) DDI compliance checking (additional)
    [ ] (0x00200000) NDIS/WIFI verification
    [ ] (0x00800000) Kernel synchronization delay fuzzing
    [ ] (0x01000000) VM switch verification
    [ ] (0x02000000) Code integrity checks

  RESERVED FLAGS (use of these flags is unsupported):
    [X] (0x00100000) Unused or reserved flag

    [X] Indicates flag is enabled

Summary of All Verifier Statistics

  RaiseIrqls           0x0
  AcquireSpinLocks     0x0
  Synch Executions     0x0
  Trims                0x1cc

  Pool Allocations Attempted             0xddba333
  Pool Allocations Succeeded             0xddba333
  Pool Allocations Succeeded SpecialPool 0xddba333
  Pool Allocations With NO TAG           0x0
  Pool Allocations Failed                0x0

  Current paged pool allocations         0x0 for 00000000 bytes
  Peak paged pool allocations            0x1 for 00000046 bytes
  Current nonpaged pool allocations      0x228 for 000C5E99 bytes
  Peak nonpaged pool allocations         0x24d for 000C8149 bytes

Driver Verification List
------------------------

nt!_VF_TARGET_DRIVER 0xffffe70886af3130: netebpfext.sys (Loaded)

    Pool Allocation Statistics: ( NonPagedPool / PagedPool )

      Current Pool Allocations: ( 0x00000003 / 0x00000000 )
      Current Pool Bytes:       ( 0x00000108 / 0x00000000 )
      Peak Pool Allocations:    ( 0x0000001f / 0x00000001 )
      Peak Pool Bytes:          ( 0x000009cc / 0x00000046 )
      Contiguous Memory Bytes:       0x00000000
      Peak Contiguous Memory Bytes:  0x00000000

    Pool Allocations:

      Address             Length      Tag   Caller Address    
      ------------------  ----------  ----  ------------------
      0xffffe70887ebafa0  0x00000058  Vfwi  0xfffff806556b62fc  netebpfext!_ebpf_ext_attach_init_rundown+0x2c
      0xffffe70887a42fa0  0x00000058  Vfwi  0xfffff806556b62fc  netebpfext!_ebpf_ext_attach_init_rundown+0x2c
      0xffffe70888044fa0  0x00000058  Vfwi  0xfffff806556b62fc  netebpfext!_ebpf_ext_attach_init_rundown+0x2c
To understand,  
0xffffe70887ebafa0  0x00000058  Vfwi  0xfffff806556b62fc  netebpfext!_ebpf_ext_attach_init_rundown+0x2c

Each listing shows the 
0xffffe70887ebafa0  --- address of the current allocation, 
0x00000058             --- the size, 
Vfwi                         --- the pool tag used, 
0xfffff806556b62fc  ---  the address in the driver code where the request for an allocation was made.
netebpfext!_ebpf_ext_attach_init_rundown+0x2c  --- name of the function

The tag **Vfwi** is used whenever a driver being verified by Driver Verifier makes a request to allocate a work item using [IoAllocateWorkItem](https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-ioallocateworkitem).

// This can give all the 1000 entries of allocation and deallocation performed on this thread. kd> !verifier 0xffffe70887ebafa0

shpalani commented 1 year ago

Further Analysis, It can be that the asynchronous completion status of callback to complete is pending and blocked in _ebpf_ext_attach_wait_for_rundown(),

_net_ebpf_extension_detach_client_completion() | --- _ebpf_ext_attach_wait_for_rundown(&hook_client->rundown); | --- IoFreeWorkItem(work_item);

kd> ln 0xfffff806556b62fc Browse module Set bu breakpoint

[C:\ebpf\ebpf-for-windows\netebpfext\net_ebpf_ext_hook_provider.c @ 96] (fffff806556b62d0) netebpfext!_ebpf_ext_attach_init_rundown+0x2c | (fffff806556b6360) netebpfext!_ebpf_ext_attach_wait_for_rundown

static NTSTATUS
_ebpf_ext_attach_init_rundown(net_ebpf_extension_hook_client_t* hook_client)
{
    NTSTATUS status = STATUS_SUCCESS;
    net_ebpf_ext_hook_client_rundown_t* rundown = &hook_client->rundown;

    //
    // Allocate work item for client detach processing.
    //
    hook_client->detach_work_item = IoAllocateWorkItem(_net_ebpf_ext_driver_device_object);
...
}

void
_net_ebpf_extension_detach_client_completion(_In_ DEVICE_OBJECT* device_object, _In_opt_ void* context)
{
    net_ebpf_extension_hook_client_t* hook_client = (net_ebpf_extension_hook_client_t*)context;
    PIO_WORKITEM work_item;

    PAGED_CODE();

    NET_EBPF_EXT_LOG_ENTRY();

    UNREFERENCED_PARAMETER(device_object);

    ASSERT(hook_client != NULL);
    _Analysis_assume_(hook_client != NULL);

    work_item = hook_client->detach_work_item;

    // The NMR model is async, but the only Windows run-down protection API available is a blocking API, so the
    // following call will block until all using threads are complete. This should be fixed in the future.
    // Issue: https://github.com/microsoft/ebpf-for-windows/issues/1854

    // Wait for any in progress callbacks to complete.
    _ebpf_ext_attach_wait_for_rundown(&hook_client->rundown);    <<<<<< This is an asynchronous call, 

    IoFreeWorkItem(work_item);

    // Note: This frees the provider binding context (hook_client).
    NmrProviderDetachClientComplete(hook_client->nmr_binding_handle);   <<<  when it asynchronously completes detaching from a client module
...
}
shpalani commented 1 year ago

Current progress, With latest git pull (03/20/2023),

  1. Ran the below test.ps1 script (in a loop 100 times). No crash was observed.
    
    $count = 0
    $logFile = 'C:\Temp\Test.txt'

while ($count -lt 100) { echo $count sc.exe start netebpfext

netsh ebpf add program .\cgroup_sock_addr.sys

sc.exe stop netebpfext

netsh ebpf show programs 2>&1 > $logFile
$line = Get-Content $logFile -tail 1
$words = $line.split(' ')
$programId = $words[0]

echo 'Delete program id'
netsh ebpf delete program $programId

$count = $count + 1

}


2. Next step is to leave the test machine untouched (longevity), and then check.
The problem is reproduced 2 days later.

kd> !analyze -v


DRIVER_VERIFIER_DETECTED_VIOLATION (c4) A device driver attempting to corrupt the system has been caught. This is because the driver was specified in the registry as being suspect (by the administrator) and the kernel has enabled substantial checking of this driver. If the driver attempts to corrupt the system, BugChecks 0xC4, 0xC1 and 0xA will be among the most commonly seen crashes. Arguments: Arg1: 0000000000000062, A driver has forgotten to free its pool allocations prior to unloading. Arg2: ffffd78320c981c8, name of the driver having the issue. Arg3: ffffd7831d8ea6e0, verifier internal structure with driver information. Arg4: 0000000000000001, total # of (paged+nonpaged) allocations that weren't freed. Type !verifier 3 drivername.sys for info on the allocations that were leaked that caused the bugcheck.

Debugging Details:

Unable to load image \??\C:\Windows\system32\drivers\netebpfext.sys, Win32 error 0n2

KEY_VALUES_STRING: 1

Key  : Analysis.CPU.mSec
Value: 1015

Key  : Analysis.Elapsed.mSec
Value: 19243

Key  : Analysis.IO.Other.Mb
Value: 0

Key  : Analysis.IO.Read.Mb
Value: 2

Key  : Analysis.IO.Write.Mb
Value: 0

Key  : Analysis.Init.CPU.mSec
Value: 890

Key  : Analysis.Init.Elapsed.mSec
Value: 354511

Key  : Analysis.Memory.CommitPeak.Mb
Value: 175

Key  : Bugcheck.Code.KiBugCheckData
Value: 0xc4

Key  : Bugcheck.Code.LegacyAPI
Value: 0xc4

Key  : Failure.Bucket
Value: 0xc4_62_VRF_LEAKED_POOL_IMAGE_netebpfext.sys

Key  : Failure.Hash
Value: {05111805-0c5c-c2df-b4a0-fd87e518b717}

Key  : Hypervisor.Enlightenments.Value
Value: 1113060

Key  : Hypervisor.Enlightenments.ValueHex
Value: 10fbe4

Key  : Hypervisor.Flags.AnyHypervisorPresent
Value: 1

Key  : Hypervisor.Flags.ApicEnlightened
Value: 0

Key  : Hypervisor.Flags.AsyncMemoryHint
Value: 0

Key  : Hypervisor.Flags.CpuManager
Value: 0

Key  : Hypervisor.Flags.DeprecateAutoEoi
Value: 1

Key  : Hypervisor.Flags.DynamicCpuDisabled
Value: 1

Key  : Hypervisor.Flags.Epf
Value: 0

Key  : Hypervisor.Flags.ExtendedProcessorMasks
Value: 1

Key  : Hypervisor.Flags.HardwareMbecAvailable
Value: 0

Key  : Hypervisor.Flags.MaxBankNumber
Value: 0

Key  : Hypervisor.Flags.MemoryZeroingControl
Value: 0

Key  : Hypervisor.Flags.NoExtendedRangeFlush
Value: 0

Key  : Hypervisor.Flags.NoNonArchCoreSharing
Value: 1

Key  : Hypervisor.Flags.Phase0InitDone
Value: 1

Key  : Hypervisor.Flags.PowerSchedulerQos
Value: 0

Key  : Hypervisor.Flags.RootScheduler
Value: 0

Key  : Hypervisor.Flags.SynicAvailable
Value: 1

Key  : Hypervisor.Flags.UseQpcBias
Value: 0

Key  : Hypervisor.Flags.Value
Value: 4722876

Key  : Hypervisor.Flags.ValueHex
Value: 4810bc

Key  : Hypervisor.Flags.VpAssistPage
Value: 1

Key  : Hypervisor.Flags.VsmAvailable
Value: 0

Key  : Hypervisor.RootFlags.Value
Value: 0

Key  : Hypervisor.RootFlags.ValueHex
Value: 0

Key  : Stack.Best.Hash
Value: 15f549471b5cdcd2077480fda3aaa3677c958bbd

Key  : WER.OS.Branch
Value: rs5_release_svc_prod1

Key  : WER.OS.Version
Value: 10.0.17763.3650

BUGCHECK_CODE: c4

BUGCHECK_P1: 62

BUGCHECK_P2: ffffd78320c981c8

BUGCHECK_P3: ffffd7831d8ea6e0

BUGCHECK_P4: 1

FILE_IN_CAB: memory.dmp

VIRTUAL_MACHINE: HyperV

IMAGE_NAME: netebpfext.sys

MODULE_NAME: netebpfext

FAULTING_MODULE: fffff8022fda0000 netebpfext

VERIFIER_DRIVER_ENTRY: dt nt!_MI_VERIFIER_DRIVER_ENTRY ffffd7831d8ea6e0 Symbol nt!_MI_VERIFIER_DRIVER_ENTRY not found.

BLACKBOXBSD: 1 (!blackboxbsd)

PROCESS_NAME: services.exe

STACK_TEXT:
ffffcc0e1c6ee228 fffff8022cbe4e63 : 00000000000000c4 0000000000000062 ffffd78320c981c8 ffffd7831d8ea6e0 : nt!KeBugCheckEx [minkernel\ntos\ke\amd64\procstat.asm @ 140] ffffcc0e1c6ee230 fffff8022cbecf7e : ffffd7831d8ea6e0 ffffcc0e1c6ee320 ffffd78320c98090 ffffd78317267590 : nt!VerifierBugCheckIfAppropriate+0xdf [minkernel\ntos\verifier\vfbugcheck.c @ 322] ffffcc0e1c6ee270 fffff8022c50c03e : ffffd7831d8ea6e0 0000000000000285 0000000000000001 fffff8022fda1000 : nt!VfPoolCheckForLeaks+0x46 [minkernel\ntos\verifier\vfpool.c @ 895] ffffcc0e1c6ee2b0 fffff8022cbd641a : 0000000000023000 ffffd78320c98090 fffff8022c6c1360 fffff8022c6c1360 : nt!VfTargetDriversRemove+0xec6ae [minkernel\ntos\verifier\vftarget.c @ 359] ffffcc0e1c6ee330 fffff8022c85f720 : ffffd78320c98090 ffffcc0e1c6ee460 00000000ffffffff 0000000000000001 : nt!VfDriverUnloadImage+0x3e [minkernel\ntos\verifier\vfdriver.c @ 1685] ffffcc0e1c6ee360 fffff8022c9c02c0 : 000000000000000c 00000000ffffffff ffffd02e00000001 ffff967c0117ed00 : nt!MiUnloadSystemImage+0x3ac [minkernel\ntos\mm\sysload.c @ 6790] ffffcc0e1c6ee580 fffff8022c9c0200 : 0000000000000000 0000000000000000 000000000000067d fffff80276697244 : nt!MmUnloadSystemImage+0x20 [minkernel\ntos\mm\sysload.c @ 6211] ffffcc0e1c6ee5b0 fffff8022c892250 : ffffd7831f5da950 0000000000000000 fffff8022c811200 ffffcc0e1c6ee620 : nt!IopDeleteDriver+0x40 [minkernel\ntos\io\iomgr\objsup.c @ 1071] ffffcc0e1c6ee600 fffff8022c2dc7a4 : 0000000000000000 0000000000000000 ffffcc0e1c6ee880 ffffd7831f5da980 : nt!ObpRemoveObjectRoutine+0x80 [minkernel\ntos\ob\obref.c @ 3287] ffffcc0e1c6ee660 fffff8022c9ba047 : ffffd7831f5da980 ffffcc0e1c6ee9e0 ffffcc0e1c6ee9e0 00000000c0000001 : nt!ObfDereferenceObject+0xa4 [minkernel\ntos\ob\obref.c @ 673] ffffcc0e1c6ee6a0 fffff8022c9b842b : ffffd7831d3d1080 000000000000007c 000000945217eb01 ffffd7831f5da980 : nt!IopUnloadDriver+0x28f [minkernel\ntos\io\iomgr\loadunld.c @ 613] ffffcc0e1c6ee7d0 fffff8022c4810f5 : ffffd7831d3d1080 fffff8022c47d631 fffffffffffffffe ffffcc0e1c6eeb80 : nt!NtUnloadDriver+0xb [minkernel\ntos\io\iomgr\loadunld.c @ 315] ffffcc0e1c6ee800 fffff8022c472a20 : fffff8022c9ba157 ffffd7831d3d1080 0000000000000000 0000000000000000 : nt!KiSystemServiceCopyEnd+0x25 [minkernel\ntos\ke\amd64\trap.asm @ 3320] ffffcc0e1c6ee998 fffff8022c9ba157 : ffffd7831d3d1080 0000000000000000 0000000000000000 0000000000000000 : nt!KiServiceLinkage [minkernel\ntos\ke\amd64\trap.asm @ 3862] ffffcc0e1c6ee9a0 fffff8022c9b842b : ffffd7831d3d1080 0000000000010246 ffffcc0e1c6eeaf8 0000000000000018 : nt!IopUnloadDriver+0x39f [minkernel\ntos\io\iomgr\loadunld.c @ 426] ffffcc0e1c6eead0 fffff8022c4810f5 : ffffd7831d3d1080 0000024e79063aa0 ffffcc0e1c6eeb80 ffffd78300000000 : nt!NtUnloadDriver+0xb [minkernel\ntos\io\iomgr\loadunld.c @ 315] ffffcc0e1c6eeb00 00007ffe530e3fc4 : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : nt!KiSystemServiceCopyEnd+0x25 [minkernel\ntos\ke\amd64\trap.asm @ 3320] 000000945217eb48 0000000000000000 : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : 0x00007ffe`530e3fc4

DXGANALYZE_ANALYSIS_TAG_PORT_GLOBAL_INFO_STR: Hybrid_FALSE

DXGANALYZE_ANALYSIS_TAG_ADAPTER_INFO_STR: GPU0_VenId0x1414_DevId0x8d_WDDM1.3_NotActive;GPU2_VenId0x0_DevId0x0_WDDM1.3_Active_Post;

STACK_COMMAND: .cxr; .ecxr ; kb

FAILURE_BUCKET_ID: 0xc4_62_VRF_LEAKED_POOL_IMAGE_netebpfext.sys

OS_VERSION: 10.0.17763.3650

BUILDLAB_STR: rs5_release_svc_prod1

OSPLATFORM_TYPE: x64

OSNAME: Windows 10

FAILURE_ID_HASH: {05111805-0c5c-c2df-b4a0-fd87e518b717}

FAILURE_ID_REPORT_LINK: https://go.microsoft.com/fwlink/?linkid=2150386&FailureSearchText=05111805-0c5c-c2df-b4a0-fd87e518b717

Followup: MachineOwner

kd> !verifier 3 netebpfext.sys

Verify Flags Level 0x001209bb

STANDARD FLAGS: [X] (0x00000000) Automatic Checks [X] (0x00000001) Special pool [X] (0x00000002) Force IRQL checking [X] (0x00000008) Pool tracking [X] (0x00000010) I/O verification [X] (0x00000020) Deadlock detection [X] (0x00000080) DMA checking [X] (0x00000100) Security checks [X] (0x00000800) Miscellaneous checks [X] (0x00020000) DDI compliance checking

ADDITIONAL FLAGS: [ ] (0x00000004) Randomized low resources simulation [ ] (0x00000200) Force pending I/O requests [ ] (0x00000400) IRP logging [ ] (0x00002000) Invariant MDL checking for stack [ ] (0x00004000) Invariant MDL checking for driver [ ] (0x00008000) Power framework delay fuzzing [ ] (0x00010000) Port/miniport interface checking [ ] (0x00040000) Systematic low resources simulation [ ] (0x00080000) DDI compliance checking (additional) [ ] (0x00200000) NDIS/WIFI verification [ ] (0x00800000) Kernel synchronization delay fuzzing [ ] (0x01000000) VM switch verification [ ] (0x02000000) Code integrity checks

RESERVED FLAGS (use of these flags is unsupported): [X] (0x00100000) Unused or reserved flag

[X] Indicates flag is enabled

Summary of All Verifier Statistics

RaiseIrqls 0x0 AcquireSpinLocks 0x0 Synch Executions 0x0 Trims 0x1494

Pool Allocations Attempted 0x202c9794 Pool Allocations Succeeded 0x202c9794 Pool Allocations Succeeded SpecialPool 0x202c9794 Pool Allocations With NO TAG 0x0 Pool Allocations Failed 0x0

Current paged pool allocations 0x0 for 00000000 bytes Peak paged pool allocations 0x1 for 00000046 bytes Current nonpaged pool allocations 0x1c0 for 000956BD bytes Peak nonpaged pool allocations 0x1db for 000971C9 bytes

Driver Verification List

nt!_VF_TARGET_DRIVER 0xffffd78317267590: netebpfext.sys (Loaded and Unloaded)

Pool Allocation Statistics: ( NonPagedPool / PagedPool )

  Current Pool Allocations: ( 0x00000001 / 0x00000000 )
  Current Pool Bytes:       ( 0x00000058 / 0x00000000 )
  Peak Pool Allocations:    ( 0x0000001a / 0x00000001 )
  Peak Pool Bytes:          ( 0x000007f4 / 0x00000046 )
  Contiguous Memory Bytes:       0x00000000
  Peak Contiguous Memory Bytes:  0x00000000

Pool Allocations:

  Address             Length      Tag   Caller Address    
  ------------------  ----------  ----  ------------------
  0xffffd783208d0fa0  0x00000058  Vfwi  0xfffff8022fda62fc  netebpfext!_ebpf_ext_attach_init_rundown+0x2c

kd> ln 0xfffff8022fda62fc Browse module Set bu breakpoint

[C:\ebpf\ebpf-for-windows\netebpfext\net_ebpf_ext_hook_provider.c @ 96] (fffff8022fda62d0) netebpfext!_ebpf_ext_attach_init_rundown+0x2c | (fffff8022fda6360) netebpfext!_ebpf_ext_attach_wait_for_rundown

kd> ln 0xfffff8022fda62fc Browse module Set bu breakpoint

[C:\ebpf\ebpf-for-windows\netebpfext\net_ebpf_ext_hook_provider.c @ 96] (fffff8022fda62d0) netebpfext!_ebpf_ext_attach_init_rundown+0x2c | (fffff8022fda6360) netebpfext!_ebpf_ext_attach_wait_for_rundown kd>



Memory leak: 0x00000058 = 88 bytes
shpalani commented 1 year ago

The address in the driver code where the request for an allocation was made:

[C:\ebpf\ebpf-for-windows\netebpfext\net_ebpf_ext_hook_provider.c @ 96] (fffff8022fda62d0) **netebpfext!_ebpf_ext_attach_init_rundown**+0x2c | (fffff8022fda6360) netebpfext!_ebpf_ext_attach_wait_for_rundown

kd> ln fffff802`2fda6360
Browse module
Set bu breakpoint

 [C:\ebpf\ebpf-for-windows\netebpfext\net_ebpf_ext_hook_provider.c @ 133] (fffff802`2fda6360)   netebpfext!_ebpf_ext_attach_wait_for_rundown   |  (fffff802`2fda6390)   netebpfext!_net_ebpf_extension_hook_provider_attach_client
Exact matches:
    netebpfext!_ebpf_ext_attach_wait_for_rundown (struct _net_ebpf_ext_hook_client_rundown *)

kd> ln fffff802`2fda62d0
Browse module
Set bu breakpoint

 [C:\ebpf\ebpf-for-windows\netebpfext\net_ebpf_ext_hook_provider.c @ 89] (fffff802`2fda62d0)   netebpfext!_ebpf_ext_attach_init_rundown   |  (fffff802`2fda6360)   netebpfext!_ebpf_ext_attach_wait_for_rundown
Exact matches:
    netebpfext!_ebpf_ext_attach_init_rundown (struct _net_ebpf_extension_hook_client *)

Allocator source code path:

static NTSTATUS _ebpf_ext_attach_init_rundown(net_ebpf_extension_hook_client_t hook_client) { NTSTATUS status = STATUS_SUCCESS; net_ebpf_ext_hook_client_rundown_t rundown = &hook_client->rundown;

//
// Allocate work item for client detach processing.
//
hook_client->detach_work_item = **IoAllocateWorkItem(**_net_ebpf_ext_driver_device_object);
if (hook_client->detach_work_item == NULL) {
    status = STATUS_INSUFFICIENT_RESOURCES;
    goto Exit;
}

// Initialize the rundown and disable new references.
ExInitializeRundownProtection(&rundown->protection);
ExWaitForRundownProtectionRelease(&rundown->protection);
rundown->rundown_reinitialized = FALSE;
rundown->rundown_occurred = FALSE;

Exit: return status; }

static void _ebpf_ext_attach_wait_for_rundown(Inout net_ebpf_ext_hook_client_rundown_t* rundown) { ExWaitForRundownProtectionRelease(&rundown->protection); rundown->rundown_occurred = TRUE; }

And as explained earlier, there is only 1 call path which calls the IoFreeWorkItem:

void
_net_ebpf_extension_detach_client_completion(_In_ DEVICE_OBJECT* device_object, _In_opt_ void* context)

Read about this call https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-exwaitforrundownprotectionrelease

shpalani commented 1 year ago

Location of latest memory.dmp : \skyshare\scratch\Users\shpalan\ebpf\ebpf-VM-2-03242023

shpalani commented 1 year ago

After leaving the system for 2 days, the driver crash is not reproducible with the fix from https://github.com/microsoft/ebpf-for-windows/pull/2190/files.