SRJames opened this issue 1 day ago.
Looking into this, @SRJames. Would you mind sharing the memory dump via this email address: WindowsContainerGitHubIssues@service.microsoft.com?
Will try. It is 2 GB.
Email sent.
Some additional info: the System event log recorded this Error entry when it reported the reboot:
The computer has rebooted from a bugcheck. The bugcheck was: 0x00000076 (0x0000000000000000, 0xffff900b45fd73c0, 0x0000000000000003, 0x0000000000000000). A dump was saved in: C:\Windows\MEMORY.DMP. Report Id: eb5eb746-1c40-4df9-b275-94ff106ea2a3.
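The four hexadecimal values in this event entry are the same bugcheck arguments that `!analyze -v` reports. Here is a minimal Python sketch that extracts them from the event text and labels them using the 0x76 argument descriptions from the dump analysis. `parse_bugcheck` and `describe` are hypothetical helper names, and the sketch assumes the message keeps the `The bugcheck was: 0x... (...)` wording shown above.

```python
import re
from dataclasses import dataclass

# Matches "The bugcheck was: 0xCODE (p1, p2, p3, p4)" as written in the event text.
_BUGCHECK_RE = re.compile(r"The bugcheck was:\s*(0x[0-9A-Fa-f]+)\s*\(([^)]*)\)")

# Argument meanings for PROCESS_HAS_LOCKED_PAGES (0x76), taken from the !analyze -v output.
_ARG_LABELS_0X76 = (
    "Arg1 (0 = locked pages found in process being terminated)",
    "Arg2 (process address)",
    "Arg3 (number of locked pages)",
    "Arg4 (pointer to driver stacks, or 0 if not enabled)",
)


@dataclass(frozen=True)
class Bugcheck:
    code: int
    params: tuple


def parse_bugcheck(message: str) -> Bugcheck:
    """Extract the bugcheck code and its four parameters from the event message."""
    match = _BUGCHECK_RE.search(message)
    if match is None:
        raise ValueError("no 'The bugcheck was:' clause found")
    code = int(match.group(1), 16)
    params = tuple(int(p.strip(), 16) for p in match.group(2).split(","))
    if len(params) != 4:
        raise ValueError(f"expected 4 parameters, got {len(params)}")
    return Bugcheck(code, params)


def describe(bc: Bugcheck) -> list:
    """Label each parameter; only 0x76 gets descriptive labels in this sketch."""
    if bc.code == 0x76:
        labels = _ARG_LABELS_0X76
    else:
        labels = tuple(f"Arg{i}" for i in range(1, 5))
    return [f"{label}: {value:#018x}" for label, value in zip(labels, bc.params)]
```

For this crash, Arg2 is the address of the terminating process (LogMonitor.exe in the dump) and Arg3 shows that 3 pages were still locked.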
Caused by a driver not cleaning up correctly after an I/O.
Do you know which driver this is? Is it an inbox driver or an installed one?
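The dump analysis further down gives two ways to identify the driver. The second one sets the TrackLockedPages registry value, so that a recurrence is reported as DRIVER_LEFT_LOCKED_PAGES_IN_PROCESS (0xCB), which the analysis says can identify the offending driver. Here is a minimal Python sketch that builds the documented key and value into a standard `reg add` command line and runs it only on Windows. `track_locked_pages_cmd` and `apply` are hypothetical helper names. The sketch assumes an elevated prompt on the affected node, and the node still has to be rebooted for the setting to take effect.

```python
import subprocess
import sys

# Key and value named in the PROCESS_HAS_LOCKED_PAGES (0x76) description.
MM_KEY = r"HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management"
VALUE_NAME = "TrackLockedPages"


def track_locked_pages_cmd(enable: bool = True) -> list:
    """Build a `reg add` command that sets TrackLockedPages to DWORD 1 (or 0)."""
    return [
        "reg", "add", MM_KEY,
        "/v", VALUE_NAME,
        "/t", "REG_DWORD",
        "/d", "1" if enable else "0",
        "/f",  # overwrite an existing value without prompting
    ]


def apply(cmd: list) -> None:
    """Run the command; reg.exe exists only on Windows and needs elevation for HKLM."""
    if sys.platform != "win32":
        raise RuntimeError("reg.exe is only available on Windows")
    subprocess.run(cmd, check=True)
    print("TrackLockedPages updated; reboot for it to take effect.")
```

On a node you want to diagnose, run `apply(track_locked_pages_cmd())` and reboot. `track_locked_pages_cmd(False)` builds the command that sets the value back to 0 after the driver has been identified.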
Describe the bug
We have intermittent issues with Windows servers restarting. We analyzed a memory dump created at the time of a restart with WinDbg, and it contained the output below. LogMonitor.exe is mentioned twice (PROCESS_NAME and FAILURE_BUCKET_ID).
***** Preparing the environment for Debugger Extensions Gallery repositories **
ExtensionRepository : Implicit
UseExperimentalFeatureForNugetShare : true
AllowNugetExeUpdate : true
NonInteractiveNuget : true
AllowNugetMSCredentialProviderInstall : true
AllowParallelInitializationOfLocalRepositories : true
EnableRedirectToV8JsProvider : false

-- Configuring repositories
----> Repository : LocalInstalled, Enabled: true
----> Repository : UserExtensions, Enabled: true

***** Waiting for Debugger Extensions Gallery to Initialize **
Microsoft (R) Windows Debugger Version 10.0.26100.1742 X86
Copyright (c) Microsoft Corporation. All rights reserved.

Loading Dump File [C:\Windows\MEMORY.DMP]
Kernel Bitmap Dump File: Kernel address space is available, User address space may not be available.

Primary dump contents written successfully
***** Path validation summary **
Response Time (ms) Location
Deferred srv*https://msdl.microsoft.com/download/symbols
Symbol search path is: srv*https://msdl.microsoft.com/download/symbols
Executable search path is:
Windows 10 Kernel Version 20348 MP (8 procs) Free x64
Product: Server, suite: TerminalServer DataCenter SingleUserTS
Edition build lab: 20348.859.amd64fre.fe_release_svc_prod2.220707-1832
Kernel base = 0xfffff803`42000000 PsLoadedModuleList = 0xfffff803`42c33a10
Debug session time: Mon Sep 30 12:12:52.474 2024 (UTC + 0:00)
System Uptime: 0 days 5:02:02.125
Loading Kernel Symbols
...............................................................
................................................................
.........................
Page 106689 not present in the dump file. Type ".hh dbgerr004" for details
.......................................
.........................................
Loading User Symbols
Loading unloaded module list
.......................
For analysis of this file, run !analyze -v
5: kd> !analyze -v
PROCESS_HAS_LOCKED_PAGES (76)
Caused by a driver not cleaning up correctly after an I/O.
Arguments:
Arg1: 0000000000000000, Locked memory pages found in process being terminated.
Arg2: ffff900b45fd73c0, Process address.
Arg3: 0000000000000003, Number of locked pages.
Arg4: 0000000000000000, Pointer to driver stacks (if enabled) or 0 if not.
Issue a !search over all of physical memory for the current process pointer. This will yield at least one MDL which points to it. Then do another !search for each MDL found, this will yield the IRP(s) that point to it, revealing which driver is leaking the pages.
Otherwise, set HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\TrackLockedPages to a DWORD 1 value and reboot. Then the system will save stack traces so the guilty driver can be easily identified. When you enable this flag, if the driver commits the error again you will see a different BugCheck - DRIVER_LEFT_LOCKED_PAGES_IN_PROCESS (0xCB) - which can identify the offending driver(s).
Debugging Details:
KEY_VALUES_STRING: 1
BUGCHECK_P1: 0
BUGCHECK_P2: ffff900b45fd73c0
BUGCHECK_P3: 3
BUGCHECK_P4: 0
FILE_IN_CAB: MEMORY.DMP
DUMP_FILE_ATTRIBUTES: 0x1000
PROCESS_NAME: LogMonitor.exe
BLACKBOXBSD: 1 (!blackboxbsd)
BLACKBOXNTFS: 1 (!blackboxntfs)
BLACKBOXPNP: 1 (!blackboxpnp)
BLACKBOXWINLOGON: 1
STACK_TEXT:
ffffc283`613b20c8 fffff803`428be6b5 : 00000000`00000076 00000000`00000000 ffff900b`45fd73c0 00000000`00000003 : nt!KeBugCheckEx
ffffc283`613b20d0 fffff803`42798f01 : ffff900b`45fd7808 ffffc283`613b2190 ffff900b`2945e040 ffff900b`45fd73c0 : nt!MmDeleteProcessAddressSpace+0x126845
ffffc283`613b2120 fffff803`427c9b50 : ffff900b`45fd7390 ffff900b`45fd7390 00000000`00000000 00000000`00000000 : nt!PspProcessDelete+0x171
ffffc283`613b21c0 fffff803`42376c67 : 00000000`00000000 00000000`00000000 ffff900b`45fd77f8 ffff900b`45fd73c0 : nt!ObpRemoveObjectRoutine+0x80
ffffc283`613b2220 fffff803`427a6e3c : 00000000`00000000 ffff900b`526c55f8 00000000`00000000 ffff900b`526c55f8 : nt!ObfDereferenceObjectWithTag+0xc7
ffffc283`613b2260 fffff803`427c9b50 : ffff900b`526c5090 ffff900b`526c5090 fffff803`42c263c0 00000000`00000000 : nt!PspThreadDelete+0x33c
ffffc283`613b22d0 fffff803`42376c67 : 00000000`00000000 00000000`00000000 fffff803`42c263c0 ffff900b`526c50c0 : nt!ObpRemoveObjectRoutine+0x80
ffffc283`613b2330 fffff803`422ad482 : 00000000`00000000 00000000`00000000 00000000`00000000 fffff803`42c2f0e0 : nt!ObfDereferenceObjectWithTag+0xc7
ffffc283`613b2370 fffff803`422f5151 : ffff900b`2945e040 fffff803`42d3d6c0 fffff803`00000000 fffff803`00000000 : nt!PspReaper+0x72
ffffc283`613b23a0 fffff803`422757d5 : ffff900b`2945e040 00000000`00000000 ffff900b`2945e040 00000000`00000080 : nt!ExpWorkerThread+0x161
ffffc283`613b25b0 fffff803`42425458 : ffffb100`b9929180 ffff900b`2945e040 fffff803`42275780 00000000`00000000 : nt!PspSystemThreadStartup+0x55
ffffc283`613b2600 00000000`00000000 : ffffc283`613b3000 ffffc283`613ac000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x28

SYMBOL_NAME: nt!MmDeleteProcessAddressSpace+126845
MODULE_NAME: nt
STACK_COMMAND: .cxr; .ecxr ; kb
IMAGE_NAME: ntkrnlmp.exe
BUCKET_ID_FUNC_OFFSET: 126845
FAILURE_BUCKET_ID: 0x76_LogMonitor.exe_nt!MmDeleteProcessAddressSpace
OS_VERSION: 10.0.20348.859
BUILDLAB_STR: fe_release_svc_prod2
OSPLATFORM_TYPE: x64
OSNAME: Windows 10
FAILURE_ID_HASH: {4dc10a00-52b6-9324-e14f-5fd73286e667}
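Because the restarts recur, it helps to pull the key `!analyze -v` fields out of each dump programmatically so they can be compared. Read bottom-up, the frames in this dump run from nt!PspReaper on a system worker thread through PspThreadDelete and PspProcessDelete into MmDeleteProcessAddressSpace, which raises the bugcheck. Every frame is in nt, so the driver that locked the pages does not appear on this stack. Here is a minimal Python sketch that assumes the output was saved as text with WinDbg's original line breaks (one frame per STACK_TEXT line). `parse_analyze`, `split_bucket` and `parse_address` are hypothetical helpers. The CODE_PROCESS_SYMBOL bucket layout is only what this dump shows, not a documented format.

```python
import re

# "KEY: value" lines such as PROCESS_NAME or FAILURE_BUCKET_ID.
_FIELD_RE = re.compile(r"^([A-Z][A-Z0-9_]+):\s*(.+)$")
# STACK_TEXT frame: Child-SP, RetAddr, ':', four arguments, ':', call site.
_FRAME_RE = re.compile(r"^[0-9a-f`]+\s+[0-9a-f`]+\s*:(?:\s+[0-9a-f`]+){4}\s*:\s*(\S+)$")


def parse_address(token: str) -> int:
    """WinDbg prints 64-bit values as two 32-bit halves joined by a backtick."""
    return int(token.replace("`", ""), 16)


def parse_analyze(text: str) -> tuple:
    """Return (fields, call_sites); call sites are listed innermost frame first."""
    fields = {}
    call_sites = []
    for raw in text.splitlines():
        line = raw.strip()
        frame = _FRAME_RE.match(line)
        if frame:
            call_sites.append(frame.group(1))
            continue
        field = _FIELD_RE.match(line)
        if field:
            fields[field.group(1)] = field.group(2).strip()
    return fields, call_sites


def split_bucket(bucket: str) -> tuple:
    """Split CODE_PROCESS_SYMBOL, assuming the symbol itself has no underscore."""
    code, rest = bucket.split("_", 1)
    process, symbol = rest.rsplit("_", 1)
    return code, process, symbol
```

Collecting the PROCESS_NAME and FAILURE_BUCKET_ID values from several dumps shows whether every restart has the same signature.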
Configuration
- Tool: LogMonitor
- Version: 2.0.2
Additional context
LogMonitor is used as the entrypoint for some of our containers. These run in AWS EKS on this AMI: Windows_Server-2022-English-Full-EKS_Optimized-1.30-2024.09.10. When this issue occurs, the EC2 instance restarts but becomes unavailable in the cluster.