openzfsonwindows / openzfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Possible memory leak "ExAllocatePoolWithTag failed" in `spl-seg_kmem.c, line 134` #283

Closed EchterAgo closed 1 year ago

EchterAgo commented 1 year ago

When testing #281 I noticed that when copying a 5TB dataset using rclone it always ends in an allocation failure:

*** Assertion failed: 0
***   Source File: H:\dev\openzfs\module\os\windows\spl\spl-seg_kmem.c, line 134

so ExAllocatePoolWithTag failed

memory.txt


This seems like a new issue because I was still able to copy the full dataset not that long ago.

I'll try to get some kstat information. Is it possible to get kstat info from the debugger when the failure has already happened? I could also try logging it periodically to a file.

EchterAgo commented 1 year ago
 # Child-SP          RetAddr               Call Site
00 ffff8202`30ba9460 fffff803`4a41bca0     nt!KiSwapContext+0x76
01 ffff8202`30ba95a0 fffff803`4a41b1cf     nt!KiSwapThread+0x500
02 ffff8202`30ba9650 fffff803`4a4f7eee     nt!KiCommitThreadWait+0x14f
03 ffff8202`30ba96f0 fffff803`54068eba     nt!KeWaitForMultipleObjects+0x2be
04 ffff8202`30ba9800 fffff803`5410504d     OpenZFS!spl_cv_wait+0xea [H:\dev\openzfs\module\os\windows\spl\spl-condvar.c @ 120] 
05 ffff8202`30ba9890 fffff803`54105745     OpenZFS!rrw_enter_write+0xed [H:\dev\openzfs\module\zfs\rrwlock.c @ 219] 
06 ffff8202`30ba98d0 fffff803`541056ab     OpenZFS!rrm_enter_write+0x35 [H:\dev\openzfs\module\zfs\rrwlock.c @ 371] 
07 ffff8202`30ba9910 fffff803`54380cfd     OpenZFS!rrm_enter+0x3b [H:\dev\openzfs\module\zfs\rrwlock.c @ 348] 
08 ffff8202`30ba9950 fffff803`54380b19     OpenZFS!zfsvfs_teardown+0xcd [H:\dev\openzfs\module\os\windows\zfs\zfs_vfsops.c @ 1458] 
09 ffff8202`30ba99b0 fffff803`543c868f     OpenZFS!zfs_vfs_unmount+0x209 [H:\dev\openzfs\module\os\windows\zfs\zfs_vfsops.c @ 1653] 
0a ffff8202`30ba9b40 fffff803`5437c575     OpenZFS!zfs_windows_unmount+0x41f [H:\dev\openzfs\module\os\windows\zfs\zfs_vnops_windows_mount.c @ 1581] 
0b ffff8202`30baa430 fffff803`540838d6     OpenZFS!zfs_ioc_unmount+0x55 [H:\dev\openzfs\module\os\windows\zfs\zfs_ioctl_os.c @ 916] 
0c ffff8202`30baa470 fffff803`5437c3a5     OpenZFS!zfsdev_ioctl_common+0x816 [H:\dev\openzfs\module\zfs\zfs_ioctl.c @ 7866] 
0d ffff8202`30baa550 fffff803`5435f06d     OpenZFS!zfsdev_ioctl+0x2c5 [H:\dev\openzfs\module\os\windows\zfs\zfs_ioctl_os.c @ 866] 
0e ffff8202`30baa640 fffff803`5435e976     OpenZFS!ioctlDispatcher+0x32d [H:\dev\openzfs\module\os\windows\zfs\zfs_vnops_windows.c @ 6409] 
0f ffff8202`30baa710 fffff803`4a410665     OpenZFS!dispatcher+0x1e6 [H:\dev\openzfs\module\os\windows\zfs\zfs_vnops_windows.c @ 7321] 
10 ffff8202`30baa800 fffff803`4a80142c     nt!IofCallDriver+0x55
11 ffff8202`30baa840 fffff803`4a801081     nt!IopSynchronousServiceTail+0x34c
12 ffff8202`30baa8e0 fffff803`4a8003f6     nt!IopXxxControlFile+0xc71
13 ffff8202`30baaa20 fffff803`4a610ef8     nt!NtDeviceIoControlFile+0x56
14 ffff8202`30baaa90 00007ffa`f122d0c4     nt!KiSystemServiceCopyEnd+0x28
15 000000fb`a09fcee8 00000000`00000000     0x00007ffa`f122d0c4
lundman commented 1 year ago

OK, so it grabbed vfs_busy(), then goes and waits for ZFS_TEARDOWN_ENTER_WRITE(zfsvfs, FTAG); yeah that's not going to work

lundman commented 1 year ago

so we need to remove the vfs_busy() calls from unmount. rats, will need to rethink this

lundman commented 1 year ago

side note to self: apparently "System Volume Information" directory is tagged HIDDEN - add logic to dirlist to skip HIDDEN attributes.

lundman commented 1 year ago

yeah we should remove the vfs_busy() stuff from the 3 acquirelazy/read/fastio. if there are still issues there, they need other options

lundman commented 1 year ago

> side note to self: apparently "System Volume Information" directory is tagged HIDDEN - add logic to dirlist to skip HIDDEN attributes.

Nope, chatgpt says to always return everything, Explorer will skip over HIDDEN, which checks out.

EchterAgo commented 1 year ago

With cab2a207eccf2666e77c981de345f8bcf3b3125c I get a XΛ#øÿÿ: mutex not m_initialised

 # Child-SP          RetAddr               Call Site
00 ffff870e`668845f0 fffff802`23497fec     OpenZFS!panic+0x3c [H:\dev\openzfs\module\os\windows\spl\spl-debug.c @ 32] 
01 ffff870e`66884630 fffff802`23498ec8     OpenZFS!spl_mutex_enter+0x3c [H:\dev\openzfs\module\os\windows\spl\spl-mutex.c @ 119] 
02 ffff870e`668846a0 fffff802`23534e7d     OpenZFS!spl_cv_wait+0xf8 [H:\dev\openzfs\module\os\windows\spl\spl-condvar.c @ 127] 
03 ffff870e`66884730 fffff802`23534d1f     OpenZFS!rrw_enter_read_impl+0x14d [H:\dev\openzfs\module\zfs\rrwlock.c @ 178] 
04 ffff870e`66884780 fffff802`23535709     OpenZFS!rrw_enter_read+0x1f [H:\dev\openzfs\module\zfs\rrwlock.c @ 198] 
05 ffff870e`668847c0 fffff802`2377d2e4     OpenZFS!rrm_enter_read+0x49 [H:\dev\openzfs\module\zfs\rrwlock.c @ 364] 
06 ffff870e`66884800 fffff802`2377d1be     OpenZFS!zfs_enter+0x24 [H:\dev\openzfs\include\os\windows\zfs\sys\zfs_znode_impl.h @ 149] 
07 ffff870e`66884840 fffff802`184579e0     OpenZFS!zfs_AcquireForLazyWrite+0x13e [H:\dev\openzfs\module\os\windows\zfs\zfs_vnops_windows.c @ 140] 
08 ffff870e`668848c0 fffff802`185057b1     nt!CcWriteBehindInternal+0x130
09 ffff870e`668849a0 fffff802`18502fe1     nt!CcWriteBehind+0x91
0a ffff870e`66884a90 fffff802`18450545     nt!CcCachemapUninitWorkerThread+0xf1
0b ffff870e`66884b70 fffff802`1850e6f5     nt!ExpWorkerThread+0x105
0c ffff870e`66884c10 fffff802`18606278     nt!PspSystemThreadStartup+0x55
0d ffff870e`66884c60 00000000`00000000     nt!KiStartSystemThread+0x28

This is here in zfs_AcquireForLazyWrite:

    if (zfsvfs->z_unmounted ||
        zfs_enter(zfsvfs, FTAG) != 0) {

When I check, zfsvfs->z_unmounted is already TRUE, so it was set after the mutex has been destroyed? I suspect this is also what crashed my GitHub actions runner in tests.py earlier even though it ran completely stable before.

EchterAgo commented 1 year ago

zfs_freevfs and zfsvfs_free were called just before the crash. I think in zfs_vfs_unmount we need to set zfsvfs->z_unmounted = B_TRUE; before the zfs_freevfs at the end.

What is also curious is that in zfs_AcquireForLazyWrite zfsvfs is not NULL, but when checking zmo->fsprivate it is already zeroed, so the zeroing must have happened in between line 131 and 141.

EchterAgo commented 1 year ago

Even if we moved zfsvfs->z_unmounted = B_TRUE in zfs_vfs_unmount we'd still potentially be accessing freed memory.

Another curious thing I found is:

FFFFA162A455D080: dprintf: zfs_vfsops.c:2062:zfs_freevfs(): +freevfs
FFFFA162A455D080: dprintf: zfs_vfsops.c:879:zfsvfs_free(): +zfsvfs_free

yet, there are no corresponding -freevfs / -zfsvfs_free, so it must have happened in the middle of freeing those.

EchterAgo commented 1 year ago

It must have happened when zfsvfs_free was still before the Unloading hardlink AVLtree print

EchterAgo commented 1 year ago

To summarize:

EchterAgo commented 1 year ago

The unmount thread is in zfsvfs_free just after ZFS_TEARDOWN_DESTROY(zfsvfs);

lundman commented 1 year ago

yeah it needs a solution, even a tryenter would work.

lundman commented 1 year ago

Ah ok, I should have read the XNU sources better: https://github.com/apple/darwin-xnu/blob/2ff845c2e033bd0ff64b5b6aa6063a1f8f65aa32/bsd/vfs/vfs_subr.c#L973

We should use vfs_busy() as a rwlock instead, and a call to vfs_busy() gets a shared lock only. We should also have LK_NOWAIT flag, and the LazyWrite/ReadAhead/fastio_modwrite should use vfs_busy(..., LK_NOWAIT).

Then in the unmount code, after the getzfsvfs() which gets a sharedlock, we should upgrade to exclusive.
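The shared/exclusive scheme described above can be sketched as a small single-threaded model. This is only an illustration under stated assumptions: `toy_mount_t`, `toy_vfs_busy()`, `toy_vfs_unbusy()`, `toy_vfs_busy_upgrade()` and `TOY_LK_NOWAIT` are hypothetical names, not the XNU or OpenZFS API, and a real lock would sleep where this model simply returns EBUSY.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical toy model; not the XNU or OpenZFS implementation. */
#define TOY_LK_NOWAIT 0x1

typedef struct toy_mount {
    int shared_holders; /* callers currently holding the shared "busy" lock */
    bool exclusive;     /* true once unmount owns the lock exclusively */
} toy_mount_t;

/*
 * Take a shared hold. With TOY_LK_NOWAIT the caller gets EBUSY instead of
 * blocking while unmount holds the lock. A real lock would sleep in the
 * blocking case; this single-threaded model returns EBUSY there as well.
 */
int
toy_vfs_busy(toy_mount_t *mp, int flags)
{
    (void) flags;
    if (mp->exclusive)
        return (EBUSY);
    mp->shared_holders++;
    return (0);
}

/* Drop a shared hold taken with toy_vfs_busy(). */
void
toy_vfs_unbusy(toy_mount_t *mp)
{
    mp->shared_holders--;
}

/*
 * Upgrade the caller's own shared hold to exclusive. This only succeeds
 * when the caller is the last shared holder.
 */
int
toy_vfs_busy_upgrade(toy_mount_t *mp)
{
    if (mp->shared_holders != 1)
        return (EBUSY);
    mp->shared_holders = 0;
    mp->exclusive = true;
    return (0);
}
```

In this model the LazyWrite/ReadAhead/fastio paths would call `toy_vfs_busy(mp, TOY_LK_NOWAIT)` and back off on EBUSY, while unmount takes a shared hold first and then upgrades once the other holders have dropped theirs.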

I have double turnover today, so I might not be able to get that code done until tomorrow.

EchterAgo commented 1 year ago

But how will that help us if the zfsvfs is potentially already freed? There is a good chance that whatever we do works but then breaks if the memory usage pattern changes. Do we need a lock in mount?

EchterAgo commented 1 year ago

I created #303 to continue discussion of this and leave this issue for the memory leak issue.

EchterAgo commented 1 year ago

With the latest changes (c8dbc2546cb619097df33ad5ad3a1ff5d18c9577) I still get the hang at unmount, same stack trace as in https://github.com/openzfsonwindows/openzfs/issues/283#issuecomment-1770383934 other than the line number in zfs_vnops_windows_mount.c:

0: kd> dt OpenZFS!vfs_main_lock
   +0x000 rw_lock          : _ERESOURCE
   +0x068 rw_owner         : 0xffffaae4`87474080 Void
   +0x070 rw_readers       : 0n0
   +0x074 rw_pad           : 0n305419896
0: kd> .thread ffffaae487474080
Implicit thread is now ffffaae4`87474080
0: kd> k
  *** Stack trace for last set context - .thread/.cxr resets it
 # Child-SP          RetAddr               Call Site
00 fffff282`db2ad460 fffff802`2441bca0     nt!KiSwapContext+0x76
01 fffff282`db2ad5a0 fffff802`2441b1cf     nt!KiSwapThread+0x500
02 fffff282`db2ad650 fffff802`244f7eee     nt!KiCommitThreadWait+0x14f
03 fffff282`db2ad6f0 fffff802`2ee58eba     nt!KeWaitForMultipleObjects+0x2be
04 fffff282`db2ad800 fffff802`2eef521d     OpenZFS!spl_cv_wait+0xea [H:\dev\openzfs\module\os\windows\spl\spl-condvar.c @ 120] 
05 fffff282`db2ad890 fffff802`2eef5915     OpenZFS!rrw_enter_write+0xed [H:\dev\openzfs\module\zfs\rrwlock.c @ 219] 
06 fffff282`db2ad8d0 fffff802`2eef587b     OpenZFS!rrm_enter_write+0x35 [H:\dev\openzfs\module\zfs\rrwlock.c @ 371] 
07 fffff282`db2ad910 fffff802`2f170ecd     OpenZFS!rrm_enter+0x3b [H:\dev\openzfs\module\zfs\rrwlock.c @ 348] 
08 fffff282`db2ad950 fffff802`2f170ce9     OpenZFS!zfsvfs_teardown+0xcd [H:\dev\openzfs\module\os\windows\zfs\zfs_vfsops.c @ 1458] 
09 fffff282`db2ad9b0 fffff802`2f1b88c1     OpenZFS!zfs_vfs_unmount+0x209 [H:\dev\openzfs\module\os\windows\zfs\zfs_vfsops.c @ 1653] 
0a fffff282`db2adb40 fffff802`2f16c745     OpenZFS!zfs_windows_unmount+0x481 [H:\dev\openzfs\module\os\windows\zfs\zfs_vnops_windows_mount.c @ 1595] 
0b fffff282`db2ae430 fffff802`2ee73aa6     OpenZFS!zfs_ioc_unmount+0x55 [H:\dev\openzfs\module\os\windows\zfs\zfs_ioctl_os.c @ 916] 
0c fffff282`db2ae470 fffff802`2f16c575     OpenZFS!zfsdev_ioctl_common+0x816 [H:\dev\openzfs\module\zfs\zfs_ioctl.c @ 7866] 
0d fffff282`db2ae550 fffff802`2f14f23d     OpenZFS!zfsdev_ioctl+0x2c5 [H:\dev\openzfs\module\os\windows\zfs\zfs_ioctl_os.c @ 866] 
0e fffff282`db2ae640 fffff802`2f14eb46     OpenZFS!ioctlDispatcher+0x32d [H:\dev\openzfs\module\os\windows\zfs\zfs_vnops_windows.c @ 6409] 
0f fffff282`db2ae710 fffff802`24410665     OpenZFS!dispatcher+0x1e6 [H:\dev\openzfs\module\os\windows\zfs\zfs_vnops_windows.c @ 7321] 
10 fffff282`db2ae800 fffff802`2480142c     nt!IofCallDriver+0x55
11 fffff282`db2ae840 fffff802`24801081     nt!IopSynchronousServiceTail+0x34c
12 fffff282`db2ae8e0 fffff802`248003f6     nt!IopXxxControlFile+0xc71
13 fffff282`db2aea20 fffff802`24610ef8     nt!NtDeviceIoControlFile+0x56
14 fffff282`db2aea90 00007ffa`01f8d0c4     nt!KiSystemServiceCopyEnd+0x28
15 000000c8`600fc868 00000000`00000000     0x00007ffa`01f8d0c4

I noticed now it happens even after running just rclone for a very short time on a simple mount with no other tests running, so I can easily reproduce this now.

I'll try to get a reproducer for this.

EchterAgo commented 1 year ago

stacks.txt cbuf.txt

Also, if you need any more info from this crash, I haven't restarted the machine yet.

EchterAgo commented 1 year ago

Letting the thing run for a couple of minutes and then dumping cbuf again yields new entries. I've waited quite a bit now and don't see any disk activity, but I'll let it run more.

cbuf2.txt cbuf3.txt

lundman commented 1 year ago

Hmmm so what is actually happening? Clearly the thread you pasted is waiting for the WRITER lock in teardown, which should be well inside the WRITER lock of zfs_windows_unmount.

So what thread is holding rrw_enter_write+0xed [H:\dev\openzfs\module\zfs\rrwlock.c @ 219] ?

lundman commented 1 year ago

rrm_enter(&(zfsvfs)->z_teardown_lock, RW_WRITER, tag)

EchterAgo commented 1 year ago

I don't think I understand what you mean. The rrl->rr_lock in rrw_enter_write? :

5: kd> dt rrl
Local var @ 0xfffff282db2ad8c0 Type rrwlock*
0xffffa583`e2d667b8 
   +0x000 rr_lock          : kmutex
   +0x030 rr_cv            : cv
   +0x070 rr_writer        : (null) 
   +0x078 rr_anon_rcount   : refcount
   +0x0f8 rr_linked_rcount : refcount
   +0x178 rr_writer_wanted : 1 ( B_TRUE )
   +0x17c rr_track_all     : 0 ( B_FALSE )
5: kd> dt kmutex poi(rrl)
OpenZFS!kmutex
   +0x000 m_lock           : mutex_t
   +0x018 m_owner          : (null) 
   +0x020 m_destroy_lock   : 0
   +0x028 m_initialised    : 0x23456789
5: kd> dt -b rrl
Local var @ 0xfffff282db2ad8c0 Type rrwlock*
0xffffa583`e2d667b8 
   +0x000 rr_lock          : kmutex
      +0x000 m_lock           : mutex_t
         +0x000 opaque           : _KEVENT
            +0x000 Header           : _DISPATCHER_HEADER
               +0x000 Lock             : 0n393217
               +0x000 LockNV           : 0n393217
               +0x000 Type             : 0x1 ''
               +0x001 Signalling       : 0 ''
               +0x002 Size             : 0x6 ''
               +0x003 Reserved1        : 0 ''
               +0x000 TimerType        : 0x1 ''
               +0x001 TimerControlFlags : 0 ''
               +0x001 Absolute         : 0y0
               +0x001 Wake             : 0y0
               +0x001 EncodedTolerableDelay : 0y000000 (0)
               +0x002 Hand             : 0x6 ''
               +0x003 TimerMiscFlags   : 0 ''
               +0x003 Index            : 0y000000 (0)
               +0x003 Inserted         : 0y0
               +0x003 Expired          : 0y0
               +0x000 Timer2Type       : 0x1 ''
               +0x001 Timer2Flags      : 0 ''
               +0x001 Timer2Inserted   : 0y0
               +0x001 Timer2Expiring   : 0y0
               +0x001 Timer2CancelPending : 0y0
               +0x001 Timer2SetPending : 0y0
               +0x001 Timer2Running    : 0y0
               +0x001 Timer2Disabled   : 0y0
               +0x001 Timer2ReservedFlags : 0y00
               +0x002 Timer2ComponentId : 0x6 ''
               +0x003 Timer2RelativeId : 0 ''
               +0x000 QueueType        : 0x1 ''
               +0x001 QueueControlFlags : 0 ''
               +0x001 Abandoned        : 0y0
               +0x001 DisableIncrement : 0y0
               +0x001 QueueReservedControlFlags : 0y000000 (0)
               +0x002 QueueSize        : 0x6 ''
               +0x003 QueueReserved    : 0 ''
               +0x000 ThreadType       : 0x1 ''
               +0x001 ThreadReserved   : 0 ''
               +0x002 ThreadControlFlags : 0x6 ''
               +0x002 CycleProfiling   : 0y0
               +0x002 CounterProfiling : 0y1
               +0x002 GroupScheduling  : 0y1
               +0x002 AffinitySet      : 0y0
               +0x002 Tagged           : 0y0
               +0x002 EnergyProfiling  : 0y0
               +0x002 SchedulerAssist  : 0y0
               +0x002 ThreadReservedControlFlags : 0y0
               +0x003 DebugActive      : 0 ''
               +0x003 ActiveDR7        : 0y0
               +0x003 Instrumented     : 0y0
               +0x003 Minimal          : 0y0
               +0x003 Reserved4        : 0y00
               +0x003 AltSyscall       : 0y0
               +0x003 Emulation        : 0y0
               +0x003 Reserved5        : 0y0
               +0x000 MutantType       : 0x1 ''
               +0x001 MutantSize       : 0 ''
               +0x002 DpcActive        : 0x6 ''
               +0x003 MutantReserved   : 0 ''
               +0x004 SignalState      : 0n1
               +0x008 WaitListHead     : _LIST_ENTRY [ 0xffffa583`e2d667c0 - 0xffffa583`e2d667c0 ]
                  +0x000 Flink            : 0xffffa583`e2d667c0 
                  +0x008 Blink            : 0xffffa583`e2d667c0 
      +0x018 m_owner          : (null) 
      +0x020 m_destroy_lock   : 0
      +0x028 m_initialised    : 0x23456789
   +0x030 rr_cv            : cv
      +0x000 cv_kevent        : 
       [00] _KEVENT
         +0x000 Header           : _DISPATCHER_HEADER
            +0x000 Lock             : 0n393217
            +0x000 LockNV           : 0n393217
            +0x000 Type             : 0x1 ''
            +0x001 Signalling       : 0 ''
            +0x002 Size             : 0x6 ''
            +0x003 Reserved1        : 0 ''
            +0x000 TimerType        : 0x1 ''
            +0x001 TimerControlFlags : 0 ''
            +0x001 Absolute         : 0y0
            +0x001 Wake             : 0y0
            +0x001 EncodedTolerableDelay : 0y000000 (0)
            +0x002 Hand             : 0x6 ''
            +0x003 TimerMiscFlags   : 0 ''
            +0x003 Index            : 0y000000 (0)
            +0x003 Inserted         : 0y0
            +0x003 Expired          : 0y0
            +0x000 Timer2Type       : 0x1 ''
            +0x001 Timer2Flags      : 0 ''
            +0x001 Timer2Inserted   : 0y0
            +0x001 Timer2Expiring   : 0y0
            +0x001 Timer2CancelPending : 0y0
            +0x001 Timer2SetPending : 0y0
            +0x001 Timer2Running    : 0y0
            +0x001 Timer2Disabled   : 0y0
            +0x001 Timer2ReservedFlags : 0y00
            +0x002 Timer2ComponentId : 0x6 ''
            +0x003 Timer2RelativeId : 0 ''
            +0x000 QueueType        : 0x1 ''
            +0x001 QueueControlFlags : 0 ''
            +0x001 Abandoned        : 0y0
            +0x001 DisableIncrement : 0y0
            +0x001 QueueReservedControlFlags : 0y000000 (0)
            +0x002 QueueSize        : 0x6 ''
            +0x003 QueueReserved    : 0 ''
            +0x000 ThreadType       : 0x1 ''
            +0x001 ThreadReserved   : 0 ''
            +0x002 ThreadControlFlags : 0x6 ''
            +0x002 CycleProfiling   : 0y0
            +0x002 CounterProfiling : 0y1
            +0x002 GroupScheduling  : 0y1
            +0x002 AffinitySet      : 0y0
            +0x002 Tagged           : 0y0
            +0x002 EnergyProfiling  : 0y0
            +0x002 SchedulerAssist  : 0y0
            +0x002 ThreadReservedControlFlags : 0y0
            +0x003 DebugActive      : 0 ''
            +0x003 ActiveDR7        : 0y0
            +0x003 Instrumented     : 0y0
            +0x003 Minimal          : 0y0
            +0x003 Reserved4        : 0y00
            +0x003 AltSyscall       : 0y0
            +0x003 Emulation        : 0y0
            +0x003 Reserved5        : 0y0
            +0x000 MutantType       : 0x1 ''
            +0x001 MutantSize       : 0 ''
            +0x002 DpcActive        : 0x6 ''
            +0x003 MutantReserved   : 0 ''
            +0x004 SignalState      : 0n0
            +0x008 WaitListHead     : _LIST_ENTRY [ 0xffffaae4`874741c0 - 0xffffaae4`874741c0 ]
               +0x000 Flink            : 0xffffaae4`874741c0 
               +0x008 Blink            : 0xffffaae4`874741c0 
       [01] 
         +0x000 Header           : _DISPATCHER_HEADER
            +0x000 Lock             : 0n393216
            +0x000 LockNV           : 0n393216
            +0x000 Type             : 0 ''
            +0x001 Signalling       : 0 ''
            +0x002 Size             : 0x6 ''
            +0x003 Reserved1        : 0 ''
            +0x000 TimerType        : 0 ''
            +0x001 TimerControlFlags : 0 ''
            +0x001 Absolute         : 0y0
            +0x001 Wake             : 0y0
            +0x001 EncodedTolerableDelay : 0y000000 (0)
            +0x002 Hand             : 0x6 ''
            +0x003 TimerMiscFlags   : 0 ''
            +0x003 Index            : 0y000000 (0)
            +0x003 Inserted         : 0y0
            +0x003 Expired          : 0y0
            +0x000 Timer2Type       : 0 ''
            +0x001 Timer2Flags      : 0 ''
            +0x001 Timer2Inserted   : 0y0
            +0x001 Timer2Expiring   : 0y0
            +0x001 Timer2CancelPending : 0y0
            +0x001 Timer2SetPending : 0y0
            +0x001 Timer2Running    : 0y0
            +0x001 Timer2Disabled   : 0y0
            +0x001 Timer2ReservedFlags : 0y00
            +0x002 Timer2ComponentId : 0x6 ''
            +0x003 Timer2RelativeId : 0 ''
            +0x000 QueueType        : 0 ''
            +0x001 QueueControlFlags : 0 ''
            +0x001 Abandoned        : 0y0
            +0x001 DisableIncrement : 0y0
            +0x001 QueueReservedControlFlags : 0y000000 (0)
            +0x002 QueueSize        : 0x6 ''
            +0x003 QueueReserved    : 0 ''
            +0x000 ThreadType       : 0 ''
            +0x001 ThreadReserved   : 0 ''
            +0x002 ThreadControlFlags : 0x6 ''
            +0x002 CycleProfiling   : 0y0
            +0x002 CounterProfiling : 0y1
            +0x002 GroupScheduling  : 0y1
            +0x002 AffinitySet      : 0y0
            +0x002 Tagged           : 0y0
            +0x002 EnergyProfiling  : 0y0
            +0x002 SchedulerAssist  : 0y0
            +0x002 ThreadReservedControlFlags : 0y0
            +0x003 DebugActive      : 0 ''
            +0x003 ActiveDR7        : 0y0
            +0x003 Instrumented     : 0y0
            +0x003 Minimal          : 0y0
            +0x003 Reserved4        : 0y00
            +0x003 AltSyscall       : 0y0
            +0x003 Emulation        : 0y0
            +0x003 Reserved5        : 0y0
            +0x000 MutantType       : 0 ''
            +0x001 MutantSize       : 0 ''
            +0x002 DpcActive        : 0x6 ''
            +0x003 MutantReserved   : 0 ''
            +0x004 SignalState      : 0n0
            +0x008 WaitListHead     : _LIST_ENTRY [ 0xffffaae4`874741f0 - 0xffffaae4`874741f0 ]
               +0x000 Flink            : 0xffffaae4`874741f0 
               +0x008 Blink            : 0xffffaae4`874741f0 
      +0x030 cv_waiters_count_lock : 0
      +0x038 cv_waiters_count : 1
      +0x03c cv_initialised   : 0x12345678
   +0x070 rr_writer        : (null) 
   +0x078 rr_anon_rcount   : refcount
      +0x000 rc_count         : 0x12
      +0x008 rc_mtx           : kmutex
         +0x000 m_lock           : mutex_t
            +0x000 opaque           : _KEVENT
               +0x000 Header           : _DISPATCHER_HEADER
                  +0x000 Lock             : 0n393217
                  +0x000 LockNV           : 0n393217
                  +0x000 Type             : 0x1 ''
                  +0x001 Signalling       : 0 ''
                  +0x002 Size             : 0x6 ''
                  +0x003 Reserved1        : 0 ''
                  +0x000 TimerType        : 0x1 ''
                  +0x001 TimerControlFlags : 0 ''
                  +0x001 Absolute         : 0y0
                  +0x001 Wake             : 0y0
                  +0x001 EncodedTolerableDelay : 0y000000 (0)
                  +0x002 Hand             : 0x6 ''
                  +0x003 TimerMiscFlags   : 0 ''
                  +0x003 Index            : 0y000000 (0)
                  +0x003 Inserted         : 0y0
                  +0x003 Expired          : 0y0
                  +0x000 Timer2Type       : 0x1 ''
                  +0x001 Timer2Flags      : 0 ''
                  +0x001 Timer2Inserted   : 0y0
                  +0x001 Timer2Expiring   : 0y0
                  +0x001 Timer2CancelPending : 0y0
                  +0x001 Timer2SetPending : 0y0
                  +0x001 Timer2Running    : 0y0
                  +0x001 Timer2Disabled   : 0y0
                  +0x001 Timer2ReservedFlags : 0y00
                  +0x002 Timer2ComponentId : 0x6 ''
                  +0x003 Timer2RelativeId : 0 ''
                  +0x000 QueueType        : 0x1 ''
                  +0x001 QueueControlFlags : 0 ''
                  +0x001 Abandoned        : 0y0
                  +0x001 DisableIncrement : 0y0
                  +0x001 QueueReservedControlFlags : 0y000000 (0)
                  +0x002 QueueSize        : 0x6 ''
                  +0x003 QueueReserved    : 0 ''
                  +0x000 ThreadType       : 0x1 ''
                  +0x001 ThreadReserved   : 0 ''
                  +0x002 ThreadControlFlags : 0x6 ''
                  +0x002 CycleProfiling   : 0y0
                  +0x002 CounterProfiling : 0y1
                  +0x002 GroupScheduling  : 0y1
                  +0x002 AffinitySet      : 0y0
                  +0x002 Tagged           : 0y0
                  +0x002 EnergyProfiling  : 0y0
                  +0x002 SchedulerAssist  : 0y0
                  +0x002 ThreadReservedControlFlags : 0y0
                  +0x003 DebugActive      : 0 ''
                  +0x003 ActiveDR7        : 0y0
                  +0x003 Instrumented     : 0y0
                  +0x003 Minimal          : 0y0
                  +0x003 Reserved4        : 0y00
                  +0x003 AltSyscall       : 0y0
                  +0x003 Emulation        : 0y0
                  +0x003 Reserved5        : 0y0
                  +0x000 MutantType       : 0x1 ''
                  +0x001 MutantSize       : 0 ''
                  +0x002 DpcActive        : 0x6 ''
                  +0x003 MutantReserved   : 0 ''
                  +0x004 SignalState      : 0n1
                  +0x008 WaitListHead     : _LIST_ENTRY [ 0xffffa583`e2d66840 - 0xffffa583`e2d66840 ]
                     +0x000 Flink            : 0xffffa583`e2d66840 
                     +0x008 Blink            : 0xffffa583`e2d66840 
         +0x018 m_owner          : (null) 
         +0x020 m_destroy_lock   : 0
         +0x028 m_initialised    : 0x23456789
      +0x038 rc_tree          : avl_tree
         +0x000 avl_root         : (null) 
         +0x008 avl_compar       : 0xfffff802`2efa36a0 
         +0x010 avl_offset       : 0
         +0x014 avl_numnodes     : 0
         +0x018 avl_size         : 0
      +0x058 rc_removed       : list
         +0x000 list_size        : 0x30
         +0x008 list_offset      : 0
         +0x010 list_head        : list_node
            +0x000 list_next        : 0xffffa583`e2d66898 
            +0x008 list_prev        : 0xffffa583`e2d66898 
      +0x078 rc_removed_count : 0
      +0x07c rc_tracked       : 0 ( B_FALSE )
   +0x0f8 rr_linked_rcount : refcount
      +0x000 rc_count         : 0
      +0x008 rc_mtx           : kmutex
         +0x000 m_lock           : mutex_t
            +0x000 opaque           : _KEVENT
               +0x000 Header           : _DISPATCHER_HEADER
                  +0x000 Lock             : 0n393217
                  +0x000 LockNV           : 0n393217
                  +0x000 Type             : 0x1 ''
                  +0x001 Signalling       : 0 ''
                  +0x002 Size             : 0x6 ''
                  +0x003 Reserved1        : 0 ''
                  +0x000 TimerType        : 0x1 ''
                  +0x001 TimerControlFlags : 0 ''
                  +0x001 Absolute         : 0y0
                  +0x001 Wake             : 0y0
                  +0x001 EncodedTolerableDelay : 0y000000 (0)
                  +0x002 Hand             : 0x6 ''
                  +0x003 TimerMiscFlags   : 0 ''
                  +0x003 Index            : 0y000000 (0)
                  +0x003 Inserted         : 0y0
                  +0x003 Expired          : 0y0
                  +0x000 Timer2Type       : 0x1 ''
                  +0x001 Timer2Flags      : 0 ''
                  +0x001 Timer2Inserted   : 0y0
                  +0x001 Timer2Expiring   : 0y0
                  +0x001 Timer2CancelPending : 0y0
                  +0x001 Timer2SetPending : 0y0
                  +0x001 Timer2Running    : 0y0
                  +0x001 Timer2Disabled   : 0y0
                  +0x001 Timer2ReservedFlags : 0y00
                  +0x002 Timer2ComponentId : 0x6 ''
                  +0x003 Timer2RelativeId : 0 ''
                  +0x000 QueueType        : 0x1 ''
                  +0x001 QueueControlFlags : 0 ''
                  +0x001 Abandoned        : 0y0
                  +0x001 DisableIncrement : 0y0
                  +0x001 QueueReservedControlFlags : 0y000000 (0)
                  +0x002 QueueSize        : 0x6 ''
                  +0x003 QueueReserved    : 0 ''
                  +0x000 ThreadType       : 0x1 ''
                  +0x001 ThreadReserved   : 0 ''
                  +0x002 ThreadControlFlags : 0x6 ''
                  +0x002 CycleProfiling   : 0y0
                  +0x002 CounterProfiling : 0y1
                  +0x002 GroupScheduling  : 0y1
                  +0x002 AffinitySet      : 0y0
                  +0x002 Tagged           : 0y0
                  +0x002 EnergyProfiling  : 0y0
                  +0x002 SchedulerAssist  : 0y0
                  +0x002 ThreadReservedControlFlags : 0y0
                  +0x003 DebugActive      : 0 ''
                  +0x003 ActiveDR7        : 0y0
                  +0x003 Instrumented     : 0y0
                  +0x003 Minimal          : 0y0
                  +0x003 Reserved4        : 0y00
                  +0x003 AltSyscall       : 0y0
                  +0x003 Emulation        : 0y0
                  +0x003 Reserved5        : 0y0
                  +0x000 MutantType       : 0x1 ''
                  +0x001 MutantSize       : 0 ''
                  +0x002 DpcActive        : 0x6 ''
                  +0x003 MutantReserved   : 0 ''
                  +0x004 SignalState      : 0n1
                  +0x008 WaitListHead     : _LIST_ENTRY [ 0xffffa583`e2d668c0 - 0xffffa583`e2d668c0 ]
                     +0x000 Flink            : 0xffffa583`e2d668c0 
                     +0x008 Blink            : 0xffffa583`e2d668c0 
         +0x018 m_owner          : (null) 
         +0x020 m_destroy_lock   : 0
         +0x028 m_initialised    : 0x23456789
      +0x038 rc_tree          : avl_tree
         +0x000 avl_root         : (null) 
         +0x008 avl_compar       : 0xfffff802`2efa36a0 
         +0x010 avl_offset       : 0
         +0x014 avl_numnodes     : 0
         +0x018 avl_size         : 0
      +0x058 rc_removed       : list
         +0x000 list_size        : 0x30
         +0x008 list_offset      : 0
         +0x010 list_head        : list_node
            +0x000 list_next        : 0xffffa583`e2d66918 
            +0x008 list_prev        : 0xffffa583`e2d66918 
      +0x078 rc_removed_count : 0
      +0x07c rc_tracked       : 0 ( B_FALSE )
   +0x178 rr_writer_wanted : 1 ( B_TRUE )
   +0x17c rr_track_all     : 0 ( B_FALSE )

When I take the `rrl` parameter passed to rrm_enter and go through all 17 rrmlock entries, the rr_writer of all but two is ffffaae487474080; the other two are 0.

lundman commented 1 year ago

go up until you have easy access to zfsvfs and take a look at (zfsvfs)->z_teardown_lock - should have an owner field

lundman commented 1 year ago

ah so sorry its called rrl->rr_writer = curthread;

lundman commented 1 year ago

Ah ok which is probably what you dumped, so we have writer_wanted, but its waiting for the readers to drain.

EchterAgo commented 1 year ago

rrl shows as a struct rrwlock [17] in the debugger and the rr_writer of almost all of them are ffffaae487474080, same as the unmount thread, except two that are 0.

lundman commented 1 year ago

wait appears to be

    while (zfs_refcount_count(&rrl->rr_anon_rcount) > 0 ||
        zfs_refcount_count(&rrl->rr_linked_rcount) > 0 ||
        rrl->rr_writer != NULL) {
        rrl->rr_writer_wanted = B_TRUE;
        cv_wait(&rrl->rr_cv, &rrl->rr_lock);
    }

rr_anon_rcount seems to be 0x12, and rr_linked_rcount is 0. So we are looking at rr_anon_rcount held.

lundman commented 1 year ago

and that is waiting for cv_wait(&rrl->rr_cv, &rrl->rr_lock); to wake it up, and looking at mutex rr_lock

      +0x018 m_owner          : (null) 
      +0x020 m_destroy_lock   : 0
      +0x028 m_initialised    : 0x23456789

it has been destroyed. ah that is less than ideal

lundman commented 1 year ago

What complicates it is that it has RRM_NUM_LOCKS (17) locks in an array, and it picks the lock to use with RRM_TD_LOCK(): (((uint32_t)(uintptr_t)(curthread)) % RRM_NUM_LOCKS)

So, thread address ffffaae487474080 % 17, is

lundman commented 1 year ago

14. phew. ok, so we need to check [14] in the lock array

lundman commented 1 year ago

But you are dumping rrl above, so it should already be [14] - so the rr_lock really is destroyed?
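
The slot arithmetic is easy to re-check outside the debugger. The sketch below assumes the macro is exactly as quoted and that the address from the dump is the `curthread` value; the two helper names are invented for this example. It computes the index both ways because the quoted macro truncates the pointer to 32 bits before taking the remainder.

```c
#include <stdint.h>

#define	RRM_NUM_LOCKS	17

/* Index as the quoted RRM_TD_LOCK() computes it: truncate to 32 bits first. */
static unsigned
rrm_index_truncated(uint64_t thread_addr)
{
	return ((uint32_t)thread_addr % RRM_NUM_LOCKS);
}

/* Index from the full 64-bit address modulo RRM_NUM_LOCKS. */
static unsigned
rrm_index_full(uint64_t thread_addr)
{
	return ((unsigned)(thread_addr % RRM_NUM_LOCKS));
}
```

For 0xffffaae487474080 the full 64-bit remainder is 14, while the truncated form gives 7, so when reading the lock array by hand it may be worth looking at both slots.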

EchterAgo commented 1 year ago

> RRM_TD_LOCK

I don't think I can follow, I don't see any RRM_TD_LOCK used in the functions in the call stack.

> But you are dumping rrl above, so it should already be [14] - so the rr_lock really is destroyed?

What shows it as destroyed, isn't 0x23456789 == MUTEX_INITIALISED?

lundman commented 1 year ago

Ah so it is, phew that makes more sense.

lundman commented 1 year ago

Ok, then it seems perhaps we leak some zfs_enter() with missing zfs_exit() calls. Could check into how rc_tracked works and see what reader might be leaked - I have never tried it tho. I'll glance over the zfs_enter() calls and see if any obvious misses exist.

lundman commented 1 year ago

oh you know what, this uses thread%17, and we use zfs_enter() and zfs_exit() in AcquireLazy and ReleaseLazy - they are not always called by the same thread. I'll have to change this around a bit
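
Why a cross-thread enter/exit pair is a problem can be shown with a toy model. Everything here is hypothetical (the struct, the helpers, the thread addresses); it only mimics the "pick a slot from the calling thread" idea with plain counters instead of real locks.

```c
#include <stdint.h>

#define	RRM_NUM_LOCKS	17

/* Toy model: one untracked reader count per lock slot. */
struct rrm_model {
	int64_t readers[RRM_NUM_LOCKS];
};

/* Slot selection in the style of the quoted macro. */
static unsigned
slot_for(uint64_t thread_addr)
{
	return ((uint32_t)thread_addr % RRM_NUM_LOCKS);
}

/* Reader enter: bump the slot chosen by the calling thread. */
static void
model_enter(struct rrm_model *m, uint64_t thread_addr)
{
	m->readers[slot_for(thread_addr)]++;
}

/* Reader exit: drop the slot chosen by the calling thread, return new count. */
static int64_t
model_exit(struct rrm_model *m, uint64_t thread_addr)
{
	return (--m->readers[slot_for(thread_addr)]);
}
```

If thread A (say address 0x1000, slot 16) enters and thread B (0x1001, slot 0) exits, slot 16 keeps a count of 1 that nobody will ever drop, which would block a writer forever, and slot 0 goes to -1, which is the kind of state a "count must not be negative" assertion would catch.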

lundman commented 1 year ago

[openzfs] lundman b6d402c - Don't hold zfs-enter() across threads but hopeful, but let's see if it improves

EchterAgo commented 1 year ago

Using NET for debugging
Opened WinSock 2.0
Waiting to reconnect...
Connected to target 192.168.109.130 on port 50151 on local IP 192.168.109.1.
You can get the target MAC address by running .kdtargetmac command.
Connected to Windows 10 19041 x64 target at (Tue Oct 24 08:26:55.733 2023 (UTC + 7:00)), ptr64 TRUE
Kernel Debugger connection established.
Symbol search path is: srv*
Executable search path is: 
Windows 10 Kernel Version 19041 MP (24 procs) Free x64
Product: WinNt, suite: TerminalServer SingleUserTS
Edition build lab: 19041.1.amd64fre.vb_release.191206-1406
Machine Name:
Kernel base = 0xfffff803`4ea00000 PsLoadedModuleList = 0xfffff803`4f62a360
Debug session time: Tue Oct 24 08:25:26.192 2023 (UTC + 7:00)
System Uptime: 0 days 0:00:45.926
Break instruction exception - code 80000003 (first chance)
fffff803`5c813da8 cc              int     3
7: kd> !analyze -v
Connected to Windows 10 19041 x64 target at (Tue Oct 24 08:27:04.278 2023 (UTC + 7:00)), ptr64 TRUE
Loading Kernel Symbols
.............................................................A timeout occurred.  The timeout can be increased in the Debugging options page
..
.................................

Press ctrl-c (cdb, kd, ntsd) or ctrl-break (windbg) to abort symbol loads that take too long.
Run !sym noisy before .reload to track down problems loading symbols.

...............................
...........................................................
Loading User Symbols

Loading unloaded module list
......
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

Unknown bugcheck code (0)
Unknown bugcheck description
Arguments:
Arg1: 0000000000000000
Arg2: 0000000000000000
Arg3: 0000000000000000
Arg4: 0000000000000000

Debugging Details:
------------------

KEY_VALUES_STRING: 1

    Key  : Analysis.CPU.mSec
    Value: 3530

    Key  : Analysis.DebugAnalysisManager
    Value: Create

    Key  : Analysis.Elapsed.mSec
    Value: 263033

    Key  : Analysis.Init.CPU.mSec
    Value: 342

    Key  : Analysis.Init.Elapsed.mSec
    Value: 19571

    Key  : Analysis.Memory.CommitPeak.Mb
    Value: 86

    Key  : WER.OS.Branch
    Value: vb_release

    Key  : WER.OS.Timestamp
    Value: 2019-12-06T14:06:00Z

    Key  : WER.OS.Version
    Value: 10.0.19041.1

BUGCHECK_CODE:  0

BUGCHECK_P1: 0

BUGCHECK_P2: 0

BUGCHECK_P3: 0

BUGCHECK_P4: 0

PROCESS_NAME:  System

ERROR_CODE: (NTSTATUS) 0x80000003 - {EXCEPTION}  Breakpoint  A breakpoint has been reached.

EXCEPTION_CODE_STR:  80000003

EXCEPTION_PARAMETER1:  0000000000000000

STACK_TEXT:  
ffffb980`68bbf1a0 fffff803`5c814052     : ffff8388`67752312 fffff803`5c813a23 00000000`00000000 00000000`00000000 : OpenZFS!zfs_refcount_remove_many+0xc8 [H:\dev\openzfs\module\zfs\refcount.c @ 176] 
ffffb980`68bbf270 fffff803`5c7653c8     : ffffb980`68bbf460 00000000`00000000 00000000`00000000 fffff803`5c6c8223 : OpenZFS!zfs_refcount_remove+0x22 [H:\dev\openzfs\module\zfs\refcount.c @ 212] 
ffffb980`68bbf2b0 fffff803`5c7659da     : 00000000`00000000 ffff860f`afce0040 ffffb980`68bbf460 00000000`00000002 : OpenZFS!rrw_exit+0x118 [H:\dev\openzfs\module\zfs\rrwlock.c @ 264] 
ffffb980`68bbf2f0 fffff803`5c9ad594     : ffff860f`c4040470 fffff803`5c6dd86b 00000000`00000000 ffff860f`c4040560 : OpenZFS!rrm_exit+0xaa [H:\dev\openzfs\module\zfs\rrwlock.c @ 386] 
ffffb980`68bbf340 fffff803`5c9c4b0d     : fffff803`5df8fb50 00000000`00000000 00000000`00000000 fffff803`4ec64dd4 : OpenZFS!zfs_exit+0x24 [H:\dev\openzfs\include\os\windows\zfs\sys\zfs_znode_impl.h @ 161] 
ffffb980`68bbf380 fffff803`4ecb2798     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : OpenZFS!fastio_release_for_mod_write+0x14d [H:\dev\openzfs\module\os\windows\zfs\zfs_vnops_windows.c @ 8041] 
ffffb980`68bbf410 fffff803`4ecb2eac     : ffff90b3`ada4e270 00000000`00000000 00000000`00000000 00000000`00000000 : nt!FsRtlReleaseFileForModWrite+0x160
ffffb980`68bbf6f0 fffff803`4edb790b     : fffff803`4f650d40 00000000`00000001 ffff860f`f79f1bc0 00000000`00000000 : nt!MiGatherMappedPages+0x2e8
ffffb980`68bbf7b0 fffff803`4ed0e6f5     : ffff860f`afce0040 ffff860f`afce0040 00000000`00000080 000f8067`b8bbbdff : nt!MiMappedPageWriter+0x18b
ffffb980`68bbfc10 fffff803`4ee06278     : ffff9780`dd000180 ffff860f`afce0040 fffff803`4ed0e6a0 00000000`00000000 : nt!PspSystemThreadStartup+0x55
ffffb980`68bbfc60 00000000`00000000     : ffffb980`68bc0000 ffffb980`68bba000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x28

FAULTING_SOURCE_LINE:  H:\dev\openzfs\module\zfs\refcount.c

FAULTING_SOURCE_FILE:  H:\dev\openzfs\module\zfs\refcount.c

FAULTING_SOURCE_LINE_NUMBER:  176

FAULTING_SOURCE_CODE:  
   172:     int64_t count;
   173: 
   174:     if (likely(!rc->rc_tracked)) {
   175:         count = atomic_add_64_nv(&(rc)->rc_count, -number);
>  176:         ASSERT3S(count, >=, 0);
   177:         return (count);
   178:     }
   179: 
   180:     s.ref_holder = holder;
   181:     s.ref_number = number;

SYMBOL_NAME:  OpenZFS!zfs_refcount_remove_many+c8

MODULE_NAME: OpenZFS

IMAGE_NAME:  OpenZFS.sys

STACK_COMMAND:  .cxr; .ecxr ; kb

BUCKET_ID_FUNC_OFFSET:  c8

FAILURE_BUCKET_ID:  0x0_OpenZFS!zfs_refcount_remove_many

OS_VERSION:  10.0.19041.1

BUILDLAB_STR:  vb_release

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

FAILURE_ID_HASH:  {438d92bf-a7cc-a0ee-61eb-0ce61a562b51}

Followup:     MachineOwner
---------

just saw this with b6d402c183ab042a821f9910964fe862d9f66c37

lundman commented 1 year ago

aha, confirms we have a leak, or double free. ok I will need to check through them all, starting with fastio_release_for_mod_write

lundman commented 1 year ago

Ah c2e9403 I was trying to debug while in the morning Zoom. Clearly did not go well

EchterAgo commented 1 year ago

Looks like unmount works reliably now. I did an unload to check for leaks and there is some memory leaked but not much. I'll let the test run longer and see.

cbuf.txt

EchterAgo commented 1 year ago

I noticed that if I create a fresh pool, the issue currently happens at around 2.2TB transferred; memory usage just rises sharply.

Putting a conditional breakpoint on > 13GB allocated in osif_malloc somehow is not triggering despite other conditional breakpoints working :\ Setting a breakpoint in osif_malloc on allocation failure might already be too late, the copy somehow stalled before that, though I know it often does happen.

Now I have the copy stalled, memory almost full and cbuf contains only vmem_freelist_insert_sort_by_time lines and the -EB- marker. I'm dumping stacks, after that I'll see if I can get kstat output and the driver to unload.

EchterAgo commented 1 year ago

CPU seems idle though:

image

lundman commented 1 year ago

Yeah, kmem just stalls, and I am unsure why - it's not even taking CPU. My current plan is to take the latest kmem from macOS and move it over. We did fix a bunch of issues over there and it would be nice to be up to date.

EchterAgo commented 1 year ago

I think this also starts happening just when the system starts swapping. Maybe this would be easier to reproduce with lower memory?

lundman commented 1 year ago

It's pretty instant on my VM with your example, on a 2G pool at that, not full. So something is quirky, but the old kmem had a lot of "slow down allocs from XNU" throttling, which we have now removed - so the throttling logic might be bugging out, and it thinks it needs to throttle forever (no signal from XNU in Windows).

EchterAgo commented 1 year ago

This also started somewhere around fd8bf0d2b92d18b818505413bb7dd8e75fc8decd. Does it make sense to figure out exactly which commit?

lundman commented 1 year ago

if you have the time, it might mean a quicker fix if it's a small problem.

lundman commented 1 year ago

OK, re-did the kmem/vmem files. Not much changed, just the local changes to detect pressure. It gave me a chance to clean things up a bit, and now future commits from macOS kmem/vmem should apply more easily.

Worth noting it does behave in the same manner. I have confirmed the reaper runs, and it reacts to pressure ok, but the last thing I noticed is that arc_shrink was not doing the full thing. spl_free_pages has garbage values in it

EchterAgo commented 1 year ago

@lundman I'd be happy to test it, but I can't find the change. Did you forget to push it?

lundman commented 1 year ago

I will in a few, just checking if spl_free being 0xffffffe932843 is something obvious