openzfsonwindows / ZFSin

OpenZFS on Windows port
https://openzfsonwindows.org

Disk Manager hangs while creating partition out of a zvol disk #254

Open imtiazdc opened 4 years ago

imtiazdc commented 4 years ago

threads.txt

The issue occurs when the zvol is created with sync=always. The workaround is to:

  1. Create the zvol without sync=always
  2. Create the partition
  3. Set sync=always on zvol

Investigating the stack traces (linked at the top) during the hang reveals that there is a thread waiting for zil_commit to finish:

4.0000e4 ffffc28cd4ed4040 0001985 Blocked
    nt!KiSwapContext+0x76
    nt!KiSwapThread+0x17d
    nt!KiCommitThreadWait+0x14f
    nt!KeWaitForMultipleObjects+0x1fe
    nt!ViKeWaitForMultipleObjectsCommon+0xbc
    nt!VerifierKeWaitForMultipleObjects+0x51
    ZFSin!spl_cv_wait+0xf3
    ZFSin!zil_commit_waiter+0x42e
    ZFSin!zil_commit_impl+0x65
    ZFSin!zil_commit+0x235
    ZFSin!zvol_write+0x2e3
    ZFSin!wzvol_WkRtn+0x419
    ZFSin!wzvol_GeneralWkRtn+0x6d
    nt!IopProcessWorkItem+0x12a
    nt!ExpWorkerThread+0xe9
    nt!PspSystemThreadStartup+0x41
    nt!KiStartSystemThread+0x16

Any insights into what might be going on and how to fix it?

imtiazdc commented 4 years ago

@lundman Any clue what could be happening here when sync=always?

lundman commented 4 years ago

Amusingly, zil_commit_waiter is all I have debugged this week. I spent forever trying to work out why it stops on the new port, and it turned out to be fixed by:

https://github.com/openzfsonosx/openzfs/commit/adeb4523bef7aaf8258d57a48b5898f6d1187864

But that is specific to the newer zil.c and I wouldn't have thought it would be the same problem here.

lundman commented 4 years ago

Hmm, unless "long" would just extend the unsigned value. It could be worth trying to compare directly with -1.
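One possible reading of that remark, sketched in C: if a timed wait reports a timeout as -1 but the value passes through an unsigned 32-bit type before landing in a wider signed variable, it is zero-extended to 4294967295 and a `< 0` check never fires. This is only an illustration of the integer conversion rules. It does not reproduce the linked commit or the actual ZFSin/SPL code, and the function names and the 32-bit/64-bit type choices below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for a timed wait that signals "timed out" with -1.
 * These are NOT the real SPL functions; names and types are assumptions.
 */
uint32_t fake_timedwait_unsigned(void)
{
    return (uint32_t)-1;            /* bit pattern 0xFFFFFFFF */
}

int32_t fake_timedwait_signed(void)
{
    return -1;
}

/* Each helper returns 1 if the caller would conclude "timed out". */
int timed_out_via_unsigned(void)
{
    /* Zero-extended on widening: timeleft becomes 4294967295. */
    int64_t timeleft = fake_timedwait_unsigned();
    return timeleft < 0;
}

int timed_out_via_signed(void)
{
    /* Sign-extended on widening: timeleft stays -1. */
    int64_t timeleft = fake_timedwait_signed();
    return timeleft < 0;
}

int timed_out_direct_compare(void)
{
    /* Compare in the narrow type, before any widening happens. */
    return fake_timedwait_unsigned() == (uint32_t)-1;
}

/* Small self-check of the three behaviours described above. */
void check_examples(void)
{
    assert(timed_out_via_unsigned() == 0);   /* timeout is missed */
    assert(timed_out_via_signed() == 1);     /* timeout is seen */
    assert(timed_out_direct_compare() == 1); /* timeout is seen */
}
```

If something like this were happening, a waiter that relies on the timeout path could keep waiting, which is why comparing against -1 directly is a cheap experiment. Whether it applies to this hang is unconfirmed.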

imtiazdc commented 4 years ago

@lundman I am hearing that the hang doesn't happen if we clean the physical disk (remove the 8M partition that zpool creates) before creating zvols. We don't know what the 8M partition is meant for or what side effects cleaning the disk has. But does that offer any clue as to why the zil thread is waiting forever?

lundman commented 4 years ago

The 8M partition is just how disks were partitioned on Solaris. I don't think it has any specific meaning anymore.

imtiazdc commented 4 years ago

@lundman

@vrajendra-datacore just found that whenever this hang (during zvol creation with sync=always) is noticed, the free space of the underlying physical disk is shown as 0 (diskpart > list disk). The free space drops to 0 after you create a zpool and do a rescan in Disk Management.

Does that information make it easier to root-cause?