openzfsonwindows / ZFSin

OpenZFS on Windows port
https://openzfsonwindows.org

Zpool import takes ~10 mins. #351

Closed datacore-rm closed 1 year ago

datacore-rm commented 1 year ago

The system has 1 TB of RAM and 80 CPUs. According to the logs, after a system restart the "zpool.exe import" command takes ~10 minutes to complete. The zpool is created from 2 disks of 10 GB each and holds 3 zvols of 1 TB each. We are in the process of collecting more verbose logs and will share more details.

From the current logs, there are no ZFS log entries between 09:26:20:925 (when the "zpool.exe import" command was executed) and 09:37:35:858 (when ZFS actually starts the import). Once spa_tryimport() is called everything is fine, and the whole import process completes in less than a minute.

```
NAME                                                                          USED  AVAIL  REFER  MOUNTPOINT
Z.ec4efcc0-7f02-4746-a6a0-f2d157faa43f                                       1.57M  18.4G    96K  none
Z.ec4efcc0-7f02-4746-a6a0-f2d157faa43f/3d5a5e1a-e1c5-4fbd-b5dd-325108dc4cdd   368K  18.4G   368K  -
Z.ec4efcc0-7f02-4746-a6a0-f2d157faa43f/548105c1-7829-418f-8358-b04f39c67717   108K  18.4G   108K  -
Z.ec4efcc0-7f02-4746-a6a0-f2d157faa43f/7228968a-7c29-4af4-83d0-957855166cb1   108K  18.4G   108K  -

NAME                                     SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
Z.ec4efcc0-7f02-4746-a6a0-f2d157faa43f    19G  1.94M  19.0G        -         -    0%   0%  1.00x  ONLINE
```

```
00000336 ZFSin 4512 3216 2 0 02\01\2023-09:26:20:473 zfsdev_open, dev 0x7d380 flag 00 devtype 0, proc is FFFFCB85EE73C080: thread FFFFCB85ED121080
00000337 ZFSin 4512 3216 2 0 02\01\2023-09:26:20:473 zfsdev_open, dev 0x7d510 flag 00 devtype 0, proc is FFFFCB85EE73C080: thread FFFFCB85ED121080
00000338 ZFSin 4512 3216 2 0 02\01\2023-09:26:20:473 zfs_check_input_nvpairs:7804: innvl = 0xFFFFC382C71D7F80, vec = 0xFFFFF80659B32120
00000339 ZFSin 4512 3216 2 0 02\01\2023-09:26:20:473 zfsdev_release, dev 0x7d380 flag 00 devtype 0, dev is FFFFCB85EE73C080, thread FFFFCB85ED121080
00000340 ZFSin 4512 3216 2 0 02\01\2023-09:26:20:473 zfsdev_release, dev 0x7d510 flag 00 devtype 0, dev is FFFFCB85EE73C080, thread FFFFCB85ED121080
00000341 ZFSin 12948 5816 42 0 02\01\2023-09:26:20:895 zfsdev_open, dev 0xd0b00 flag 00 devtype 0, proc is FFFFCB85EE733080: thread FFFFCB85ED121080
00000342 ZFSin 12948 5816 42 0 02\01\2023-09:26:20:895 zfsdev_open, dev 0xd0e20 flag 00 devtype 0, proc is FFFFCB85EE733080: thread FFFFCB85ED121080
00000343 ZFSin 12948 5816 42 0 02\01\2023-09:26:20:896 zfs_check_input_nvpairs:7804: innvl = 0xFFFFC382C71D7F20, vec = 0xFFFFF80659B32120
00000344 ZFSin 12948 5816 42 0 02\01\2023-09:26:20:896 zfsdev_release, dev 0xd0b00 flag 00 devtype 0, dev is FFFFCB85EE733080, thread FFFFCB85ED121080
00000345 ZFSin 12948 5816 42 0 02\01\2023-09:26:20:896 zfsdev_release, dev 0xd0e20 flag 00 devtype 0, dev is FFFFCB85EE733080, thread FFFFCB85ED121080
00000346 ZFSin 12400 12428 42 0 02\01\2023-09:26:20:925 zfsdev_open, dev 0xd1140 flag 00 devtype 0, proc is FFFFC38363BA1300: thread FFFFC383618240C0
00000347 ZFSin 12400 12428 42 0 02\01\2023-09:26:20:925 zfsdev_open, dev 0xcf840 flag 00 devtype 0, proc is FFFFC38363BA1300: thread FFFFC383618240C0
00000348 ZFSin 12400 12428 42 0 02\01\2023-09:37:35:858 spa_open_common:4534: Returning 2
00000349 ZFSin 12400 12428 42 0 02\01\2023-09:37:35:858 ioctl out result 2
00000350 ZFSin 12400 12428 74 0 02\01\2023-09:37:35:860 zfs_dbgmsg: spa_tryimport: importing Z.ec4efcc0-7f02-4746-a6a0-f2d157faa43f
00000351 ZFSin 12400 12428 74 0 02\01\2023-09:37:35:860 zfs_dbgmsg: spa_load($import, config trusted): LOADING
00000352 ZFSin 12400 12428 74 0 02\01\2023-09:37:35:860 spa_by_guid:1418: pool_guid = 81159284690532085, device_guid = 0
00000353 ZFSin 12400 12428 74 0 02\01\2023-09:37:35:860 spa_by_guid:1445: Returning 0x0000000000000000
00000354 ZFSin 12400 12428 74 0 02\01\2023-09:37:35:861 spa_by_guid:1418: pool_guid = 3433538115011784880, device_guid = 0
00000355 ZFSin 12400 12428 74 0 02\01\2023-09:37:35:861 spa_by_guid:1445: Returning 0x0000000000000000
00000356 ZFSin 12400 12428 74 0 02\01\2023-09:37:35:861 vdev_alloc:723:Getting the vdev ashift value from nvlist and set in vdev_ashift= 0, vdev_id=0, vdev_guid=81159284690532085, vdev_path=, vdev_physpath=
00000357 ZFSin 12400 12428 74 0 02\01\2023-09:37:35:861 vdev_alloc:723:Getting the vdev ashift value from nvlist and set in vdev_ashift= 12, vdev_id=0, vdev_guid=2244671912405992274, vdev_path=/dev/physicaldrive0, vdev_physpath=#1048576#10726932480#\?\scsi#disk&ven_broadcom&prod_mr9560-16i#5&17a3d8fb&0&01ea00#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}
00000358 ZFSin 12400 12428 74 0 02\01\2023-09:37:35:861 spa_config_parse:1456: Returning 0
00000359 ZFSin 12400 12428 56 0 02\01\2023-09:37:35:862 kernel_ioctl: trying to send kernel ioctl 7c0f4
```

lundman commented 1 year ago

One thing worth looking into: the import processing has a parallel path and a sequential path. The parallel one uses aio, which I could not get to work on Windows, so Windows only uses the sequential path. It could be worth re-investigating.

See zpool_read_label() vs zpool_read_label_slow() in lib/libzutil/zutil_import.c.
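
For reference, the difference between the two paths is roughly the following (a simplified sketch of the upstream POSIX-aio design, not the actual zutil_import.c code; `read_labels_aio`, `read_labels_slow` and `label_offset` are illustrative names):

```c
/*
 * Sketch of the two label-read strategies (simplified, not upstream code):
 * the fast path queues all four label reads with POSIX aio and waits for
 * them together; the slow path pread()s each label in turn. On Windows the
 * aio calls did not behave, so only the slow path is used today.
 */
#include <aio.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define	VDEV_LABELS	4

/* Labels 0 and 1 sit at the front of the device, 2 and 3 at the end. */
static uint64_t
label_offset(uint64_t psize, int l, size_t lsize)
{
	return (l < VDEV_LABELS / 2 ? (uint64_t)l * lsize :
	    psize - (uint64_t)(VDEV_LABELS - l) * lsize);
}

/* Parallel path: issue all four reads at once, then wait for completion. */
static int
read_labels_aio(int fd, void *labels[VDEV_LABELS], size_t lsize,
    uint64_t psize)
{
	struct aiocb cbs[VDEV_LABELS];
	const struct aiocb *list[VDEV_LABELS];

	memset(cbs, 0, sizeof (cbs));
	for (int l = 0; l < VDEV_LABELS; l++) {
		cbs[l].aio_fildes = fd;
		cbs[l].aio_buf = labels[l];
		cbs[l].aio_nbytes = lsize;
		cbs[l].aio_offset = (off_t)label_offset(psize, l, lsize);
		list[l] = &cbs[l];
		if (aio_read(&cbs[l]) != 0)
			return (-1);
	}

	for (;;) {
		int pending = 0;
		for (int l = 0; l < VDEV_LABELS; l++)
			if (aio_error(&cbs[l]) == EINPROGRESS)
				pending++;
		if (pending == 0)
			break;
		(void) aio_suspend(list, VDEV_LABELS, NULL);
	}
	for (int l = 0; l < VDEV_LABELS; l++)
		if (aio_return(&cbs[l]) != (ssize_t)lsize)
			return (-1);
	return (0);
}

/* Sequential fallback (what the Windows port uses): one read per label. */
static int
read_labels_slow(int fd, void *labels[VDEV_LABELS], size_t lsize,
    uint64_t psize)
{
	for (int l = 0; l < VDEV_LABELS; l++)
		if (pread(fd, labels[l], lsize,
		    (off_t)label_offset(psize, l, lsize)) != (ssize_t)lsize)
			return (-1);
	return (0);
}
```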

lundman commented 1 year ago

It is quite the mystery; right now ZFSin just uses the slow method.

But aio should work: Windows claims to support it, and yet it didn't even pretend to work correctly when I tried it. One option is to figure out why Windows aio doesn't work with the upstream calls; if that fails, we could implement similar threaded processing ourselves (ZFS has threads built in, after all) and expose it under the aio API function names for smoother future support.
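
A minimal sketch of that fallback idea, assuming plain pthreads underneath (the `wrap_aio_*` names and struct are hypothetical, not an existing ZFS or Windows API):

```c
/* Thread-backed stand-in for the aio-style call sites (hypothetical names). */
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

struct wrap_aiocb {
	int		fd;
	void		*buf;
	size_t		nbytes;
	off_t		offset;
	ssize_t		result;
	pthread_t	thread;
};

static void *
wrap_aio_worker(void *arg)
{
	struct wrap_aiocb *cb = arg;

	/* Each queued read runs on its own thread, much like aio would. */
	cb->result = pread(cb->fd, cb->buf, cb->nbytes, cb->offset);
	return (NULL);
}

/* "aio_read": start the read asynchronously. */
static int
wrap_aio_read(struct wrap_aiocb *cb)
{
	return (pthread_create(&cb->thread, NULL, wrap_aio_worker, cb));
}

/* "aio_suspend": wait for every outstanding read by joining its worker. */
static int
wrap_aio_wait(struct wrap_aiocb *cbs, int n)
{
	for (int i = 0; i < n; i++)
		(void) pthread_join(cbs[i].thread, NULL);
	return (0);
}
```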

lundman commented 1 year ago

Have you guys started looking at this, or is it something you want me to look at? I'm currently idle.

datacore-rm commented 1 year ago

Thanks for the comment. We do not currently have much context on aio, but as you mentioned, this issue seems to be related to reading the zpool labels via the taskq.

We added some debug logs in zpool_find_import_win() and zpool_open_func_win():

```c
TraceWrite("Going to wait for taskq thread to complete all zpool_open_func_win [%s:%d]", __func__, __LINE__);
taskq_wait(t);
taskq_destroy(t);
TraceWrite("All taskq zpool_open_func_win operation completed [%s:%d]", __func__, __LINE__);
```

```
Feb-10-2023 08:02:01 Going to wait for taskq thread to complete all zpool_open_func_win [zpool_find_import_win:2275]
...(read the zpool label)...
Feb-10-2023 08:02:01 num_labels 4 ThreadID:11380 [zpool_open_func_win:1584]
Feb-10-2023 08:02:01 num_labels 4 ThreadID:5848 [zpool_open_func_win:1584]
Feb-10-2023 08:13:14 All taskq zpool_open_func_win operation completed [zpool_find_import_win:2278]
Feb-10-2023 08:13:14 get_configs function started [get_configs:517]
```

zpool_find_import_win() creates a taskq with 128 threads and dispatched 16 tasks into it. From the logs, all 16 tasks completed within the same second, but taskq_wait() took ~11 minutes for the taskq threads to exit. During that whole 11-minute interval the system froze and was unusable, with high CPU usage. In another run the whole label read completed in 25 seconds, so it could be a timing issue. The system has 80 processors.

```
Processor  Intel(R) Xeon(R) Silver 4316 CPU @ 2.30GHz, 3400 Mhz, 20 Core(s), 40 Logical Processor(s)
Processor  Intel(R) Xeon(R) Silver 4316 CPU @ 2.30GHz, 3400 Mhz, 20 Core(s), 40 Logical Processor(s)
```

We plan next to capture an Xperf trace to get a CPU-utilization stack trace. Could the cv_wait(&tq->tq_dispatch_cv, &tq->tq_lock) in taskq_thread() return without actually waiting, making the while() loop spin continuously with all taskq threads active?
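
For illustration, here is a self-contained pthreads sketch of the wait pattern in question (hypothetical names, not the actual lib/zpool/taskq.c source). If the condition-variable wait returned immediately while the queue stayed empty, this loop would spin on the lock with every worker active, which would look like the CPU burn observed:

```c
#include <pthread.h>

static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  q_cv   = PTHREAD_COND_INITIALIZER;
static int q_pending;		/* number of dispatched, unstarted tasks */
static int q_active = 1;	/* cleared by the "destroy" path */

static void *
worker(void *arg)
{
	(void) arg;
	pthread_mutex_lock(&q_lock);
	while (q_active || q_pending > 0) {
		if (q_pending == 0) {
			/* Must block here until a dispatch signals q_cv. */
			pthread_cond_wait(&q_cv, &q_lock);
			continue;	/* re-check the predicate after waking */
		}
		q_pending--;
		pthread_mutex_unlock(&q_lock);
		/* ... run the task (e.g. read one device's labels) ... */
		pthread_mutex_lock(&q_lock);
	}
	pthread_mutex_unlock(&q_lock);
	return (NULL);
}

int
main(void)
{
	pthread_t tid;

	pthread_create(&tid, NULL, worker, NULL);

	/* "Dispatch" 16 tasks, then wake the worker. */
	pthread_mutex_lock(&q_lock);
	q_pending = 16;
	pthread_cond_broadcast(&q_cv);
	pthread_mutex_unlock(&q_lock);

	/* "Destroy": tell workers to exit once the queue drains. */
	pthread_mutex_lock(&q_lock);
	q_active = 0;
	pthread_cond_broadcast(&q_cv);
	pthread_mutex_unlock(&q_lock);

	pthread_join(tid, NULL);
	return (0);
}
```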

lundman commented 1 year ago

Lemme think for a second: the userland taskq ends up in lib/zpool/taskq.c and uses primitives from pthread.h. I don't know how well that runs, in that the primary goal for the userland pthread.h layer was to compile and link. Only zdb uses it so far, so don't just assume it is correct. You may have found an issue.

datacore-rm commented 1 year ago

Thanks. We added more debug logs to zpool.exe and captured them. While reading labels in zpool_open_func_win(), the Win32 CreateFile() call for the C:\ volume failed (error 32 = ERROR_SHARING_VIOLATION), and CreateFile() took ~7 minutes to return. Should volumes also be searched for labels?

```
Line 83044: Feb-20-2023 07:39:41 calling createfile().FullPath :\\?\Volume{9d64398f-1896-4f2d-8b59-0f99a83063f3} ThreadID:9832 [zpool_open_func_win:1546]
Line 83260: Feb-20-2023 07:46:47 func return.CreateFile() failed with error 32.FullPath:\\?\Volume{9d64398f-1896-4f2d-8b59-0f99a83063f3} ThreadID:9832 [zpool_open_func_win:1556]
```

```
PS: GWMI -namespace root\cimv2 -class win32_volume

__GENUS                      : 2
__CLASS                      : Win32_Volume
__SUPERCLASS                 : CIM_StorageVolume
__DYNASTY                    : CIM_ManagedSystemElement
__RELPATH                    : Win32_Volume.DeviceID="\\?\Volume{9d64398f-1896-4f2d-8b59-0f99a83063f3}\"
__PROPERTY_COUNT             : 44
__DERIVATION                 : {CIM_StorageVolume, CIM_StorageExtent, CIM_LogicalDevice, CIM_LogicalElement...}
__SERVER                     : ATL1FEC1DC1
__NAMESPACE                  : root\cimv2
__PATH                       : \\ATL1FEC1DC1\root\cimv2:Win32_Volume.DeviceID="\\?\Volume{9d64398f-1896-4f2d-8b59-0f99a83063f3}\"
Access                       :
Automount                    : True
Availability                 :
BlockSize                    : 4096
BootVolume                   : True
Capacity                     : 2748056596480
Caption                      : C:\
Compressed                   : False
ConfigManagerErrorCode       :
ConfigManagerUserConfig      :
CreationClassName            :
Description                  :
DeviceID                     : \\?\Volume{9d64398f-1896-4f2d-8b59-0f99a83063f3}\
DirtyBitSet                  : False
DriveLetter                  : C:
DriveType                    : 3
ErrorCleared                 :
ErrorDescription             :
ErrorMethodology             :
FileSystem                   : NTFS
FreeSpace                    : 1586526830592
IndexingEnabled              : True
InstallDate                  :
Label                        :
LastErrorCode                :
MaximumFileNameLength        : 255
Name                         : C:\
NumberOfBlocks               :
PageFilePresent              : False
PNPDeviceID                  :
PowerManagementCapabilities  :
PowerManagementSupported     :
Purpose                      :
QuotasEnabled                : False
QuotasIncomplete             : False
QuotasRebuilding             : False
SerialNumber                 : 1122971432
Status                       :
StatusInfo                   :
SupportsDiskQuotas           : True
SupportsFileBasedCompression : True
SystemCreationClassName      :
```
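
For reference, error 32 is ERROR_SHARING_VIOLATION. A minimal Win32 sketch of opening a volume by GUID path and checking for that error looks like this (placeholder GUID and illustrative flags, not the actual zpool_open_func_win() code):

```c
#include <windows.h>
#include <stdio.h>

int
main(void)
{
	/* Placeholder GUID; a real volume GUID path has no trailing backslash. */
	const wchar_t *path =
	    L"\\\\?\\Volume{00000000-0000-0000-0000-000000000000}";

	HANDLE h = CreateFileW(path, GENERIC_READ,
	    FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
	    OPEN_EXISTING, 0, NULL);

	if (h == INVALID_HANDLE_VALUE) {
		DWORD err = GetLastError();

		if (err == ERROR_SHARING_VIOLATION)
			printf("sharing violation (32): volume is in use\n");
		else
			printf("CreateFileW failed: %lu\n", err);
		return (1);
	}
	CloseHandle(h);
	return (0);
}
```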

lundman commented 1 year ago

That is surprising; I wonder why that is. But ultimately, perhaps we don't need to iterate volumes at all? That is how the code started: I thought I would be able to iterate all the disks that way, but it doesn't work - it only enumerates Volumes, which means they have a drive letter and therefore already have a mounted filesystem (I think that is the case). So I then moved to the SetupDiEnumDeviceInterfaces() family of functions.

What do you guys think? Is there ever a possibility FindNextVolume() could find ZFS? The answer might be to simply remove that section.
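
For context, the SetupDi-based enumeration looks roughly like this (an illustrative sketch assuming GUID_DEVINTERFACE_DISK, not the actual ZFSin code). Unlike FindNextVolume(), it walks every present disk interface, including letterless ZFS member disks:

```c
#include <initguid.h>
#include <windows.h>
#include <winioctl.h>		/* GUID_DEVINTERFACE_DISK */
#include <setupapi.h>
#include <stdio.h>
#include <stdlib.h>

#pragma comment(lib, "setupapi.lib")

int
main(void)
{
	HDEVINFO h = SetupDiGetClassDevs(&GUID_DEVINTERFACE_DISK, NULL, NULL,
	    DIGCF_PRESENT | DIGCF_DEVICEINTERFACE);
	if (h == INVALID_HANDLE_VALUE)
		return (1);

	SP_DEVICE_INTERFACE_DATA ifd = { sizeof (ifd) };
	for (DWORD i = 0;
	    SetupDiEnumDeviceInterfaces(h, NULL, &GUID_DEVINTERFACE_DISK,
	    i, &ifd); i++) {
		DWORD need = 0;

		/* First call reports the required detail-buffer size. */
		SetupDiGetDeviceInterfaceDetailA(h, &ifd, NULL, 0, &need, NULL);

		SP_DEVICE_INTERFACE_DETAIL_DATA_A *det = malloc(need);
		if (det == NULL)
			break;
		det->cbSize = sizeof (*det);
		if (SetupDiGetDeviceInterfaceDetailA(h, &ifd, det, need,
		    NULL, NULL))
			printf("disk interface: %s\n", det->DevicePath);
		free(det);
	}
	SetupDiDestroyDeviceInfoList(h);
	return (0);
}
```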

datacore-rm commented 1 year ago

The system has a RAID controller, so it is not a simple setup. We cannot think of any scenario where FindNextVolume() could find ZFS, and we think this section can be removed. We ran the following test:

1) Formatted a disk with an NTFS/raw/FAT32 volume.
2) FindNextVolume() lists the volume.
3) Created a zpool with this physical disk.
4) zpool export / machine restart.
5) FindNextVolume() no longer lists this volume.
6) Removed the FindNextVolume() section from the code; zpool import succeeds using SetupDiEnumDeviceInterfaces().

We plan to get this tested and let you know. Thanks.

lundman commented 1 year ago

Yeah that matches my expectations, thanks for figuring this out!

datacore-rm commented 1 year ago

The suggested change resolved this issue. Thanks for your input. This fix can also be made in the OpenZFS repo.

lundman commented 1 year ago

Thank you, I will do so as well.