zvol_assign_targetid() allocates memory for the IO_REMOVE_LOCK structure from the kmem cache. Once that memory is freed, the cache can hand out the same cached object in the next allocation call for a new zvol. This seems to make Driver Verifier flag that the same IO_REMOVE_LOCK structure is being reinitialized a second time.
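For context, the per-zvol lifecycle looks roughly like this (an illustrative sketch; the variable name and comments are placeholders, not the actual driver code):

```c
// Illustrative sketch of the per-zvol remove-lock lifecycle.
// zvol create:
PIO_REMOVE_LOCK lock = kmem_zalloc(sizeof (IO_REMOVE_LOCK), KM_SLEEP);
IoInitializeRemoveLock(lock, 0, 0, 0);

// zvol destroy:
IoReleaseRemoveLockAndWait(lock, NULL);
kmem_free(lock, sizeof (IO_REMOVE_LOCK));

// Next zvol create: kmem_zalloc() may return the very same address,
// and the second IoInitializeRemoveLock() on that address is what
// Driver Verifier appears to flag.
```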
The docs say:

> After the IoReleaseRemoveLockAndWait routine returns, the driver should consider the device to be in a state in which it is ready to be removed and cannot perform I/O operations. Therefore, the driver must not call IoInitializeRemoveLock to re-initialize the remove lock. Violation of this rule while the driver is being verified by [Driver Verifier](https://learn.microsoft.com/en-us/windows-hardware/drivers/what-s-new-in-driver-development) will result in a bug check.
So the PIO_REMOVE_LOCK should be tied to the driver and driver unloading, not to device unloading? It doesn't sound like you can re-allocate remove locks?
This issue can be reproduced by creating and destroying a ZVOL multiple times in a loop. I added debug logs (sketched after the loop below) to print the allocated and freed pointers; when the kmem cache re-allocates the same memory, we get this crash.
```bat
for /l %%x in (1, 1, 500) do (
    echo %%x
    zfs.exe create -s -V 1TB tank/testzvol
    zfs destroy tank/testzvol
)
```
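The debug logging was along these lines (a rough sketch; the function context and message text are illustrative, not the actual driver code):

```c
// Illustrative only: log the remove-lock pointer at alloc and free
// to catch the kmem cache handing back the same address.
pIoRemLock = kmem_zalloc(sizeof (IO_REMOVE_LOCK), KM_SLEEP);
dprintf("%s: allocated remove lock %p\n", __func__, pIoRemLock);

/* ... zvol lifetime ... */

dprintf("%s: freeing remove lock %p\n", __func__, pIoRemLock);
kmem_free(pIoRemLock, sizeof (IO_REMOVE_LOCK));
```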
The programming notes in zfs_windows_zvol_scsi.c already document this: the remove lock must be dynamically allocated because it cannot be reinitialized. So I will test replacing the allocation with ExAllocatePoolWithTag().
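A minimal sketch of what that replacement could look like (the pool tag, pool type, and error handling are my own illustrative assumptions, not the actual patch):

```c
// Illustrative sketch: allocate the remove lock directly from nonpaged
// pool so Driver Verifier sees the individual alloc/free pair.
// 'LRvz' is a made-up pool tag.
PIO_REMOVE_LOCK lock = ExAllocatePoolWithTag(NonPagedPoolNx,
    sizeof (IO_REMOVE_LOCK), 'LRvz');
if (lock == NULL)
	return (STATUS_INSUFFICIENT_RESOURCES);
IoInitializeRemoveLock(lock, 'LRvz', 0, 0);

/* ... normal IoAcquireRemoveLock()/IoReleaseRemoveLock() usage ... */

IoReleaseRemoveLockAndWait(lock, NULL);
ExFreePoolWithTag(lock, 'LRvz');
```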
Could just memset() the memory of the remove lock; kmem_cache will not zero memory (for speed).
I tried memset() and also the test code below. It still bugchecked.
```c
PIO_REMOVE_LOCK pIoRemLock1 = kmem_zalloc(sizeof (*pIoRemLock1), KM_SLEEP);
if (!pIoRemLock1) {
	return (0);
}
IoInitializeRemoveLock(pIoRemLock1, NULL, 0, 0);
if (STATUS_SUCCESS != IoAcquireRemoveLock(pIoRemLock1, NULL)) {
	return (0);
}
IoReleaseRemoveLockAndWait(pIoRemLock1, NULL);

// Mimic kmem_zalloc() returning the same cached object:
// kmem_free(pIoRemLock1, sizeof (*pIoRemLock1));
// pIoRemLock1 = kmem_zalloc(sizeof (*pIoRemLock1), KM_SLEEP);
bzero(pIoRemLock1, sizeof (*pIoRemLock1));

// DV bugchecks here, on the second initialization of the same address.
IoInitializeRemoveLock(pIoRemLock1, NULL, 0, 0);
```
In the bugcheck summary, the Arg2 description suggests that DV internally creates a shadow remove-lock structure to track the actual remove lock.
```
5: kd> !pool ffffe70244be9570
Pool page ffffe70244be9570 region is Nonpaged pool
 ffffe70244be90b0 size:   c0 previous size:    0  (Allocated)  WfpL
 ffffe70244be9170 size:   c0 previous size:    0  (Allocated)  WfpL
 ffffe70244be9230 size:   c0 previous size:    0  (Allocated)  WfpL
 ffffe70244be92f0 size:   c0 previous size:    0  (Allocated)  WfpL
 ffffe70244be93b0 size:   c0 previous size:    0  (Allocated)  WfpL
 ffffe70244be9470 size:   c0 previous size:    0  (Allocated)  VfAT
*ffffe70244be9530 size:   c0 previous size:    0  (Allocated) *VfAT
		Pooltag VfAT : Verifier AVL trees, Binary : nt!Vf
 ffffe70244be95f0 size:   c0 previous size:    0  (Allocated)  WfpL
```
Tested by allocating with ExAllocatePoolWithTag(). When the same memory was re-allocated, DV did not complain; presumably DV hooks the pool allocate/free calls and drops its shadow tracking when the lock's pool block is freed, whereas kmem cache object reuse is invisible to it. I will test more and create a PR.