openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Attempting to export pool with its opaquely underlying media removed breaks ZFS systemwide #6447

Closed: sempervictus closed this issue 3 years ago

sempervictus commented 7 years ago

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Arch |
| Distribution Version | Current |
| Linux Kernel | 4.9.39 |
| Architecture | x64 |
| ZFS Version | master + crypto |
| SPL Version | master + crypto |

Describe the problem you're observing

When zpool export is called on a pool that has lost its backing media (in this case a thumb drive carrying a dm-crypt volume, where that volume is the pool's sole backing vdev), the command blocks, subsequently blocking all invocations of zpool systemwide and producing this stack trace:

Aug 02 14:57:23 unknown kernel: INFO: task zpool:7041 blocked for more than 120 seconds.
Aug 02 14:57:23 unknown kernel:       Tainted: P           OE   4.9.39-1-sv #1
Aug 02 14:57:23 unknown kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 02 14:57:23 unknown kernel: zpool           D    0  7041   6464 0x00000084
Aug 02 14:57:23 unknown kernel:  ffffffff810a93a0 0000000000000000 ffff880835b6c000 ffff8806b1ac4d00
Aug 02 14:57:23 unknown kernel:  ffff88083f2d6b40 ffffc90061803b90 ffffffff81837dda ffffffff814008e3
Aug 02 14:57:23 unknown kernel:  ffff88083ac34d00 ffffc90061803bb8 ffff8806b1ac4d00 ffff88052f5a4230
Aug 02 14:57:23 unknown kernel: Call Trace:
Aug 02 14:57:23 unknown kernel:  [<ffffffff810a93a0>] ? switched_to_idle+0x20/0x20
Aug 02 14:57:23 unknown kernel:  [<ffffffff81837dda>] ? __schedule+0x24a/0x6e0
Aug 02 14:57:23 unknown kernel:  [<ffffffff814008e3>] ? __list_add+0x33/0x60
Aug 02 14:57:23 unknown kernel:  [<ffffffff818382b4>] schedule+0x44/0x90
Aug 02 14:57:23 unknown kernel:  [<ffffffffa0e8c139>] cv_wait_common+0x129/0x140 [spl]
Aug 02 14:57:23 unknown kernel:  [<ffffffff810bd3d0>] ? prepare_to_wait_event+0x110/0x110
Aug 02 14:57:23 unknown kernel:  [<ffffffffa0e8c16f>] __cv_wait+0x1f/0x30 [spl]
Aug 02 14:57:23 unknown kernel:  [<ffffffffa124c3d0>] txg_wait_synced+0xd0/0x110 [zfs]
Aug 02 14:57:23 unknown kernel:  [<ffffffffa123b148>] spa_export_common.part.22+0x2e8/0x3b0 [zfs]
Aug 02 14:57:23 unknown kernel:  [<ffffffff811d7a73>] ? kfree+0x183/0x1c0
Aug 02 14:57:23 unknown kernel:  [<ffffffffa123b297>] spa_export+0x47/0x60 [zfs]
Aug 02 14:57:23 unknown kernel:  [<ffffffffa127ca82>] zfs_ioc_pool_export+0x32/0x40 [zfs]
Aug 02 14:57:23 unknown kernel:  [<ffffffffa127b089>] zfsdev_ioctl+0x229/0x510 [zfs]
Aug 02 14:57:23 unknown kernel:  [<ffffffff8121b590>] do_vfs_ioctl+0xc0/0x7d0
Aug 02 14:57:23 unknown kernel:  [<ffffffff810013e9>] ? syscall_trace_enter+0x129/0x2b0
Aug 02 14:57:23 unknown kernel:  [<ffffffff8121bd2c>] sys_ioctl+0x8c/0xa0
Aug 02 14:57:23 unknown kernel:  [<ffffffff8100191f>] do_syscall_64+0x7f/0x1b0
Aug 02 14:57:23 unknown kernel:  [<ffffffff8183d465>] entry_SYSCALL64_slow_path+0x25/0x25
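
For anyone trying to reproduce this, a rough sketch of the scenario (device and pool names like sdX, cryptvol and testpool are placeholders, the drive is assumed to already carry a LUKS volume, and the unplug is simulated through sysfs rather than physically):

```sh
# dm-crypt volume on the thumb drive becomes the pool's sole vdev
cryptsetup open /dev/sdX cryptvol
zpool create -f testpool /dev/mapper/cryptvol

# Simulate yanking the drive: the SCSI device goes away, the dm device stays behind
echo 1 > /sys/block/sdX/device/delete

zpool export testpool     # blocks in txg_wait_synced, as in the trace above
zpool status              # every further zpool invocation now hangs as well
```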

From a naive reading of the stack trace, it looks like spa_export_common doesn't handle this sort of failure. If the ioctls are serialized, then a hang like this would logically block all further invocations.

The crypt volume which acts as the vdev is still "logically present" in the system, and cannot be dropped while it has consumers (the reference held by the zpool vdev). Unless spa_export_common can be taught to deal with disks that are faulty but not failed (accepting I/Os but returning nothing), I'm not sure this can be addressed in the export code.
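
As a quick illustration of that dangling reference (reusing the hypothetical cryptvol name from the sketch above), device-mapper still reports the volume with an open consumer:

```sh
dmsetup info cryptvol     # still listed; the non-zero "Open count" is the reference ZFS holds
dmsetup remove cryptvol   # should refuse with "Device or resource busy" while ZFS keeps it open
```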

The use case will be legitimate for a while yet. While ZFS crypto can address this to a degree, it stores metadata in the clear, and isn't yet integrated with init systems for decryption at boot, user managers for encrypted homes, crypttab, or grub. DM-crypt aside, I could see this happening with NFS-exported files backing pools (virtualization, for instance), or with caching tiers losing their backing in all sorts of applications.

All that in mind, is there an alternate path by which we can permit the ioctl interface to accept commands for other pools/functions while we have a hung pool? Can we "unhang" pools somehow once this sort of condition has been created?
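
For reference, the only related knobs I'm aware of are the pool's failmode property and zpool clear for suspended pools; since the hung export blocks every subsequent zpool invocation, these would have to be in place or issued before the pool wedges, and I'm not certain either actually helps here (testpool is the hypothetical pool from the sketch above):

```sh
zpool get failmode testpool            # default "wait": I/O blocks until the device comes back
zpool set failmode=continue testpool   # new writes get EIO instead; likely has to be set in advance
zpool clear testpool                   # the documented way to resume a suspended failmode=wait pool,
                                       # but only once the backing device is actually reachable again
```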

sempervictus commented 7 years ago

The backing volume for the vdev does "exist" though - there's a DM volume still in-kernel, so while the physical device backing DM is gone, ZFS has no way to know that. It's a good illustration of why ZFS is both an FS and a volume manager - blockdev jenga can get confusing.

stale[bot] commented 4 years ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.