Open dominikh opened 1 year ago
After about two hours, the hang seems to have resolved itself.
It might also be worth noting that the system currently has 1669 snapshots, and zfs list -t snapshot
takes ~5 seconds to complete. If there were any runtime exponential in the number of snapshots, (1669 * 1669) * (5s / 1669)
would come out to 2h20min, but that might be a sheer coincidence, and the only reason I am considering it is because dsl_dataset_snapshot_check
appears in the stacktrace.
A new set of stacktraces, from several processes:
[Thu Nov 2 17:49:55 2023] INFO: task .NET ThreadPool:2058282 blocked for more than 1351 seconds.
[Thu Nov 2 17:49:55 2023] Tainted: P O 6.5.9 #1-NixOS
[Thu Nov 2 17:49:55 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Nov 2 17:49:55 2023] task:.NET ThreadPool state:D stack:0 pid:2058282 ppid:1 flags:0x00000002
[Thu Nov 2 17:49:55 2023] Call Trace:
[Thu Nov 2 17:49:55 2023]
[Thu Nov 2 17:49:55 2023] __schedule+0x388/0x1390
[Thu Nov 2 17:49:55 2023] schedule+0x5e/0xd0
[Thu Nov 2 17:49:55 2023] cv_wait_common+0xfa/0x130 [spl]
[Thu Nov 2 17:49:55 2023] ? __pfx_autoremove_wake_function+0x10/0x10
[Thu Nov 2 17:49:55 2023] dmu_tx_wait+0xcc/0x440 [zfs]
[Thu Nov 2 17:49:55 2023] dmu_tx_assign+0x162/0x510 [zfs]
[Thu Nov 2 17:49:55 2023] zfs_dirty_inode+0x151/0x390 [zfs]
[Thu Nov 2 17:49:55 2023] zpl_dirty_inode+0x29/0x40 [zfs]
[Thu Nov 2 17:49:55 2023] __mark_inode_dirty+0x5a/0x390
[Thu Nov 2 17:49:55 2023] touch_atime+0x19f/0x1b0
[Thu Nov 2 17:49:55 2023] iterate_dir+0x110/0x170
[Thu Nov 2 17:49:55 2023] __x64_sys_getdents64+0x88/0x130
[Thu Nov 2 17:49:55 2023] ? __pfx_filldir64+0x10/0x10
[Thu Nov 2 17:49:55 2023] do_syscall_64+0x3e/0x90
[Thu Nov 2 17:49:55 2023] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[Thu Nov 2 17:49:55 2023] RIP: 0033:0x7f071300b627
[Thu Nov 2 17:49:55 2023] RSP: 002b:00007f0698a0ded8 EFLAGS: 00000293 ORIG_RAX: 00000000000000d9
[Thu Nov 2 17:49:55 2023] RAX: ffffffffffffffda RBX: 00007f066de88da8 RCX: 00007f071300b627
[Thu Nov 2 17:49:55 2023] RDX: 0000000000008000 RSI: 00007f05dc005c20 RDI: 000000000000016c
[Thu Nov 2 17:49:55 2023] RBP: 00007f05dc005c20 R08: 00007f052bffe000 R09: 0000000000000000
[Thu Nov 2 17:49:55 2023] R10: 00007f06997a0188 R11: 0000000000000293 R12: 00007f05dc005bf4
[Thu Nov 2 17:49:55 2023] R13: 0000000000000000 R14: 00007f051c812618 R15: 00007f05dc005bf0
[Thu Nov 2 17:49:55 2023]
[Thu Nov 2 17:49:55 2023] INFO: task .qbittorrent-no:930660 blocked for more than 614 seconds.
[Thu Nov 2 17:49:55 2023] Tainted: P O 6.5.9 #1-NixOS
[Thu Nov 2 17:49:55 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Nov 2 17:49:55 2023] task:.qbittorrent-no state:D stack:0 pid:930660 ppid:628867 flags:0x00004002
[Thu Nov 2 17:49:55 2023] Call Trace:
[Thu Nov 2 17:49:55 2023]
[Thu Nov 2 17:49:55 2023] __schedule+0x388/0x1390
[Thu Nov 2 17:49:55 2023] schedule+0x5e/0xd0
[Thu Nov 2 17:49:55 2023] cv_wait_common+0xfa/0x130 [spl]
[Thu Nov 2 17:49:55 2023] ? __pfx_autoremove_wake_function+0x10/0x10
[Thu Nov 2 17:49:55 2023] dmu_tx_wait+0xcc/0x440 [zfs]
[Thu Nov 2 17:49:55 2023] zfs_create+0x2b8/0xa00 [zfs]
[Thu Nov 2 17:49:55 2023] zpl_create+0xd0/0x1e0 [zfs]
[Thu Nov 2 17:49:55 2023] path_openat+0xea9/0x1160
[Thu Nov 2 17:49:55 2023] do_filp_open+0xb3/0x160
[Thu Nov 2 17:49:55 2023] do_sys_openat2+0xab/0xe0
[Thu Nov 2 17:49:55 2023] __x64_sys_openat+0x6e/0xa0
[Thu Nov 2 17:49:55 2023] do_syscall_64+0x3e/0x90
[Thu Nov 2 17:49:55 2023] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[Thu Nov 2 17:49:55 2023] RIP: 0033:0x7f00b6f179d0
[Thu Nov 2 17:49:55 2023] RSP: 002b:00007f0092ffb320 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
[Thu Nov 2 17:49:55 2023] RAX: ffffffffffffffda RBX: 0000000000000042 RCX: 00007f00b6f179d0
[Thu Nov 2 17:49:55 2023] RDX: 0000000000000042 RSI: 00007f006c01c420 RDI: 00000000ffffff9c
[Thu Nov 2 17:49:55 2023] RBP: 00007f006c01c420 R08: 0000000000000000 R09: 0000000000000001
[Thu Nov 2 17:49:55 2023] R10: 00000000000081a4 R11: 0000000000000293 R12: 0000000000000000
[Thu Nov 2 17:49:55 2023] R13: 00000000211186ea R14: 00007f0092ffb3e0 R15: 0000000021120000
[Thu Nov 2 17:49:55 2023]
[Thu Nov 2 17:49:55 2023] INFO: task zfs:1829445 blocked for more than 1720 seconds.
[Thu Nov 2 17:49:55 2023] Tainted: P O 6.5.9 #1-NixOS
[Thu Nov 2 17:49:55 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Nov 2 17:49:55 2023] task:zfs state:D stack:0 pid:1829445 ppid:1829444 flags:0x00004002
[Thu Nov 2 17:49:55 2023] Call Trace:
[Thu Nov 2 17:49:55 2023]
[Thu Nov 2 17:49:55 2023] __schedule+0x388/0x1390
[Thu Nov 2 17:49:55 2023] ? dsl_dataset_snapshot_check+0x3ee/0x520 [zfs]
[Thu Nov 2 17:49:55 2023] schedule+0x5e/0xd0
[Thu Nov 2 17:49:55 2023] io_schedule+0x46/0x70
[Thu Nov 2 17:49:55 2023] cv_wait_common+0xaa/0x130 [spl]
[Thu Nov 2 17:49:55 2023] ? __pfx_autoremove_wake_function+0x10/0x10
[Thu Nov 2 17:49:55 2023] txg_wait_synced_impl+0xcb/0x110 [zfs]
[Thu Nov 2 17:49:55 2023] txg_wait_synced+0x10/0x40 [zfs]
[Thu Nov 2 17:49:55 2023] dsl_sync_task_common+0x1e8/0x2a0 [zfs]
[Thu Nov 2 17:49:55 2023] ? __pfx_dsl_dataset_snapshot_sync+0x10/0x10 [zfs]
[Thu Nov 2 17:49:55 2023] ? __pfx_dsl_dataset_snapshot_check+0x10/0x10 [zfs]
[Thu Nov 2 17:49:55 2023] ? __pfx_dsl_dataset_snapshot_check+0x10/0x10 [zfs]
[Thu Nov 2 17:49:55 2023] ? __pfx_dsl_dataset_snapshot_sync+0x10/0x10 [zfs]
[Thu Nov 2 17:49:55 2023] dsl_sync_task+0x1a/0x30 [zfs]
[Thu Nov 2 17:49:55 2023] dsl_dataset_snapshot+0x160/0x340 [zfs]
[Thu Nov 2 17:49:55 2023] ? spl_kmem_alloc_impl+0xf1/0x130 [spl]
[Thu Nov 2 17:49:55 2023] ? nvt_remove_nvpair+0xce/0x140 [zfs]
[Thu Nov 2 17:49:55 2023] ? __kmem_cache_alloc_node+0x190/0x310
[Thu Nov 2 17:49:55 2023] ? spl_kmem_alloc_impl+0xf1/0x130 [spl]
[Thu Nov 2 17:49:55 2023] ? spl_kmem_alloc_impl+0xf1/0x130 [spl]
[Thu Nov 2 17:49:55 2023] ? __kmalloc_node+0x50/0x150
[Thu Nov 2 17:49:55 2023] ? nvt_lookup_name_type.isra.0+0x6d/0xb0 [zfs]
[Thu Nov 2 17:49:55 2023] zfs_ioc_snapshot+0x27b/0x360 [zfs]
[Thu Nov 2 17:49:55 2023] zfsdev_ioctl_common+0x478/0x9e0 [zfs]
[Thu Nov 2 17:49:55 2023] zfsdev_ioctl+0x53/0xe0 [zfs]
[Thu Nov 2 17:49:55 2023] __x64_sys_ioctl+0x97/0xd0
[Thu Nov 2 17:49:55 2023] do_syscall_64+0x3e/0x90
[Thu Nov 2 17:49:55 2023] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[Thu Nov 2 17:49:55 2023] RIP: 0033:0x7f1576aeb28f
[Thu Nov 2 17:49:55 2023] RSP: 002b:00007ffd1785f5e0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[Thu Nov 2 17:49:55 2023] RAX: ffffffffffffffda RBX: 0000000000005a23 RCX: 00007f1576aeb28f
[Thu Nov 2 17:49:55 2023] RDX: 00007ffd1785f660 RSI: 0000000000005a23 RDI: 0000000000000004
[Thu Nov 2 17:49:55 2023] RBP: 00007ffd17862c40 R08: 0000000000040c31 R09: 0000000000000000
[Thu Nov 2 17:49:55 2023] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd1785f660
[Thu Nov 2 17:49:55 2023] R13: 0000000000005a23 R14: 00007ffd17862c01 R15: 00007ffd17862da8
[Thu Nov 2 17:49:55 2023]
[Thu Nov 2 17:49:55 2023] INFO: task zfs:1898785 blocked for more than 1720 seconds.
[Thu Nov 2 17:49:55 2023] Tainted: P O 6.5.9 #1-NixOS
[Thu Nov 2 17:49:55 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Nov 2 17:49:55 2023] task:zfs state:D stack:0 pid:1898785 ppid:1829441 flags:0x00004002
[Thu Nov 2 17:49:55 2023] Call Trace:
[Thu Nov 2 17:49:55 2023]
[Thu Nov 2 17:49:55 2023] __schedule+0x388/0x1390
[Thu Nov 2 17:49:55 2023] ? dsl_dataset_snapshot_check+0x3ee/0x520 [zfs]
[Thu Nov 2 17:49:55 2023] schedule+0x5e/0xd0
[Thu Nov 2 17:49:55 2023] io_schedule+0x46/0x70
[Thu Nov 2 17:49:55 2023] cv_wait_common+0xaa/0x130 [spl]
[Thu Nov 2 17:49:55 2023] ? __pfx_autoremove_wake_function+0x10/0x10
[Thu Nov 2 17:49:55 2023] txg_wait_synced_impl+0xcb/0x110 [zfs]
[Thu Nov 2 17:49:55 2023] txg_wait_synced+0x10/0x40 [zfs]
[Thu Nov 2 17:49:55 2023] dsl_sync_task_common+0x1c9/0x2a0 [zfs]
[Thu Nov 2 17:49:55 2023] ? __pfx_dsl_dataset_snapshot_sync+0x10/0x10 [zfs]
[Thu Nov 2 17:49:55 2023] ? __pfx_dsl_dataset_snapshot_check+0x10/0x10 [zfs]
[Thu Nov 2 17:49:55 2023] ? __pfx_dsl_dataset_snapshot_check+0x10/0x10 [zfs]
[Thu Nov 2 17:49:55 2023] ? __pfx_dsl_dataset_snapshot_sync+0x10/0x10 [zfs]
[Thu Nov 2 17:49:55 2023] dsl_sync_task+0x1a/0x30 [zfs]
[Thu Nov 2 17:49:55 2023] dsl_dataset_snapshot+0x160/0x340 [zfs]
[Thu Nov 2 17:49:55 2023] ? spl_kmem_alloc_impl+0xf1/0x130 [spl]
[Thu Nov 2 17:49:55 2023] ? nvt_remove_nvpair+0xce/0x140 [zfs]
[Thu Nov 2 17:49:55 2023] ? __kmem_cache_alloc_node+0x190/0x310
[Thu Nov 2 17:49:55 2023] ? spl_kmem_alloc_impl+0xf1/0x130 [spl]
[Thu Nov 2 17:49:55 2023] ? spl_kmem_alloc_impl+0xf1/0x130 [spl]
[Thu Nov 2 17:49:55 2023] ? __kmalloc_node+0x50/0x150
[Thu Nov 2 17:49:55 2023] ? nvt_lookup_name_type.isra.0+0x6d/0xb0 [zfs]
[Thu Nov 2 17:49:55 2023] zfs_ioc_snapshot+0x27b/0x360 [zfs]
[Thu Nov 2 17:49:55 2023] zfsdev_ioctl_common+0x478/0x9e0 [zfs]
[Thu Nov 2 17:49:55 2023] zfsdev_ioctl+0x53/0xe0 [zfs]
[Thu Nov 2 17:49:55 2023] __x64_sys_ioctl+0x97/0xd0
[Thu Nov 2 17:49:55 2023] do_syscall_64+0x3e/0x90
[Thu Nov 2 17:49:55 2023] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[Thu Nov 2 17:49:55 2023] RIP: 0033:0x7f890981428f
[Thu Nov 2 17:49:55 2023] RSP: 002b:00007fff4f257700 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[Thu Nov 2 17:49:55 2023] RAX: ffffffffffffffda RBX: 0000000000005a23 RCX: 00007f890981428f
[Thu Nov 2 17:49:55 2023] RDX: 00007fff4f257780 RSI: 0000000000005a23 RDI: 0000000000000004
[Thu Nov 2 17:49:55 2023] RBP: 00007fff4f25ad60 R08: 0000000000020000 R09: 0000000000000000
[Thu Nov 2 17:49:55 2023] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff4f257780
[Thu Nov 2 17:49:55 2023] R13: 0000000000005a23 R14: 00007fff4f25ad01 R15: 00007fff4f25aec8
[Thu Nov 2 17:49:55 2023]
[Thu Nov 2 17:49:55 2023] INFO: task zfs:1898786 blocked for more than 1720 seconds.
[Thu Nov 2 17:49:55 2023] Tainted: P O 6.5.9 #1-NixOS
[Thu Nov 2 17:49:55 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Nov 2 17:49:55 2023] task:zfs state:D stack:0 pid:1898786 ppid:1898784 flags:0x00004002
[Thu Nov 2 17:49:55 2023] Call Trace:
[Thu Nov 2 17:49:55 2023]
[Thu Nov 2 17:49:55 2023] __schedule+0x388/0x1390
[Thu Nov 2 17:49:55 2023] ? dsl_dataset_snapshot_check+0x3ee/0x520 [zfs]
[Thu Nov 2 17:49:55 2023] schedule+0x5e/0xd0
[Thu Nov 2 17:49:55 2023] io_schedule+0x46/0x70
[Thu Nov 2 17:49:55 2023] cv_wait_common+0xaa/0x130 [spl]
[Thu Nov 2 17:49:55 2023] ? __pfx_autoremove_wake_function+0x10/0x10
[Thu Nov 2 17:49:55 2023] txg_wait_synced_impl+0xcb/0x110 [zfs]
[Thu Nov 2 17:49:55 2023] txg_wait_synced+0x10/0x40 [zfs]
[Thu Nov 2 17:49:55 2023] dsl_sync_task_common+0x1c9/0x2a0 [zfs]
[Thu Nov 2 17:49:55 2023] ? __pfx_dsl_dataset_snapshot_sync+0x10/0x10 [zfs]
[Thu Nov 2 17:49:55 2023] ? __pfx_dsl_dataset_snapshot_check+0x10/0x10 [zfs]
[Thu Nov 2 17:49:55 2023] ? __pfx_dsl_dataset_snapshot_check+0x10/0x10 [zfs]
[Thu Nov 2 17:49:55 2023] ? __pfx_dsl_dataset_snapshot_sync+0x10/0x10 [zfs]
[Thu Nov 2 17:49:55 2023] dsl_sync_task+0x1a/0x30 [zfs]
[Thu Nov 2 17:49:55 2023] dsl_dataset_snapshot+0x160/0x340 [zfs]
[Thu Nov 2 17:49:55 2023] ? spl_kmem_alloc_impl+0xf1/0x130 [spl]
[Thu Nov 2 17:49:55 2023] ? nvt_remove_nvpair+0xce/0x140 [zfs]
[Thu Nov 2 17:49:55 2023] ? __kmem_cache_alloc_node+0x190/0x310
[Thu Nov 2 17:49:55 2023] ? spl_kmem_alloc_impl+0xf1/0x130 [spl]
[Thu Nov 2 17:49:55 2023] ? spl_kmem_alloc_impl+0xf1/0x130 [spl]
[Thu Nov 2 17:49:55 2023] ? __kmalloc_node+0x50/0x150
[Thu Nov 2 17:49:55 2023] ? nvt_lookup_name_type.isra.0+0x6d/0xb0 [zfs]
[Thu Nov 2 17:49:55 2023] zfs_ioc_snapshot+0x27b/0x360 [zfs]
[Thu Nov 2 17:49:55 2023] zfsdev_ioctl_common+0x478/0x9e0 [zfs]
[Thu Nov 2 17:49:55 2023] zfsdev_ioctl+0x53/0xe0 [zfs]
[Thu Nov 2 17:49:55 2023] __x64_sys_ioctl+0x97/0xd0
[Thu Nov 2 17:49:55 2023] do_syscall_64+0x3e/0x90
[Thu Nov 2 17:49:55 2023] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[Thu Nov 2 17:49:55 2023] RIP: 0033:0x7f83b592128f
[Thu Nov 2 17:49:55 2023] RSP: 002b:00007fff4711f560 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[Thu Nov 2 17:49:55 2023] RAX: ffffffffffffffda RBX: 0000000000005a23 RCX: 00007f83b592128f
[Thu Nov 2 17:49:55 2023] RDX: 00007fff4711f5e0 RSI: 0000000000005a23 RDI: 0000000000000004
[Thu Nov 2 17:49:55 2023] RBP: 00007fff47122bc0 R08: 0000000000020000 R09: 0000000000000000
[Thu Nov 2 17:49:55 2023] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff4711f5e0
[Thu Nov 2 17:49:55 2023] R13: 0000000000005a23 R14: 00007fff47122b01 R15: 00007fff47122d28
[Thu Nov 2 17:49:55 2023]
[Thu Nov 2 17:49:55 2023] INFO: task zfs:1898787 blocked for more than 1720 seconds.
[Thu Nov 2 17:49:55 2023] Tainted: P O 6.5.9 #1-NixOS
[Thu Nov 2 17:49:55 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Nov 2 17:49:55 2023] task:zfs state:D stack:0 pid:1898787 ppid:1898783 flags:0x00004002
[Thu Nov 2 17:49:55 2023] Call Trace:
[Thu Nov 2 17:49:55 2023]
[Thu Nov 2 17:49:55 2023] __schedule+0x388/0x1390
[Thu Nov 2 17:49:55 2023] ? dsl_dataset_snapshot_check+0x3ee/0x520 [zfs]
[Thu Nov 2 17:49:55 2023] schedule+0x5e/0xd0
[Thu Nov 2 17:49:55 2023] io_schedule+0x46/0x70
[Thu Nov 2 17:49:55 2023] cv_wait_common+0xaa/0x130 [spl]
[Thu Nov 2 17:49:55 2023] ? __pfx_autoremove_wake_function+0x10/0x10
[Thu Nov 2 17:49:55 2023] txg_wait_synced_impl+0xcb/0x110 [zfs]
[Thu Nov 2 17:49:55 2023] txg_wait_synced+0x10/0x40 [zfs]
[Thu Nov 2 17:49:55 2023] dsl_sync_task_common+0x1c9/0x2a0 [zfs]
[Thu Nov 2 17:49:55 2023] ? __pfx_dsl_dataset_snapshot_sync+0x10/0x10 [zfs]
[Thu Nov 2 17:49:55 2023] ? __pfx_dsl_dataset_snapshot_check+0x10/0x10 [zfs]
[Thu Nov 2 17:49:55 2023] ? __pfx_dsl_dataset_snapshot_check+0x10/0x10 [zfs]
[Thu Nov 2 17:49:55 2023] ? __pfx_dsl_dataset_snapshot_sync+0x10/0x10 [zfs]
[Thu Nov 2 17:49:55 2023] dsl_sync_task+0x1a/0x30 [zfs]
[Thu Nov 2 17:49:55 2023] dsl_dataset_snapshot+0x160/0x340 [zfs]
[Thu Nov 2 17:49:55 2023] ? spl_kmem_alloc_impl+0xf1/0x130 [spl]
[Thu Nov 2 17:49:55 2023] ? nvt_remove_nvpair+0xce/0x140 [zfs]
[Thu Nov 2 17:49:55 2023] ? __kmem_cache_alloc_node+0x190/0x310
[Thu Nov 2 17:49:55 2023] ? spl_kmem_alloc_impl+0xf1/0x130 [spl]
[Thu Nov 2 17:49:55 2023] ? spl_kmem_alloc_impl+0xf1/0x130 [spl]
[Thu Nov 2 17:49:55 2023] ? __kmalloc_node+0x50/0x150
[Thu Nov 2 17:49:55 2023] ? nvt_lookup_name_type.isra.0+0x6d/0xb0 [zfs]
[Thu Nov 2 17:49:55 2023] zfs_ioc_snapshot+0x27b/0x360 [zfs]
[Thu Nov 2 17:49:55 2023] zfsdev_ioctl_common+0x478/0x9e0 [zfs]
[Thu Nov 2 17:49:55 2023] zfsdev_ioctl+0x53/0xe0 [zfs]
[Thu Nov 2 17:49:55 2023] __x64_sys_ioctl+0x97/0xd0
[Thu Nov 2 17:49:55 2023] do_syscall_64+0x3e/0x90
[Thu Nov 2 17:49:55 2023] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[Thu Nov 2 17:49:55 2023] RIP: 0033:0x7f4f739db28f
[Thu Nov 2 17:49:55 2023] RSP: 002b:00007fff50584530 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[Thu Nov 2 17:49:55 2023] RAX: ffffffffffffffda RBX: 0000000000005a23 RCX: 00007f4f739db28f
[Thu Nov 2 17:49:55 2023] RDX: 00007fff505845b0 RSI: 0000000000005a23 RDI: 0000000000000004
[Thu Nov 2 17:49:55 2023] RBP: 00007fff50587b90 R08: 0000000000020000 R09: 0000000000000000
[Thu Nov 2 17:49:55 2023] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff505845b0
[Thu Nov 2 17:49:55 2023] R13: 0000000000005a23 R14: 00007fff50587b01 R15: 00007fff50587cf8
[Thu Nov 2 17:49:55 2023]
[Thu Nov 2 17:49:55 2023] INFO: task cp:2097939 blocked for more than 1351 seconds.
[Thu Nov 2 17:49:55 2023] Tainted: P O 6.5.9 #1-NixOS
[Thu Nov 2 17:49:55 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Nov 2 17:49:55 2023] task:cp state:D stack:0 pid:2097939 ppid:2097928 flags:0x00004006
[Thu Nov 2 17:49:55 2023] Call Trace:
[Thu Nov 2 17:49:55 2023]
[Thu Nov 2 17:49:55 2023] __schedule+0x388/0x1390
[Thu Nov 2 17:49:55 2023] schedule+0x5e/0xd0
[Thu Nov 2 17:49:55 2023] schedule_preempt_disabled+0x15/0x30
[Thu Nov 2 17:49:55 2023] rwsem_down_read_slowpath+0x29e/0x4f0
[Thu Nov 2 17:49:55 2023] down_read+0x48/0xa0
[Thu Nov 2 17:49:55 2023] brt_pending_add+0x200/0x320 [zfs]
[Thu Nov 2 17:49:55 2023] dmu_brt_clone+0x1fc/0x470 [zfs]
[Thu Nov 2 17:49:55 2023] zfs_clone_range+0x92d/0xed0 [zfs]
[Thu Nov 2 17:49:55 2023] __zpl_clone_file_range.isra.0+0xe5/0x160 [zfs]
[Thu Nov 2 17:49:55 2023] do_clone_file_range+0x105/0x290
[Thu Nov 2 17:49:55 2023] vfs_clone_file_range+0x3e/0x140
[Thu Nov 2 17:49:55 2023] ioctl_file_clone+0x49/0xb0
[Thu Nov 2 17:49:55 2023] do_vfs_ioctl+0x7a/0x970
[Thu Nov 2 17:49:55 2023] __x64_sys_ioctl+0x72/0xd0
[Thu Nov 2 17:49:55 2023] do_syscall_64+0x3e/0x90
[Thu Nov 2 17:49:55 2023] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[Thu Nov 2 17:49:55 2023] RIP: 0033:0x7f826ef2128f
[Thu Nov 2 17:49:55 2023] RSP: 002b:00007fff3c6402d0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[Thu Nov 2 17:49:55 2023] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f826ef2128f
[Thu Nov 2 17:49:55 2023] RDX: 0000000000000003 RSI: 0000000040049409 RDI: 0000000000000004
[Thu Nov 2 17:49:55 2023] RBP: 00007fff3c640710 R08: 00007fff3c6408e0 R09: 0000000000000002
[Thu Nov 2 17:49:55 2023] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[Thu Nov 2 17:49:55 2023] R13: 0000000000000003 R14: 0000000000000004 R15: 0000000000000014
[Thu Nov 2 17:49:55 2023]
[Thu Nov 2 17:49:55 2023] INFO: task find:2140885 blocked for more than 1105 seconds.
[Thu Nov 2 17:49:55 2023] Tainted: P O 6.5.9 #1-NixOS
[Thu Nov 2 17:49:55 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Nov 2 17:49:55 2023] task:find state:D stack:0 pid:2140885 ppid:110525 flags:0x00000002
[Thu Nov 2 17:49:55 2023] Call Trace:
[Thu Nov 2 17:49:55 2023]
[Thu Nov 2 17:49:55 2023] __schedule+0x388/0x1390
[Thu Nov 2 17:49:55 2023] schedule+0x5e/0xd0
[Thu Nov 2 17:49:55 2023] cv_wait_common+0xfa/0x130 [spl]
[Thu Nov 2 17:49:55 2023] ? __pfx_autoremove_wake_function+0x10/0x10
[Thu Nov 2 17:49:55 2023] dmu_tx_wait+0xcc/0x440 [zfs]
[Thu Nov 2 17:49:55 2023] dmu_tx_assign+0x162/0x510 [zfs]
[Thu Nov 2 17:49:55 2023] zfs_dirty_inode+0x151/0x390 [zfs]
[Thu Nov 2 17:49:55 2023] zpl_dirty_inode+0x29/0x40 [zfs]
[Thu Nov 2 17:49:55 2023] __mark_inode_dirty+0x5a/0x390
[Thu Nov 2 17:49:55 2023] touch_atime+0x19f/0x1b0
[Thu Nov 2 17:49:55 2023] iterate_dir+0x110/0x170
[Thu Nov 2 17:49:55 2023] __x64_sys_getdents64+0x88/0x130
[Thu Nov 2 17:49:55 2023] ? __pfx_filldir64+0x10/0x10
[Thu Nov 2 17:49:55 2023] do_syscall_64+0x3e/0x90
[Thu Nov 2 17:49:55 2023] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[Thu Nov 2 17:49:55 2023] RIP: 0033:0x7f466c269627
[Thu Nov 2 17:49:55 2023] RSP: 002b:00007ffdaecf9f98 EFLAGS: 00000293 ORIG_RAX: 00000000000000d9
[Thu Nov 2 17:49:55 2023] RAX: ffffffffffffffda RBX: 00000000012c6e20 RCX: 00007f466c269627
[Thu Nov 2 17:49:55 2023] RDX: 0000000000008000 RSI: 00000000012c6e50 RDI: 0000000000000006
[Thu Nov 2 17:49:55 2023] RBP: 00000000012c6e24 R08: 0000000000000008 R09: 0000000000000000
[Thu Nov 2 17:49:55 2023] R10: 0000000000000100 R11: 0000000000000293 R12: ffffffffffffff88
[Thu Nov 2 17:49:55 2023] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[Thu Nov 2 17:49:55 2023]
[Thu Nov 2 17:49:55 2023] INFO: task perl:2295966 blocked for more than 245 seconds.
[Thu Nov 2 17:49:55 2023] Tainted: P O 6.5.9 #1-NixOS
[Thu Nov 2 17:49:55 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Nov 2 17:49:55 2023] task:perl state:D stack:0 pid:2295966 ppid:2295939 flags:0x00000002
[Thu Nov 2 17:49:55 2023] Call Trace:
[Thu Nov 2 17:49:55 2023]
[Thu Nov 2 17:49:55 2023] __schedule+0x388/0x1390
[Thu Nov 2 17:49:55 2023] schedule+0x5e/0xd0
[Thu Nov 2 17:49:55 2023] cv_wait_common+0xfa/0x130 [spl]
[Thu Nov 2 17:49:55 2023] ? __pfx_autoremove_wake_function+0x10/0x10
[Thu Nov 2 17:49:55 2023] dmu_tx_wait+0xcc/0x440 [zfs]
[Thu Nov 2 17:49:55 2023] dmu_tx_assign+0x162/0x510 [zfs]
[Thu Nov 2 17:49:55 2023] zfs_setattr+0x466/0x2400 [zfs]
[Thu Nov 2 17:49:55 2023] zpl_setattr+0x110/0x250 [zfs]
[Thu Nov 2 17:49:55 2023] notify_change+0x22f/0x4c0
[Thu Nov 2 17:49:55 2023] ? chown_common+0x222/0x230
[Thu Nov 2 17:49:55 2023] chown_common+0x222/0x230
[Thu Nov 2 17:49:55 2023] do_fchownat+0xa3/0x100
[Thu Nov 2 17:49:55 2023] __x64_sys_chown+0x20/0x30
[Thu Nov 2 17:49:55 2023] do_syscall_64+0x3e/0x90
[Thu Nov 2 17:49:55 2023] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[Thu Nov 2 17:49:55 2023] RIP: 0033:0x7fe47691382b
[Thu Nov 2 17:49:55 2023] RSP: 002b:00007fff32ed91a8 EFLAGS: 00000246 ORIG_RAX: 000000000000005c
[Thu Nov 2 17:49:55 2023] RAX: ffffffffffffffda RBX: 0000000001db5fe0 RCX: 00007fe47691382b
[Thu Nov 2 17:49:55 2023] RDX: 0000000000000064 RSI: 00000000000003ea RDI: 0000000001db5fe0
[Thu Nov 2 17:49:55 2023] RBP: 0000000002528c70 R08: 0000000000000064 R09: 0000000002528c50
[Thu Nov 2 17:49:55 2023] R10: 0000000001db7798 R11: 0000000000000246 R12: 0000000000000001
[Thu Nov 2 17:49:55 2023] R13: 0000000000000064 R14: 0000000002528c68 R15: 0000000001da02a0
[Thu Nov 2 17:49:55 2023]
Oh my god, I thought my server had a hardware defect. Same problem here, thanks for sharing!
Edit: I switched my server from lts (6.1.61) to the stable kernel (6.5.9) and this problem has been solved. Seems to be a very specific case when using zfs 2.2.0 with the 6.1.61 kernel.
Edit: The problem hasn't really been solved. My system doesn't freeze anymore after some hours, howeve I/O-heavy programs freeze and lock up individual CPU threads. Those processes also can't be killed and block shutdowns until I cut the power.
Edit: After upgrading to the 6.5.9 kernel and shutting down I/O-heavy programs, my server ran for 9 hours without any errors in journalctl. Also worth mentioning: https://github.com/openzfs/zfs/issues/15275#issuecomment-1793871989
I experience an issue that shares much of the characteristics mentioned above: I/O-heavy workload, tasks freeze, can't be killed, delayed shutdowns (~15min). My versions are a bit older since I run Ubuntu 22.04 LTS. The configuration is a RAID10 here.
$ uname -a
Linux nakamoto 5.15.0-91-generic #101-Ubuntu SMP Tue Nov 14 13:30:08 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.3 LTS
Release: 22.04
Codename: jammy
$ zfs --version
zfs-2.1.5-1ubuntu6~22.04.2
zfs-kmod-2.1.5-1ubuntu6~22.04.1
$ dpkg -l | grep -i zfs
ii libzfs4linux 2.1.5-1ubuntu6~22.04.2 amd64 OpenZFS filesystem library for Linux - general support
ii libzpool5linux 2.1.5-1ubuntu6~22.04.2 amd64 OpenZFS pool library for Linux
ii zfs-zed 2.1.5-1ubuntu6~22.04.2 amd64 OpenZFS Event Daemon
ii zfsutils-linux 2.1.5-1ubuntu6~22.04.2 amd64 command-line tools to manage OpenZFS filesystems
$ zpool status
pool: nvmepool
state: ONLINE
scan: scrub repaired 0B in 00:09:45 with 0 errors on Sun Dec 10 00:33:46 2023
config:
NAME STATE READ WRITE CKSUM
nvmepool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
nvme-WD_BLACK_SN850X_4000GB_233529800659 ONLINE 0 0 0
nvme-WD_BLACK_SN850X_4000GB_233529801559 ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
nvme-WD_BLACK_SN850X_4000GB_233529802026 ONLINE 0 0 0
nvme-WD_BLACK_SN850X_4000GB_233529801290 ONLINE 0 0 0
errors: No known data errors
[44829.805525] INFO: task reth:394044 blocked for more than 1208 seconds.
[44829.805533] Tainted: P O 5.15.0-91-generic #101-Ubuntu
[44829.805536] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[44829.805537] task:reth state:D stack: 0 pid:394044 ppid: 1 flags:0x00004002
[44829.805541] Call Trace:
[44829.805543] <TASK>
[44829.805545] __schedule+0x24e/0x590
[44829.805550] ? write_cache_pages+0xcd/0x460
[44829.805553] schedule+0x69/0x110
[44829.805555] io_schedule+0x46/0x80
[44829.805556] wait_on_page_bit_common+0x10c/0x3d0
[44829.805559] ? filemap_invalidate_unlock_two+0x50/0x50
[44829.805561] wait_on_page_bit+0x3f/0x50
[44829.805563] wait_on_page_writeback+0x26/0x80
[44829.805565] __filemap_fdatawait_range+0x97/0x120
[44829.805567] filemap_write_and_wait_range+0x88/0xe0
[44829.805571] zpl_fsync+0x3d/0xa0 [zfs]
[44829.805656] vfs_fsync_range+0x49/0x90
[44829.805659] ? find_vma+0x1b/0x80
[44829.805662] __do_sys_msync+0x1ce/0x2a0
[44829.805664] __x64_sys_msync+0x1c/0x30
[44829.805665] do_syscall_64+0x5c/0xc0
[44829.805668] ? vfs_write+0x1d5/0x270
[44829.805671] ? fput+0x13/0x20
[44829.805673] ? __x64_sys_pwrite64+0xa6/0xc0
[44829.805674] ? exit_to_user_mode_prepare+0x37/0xb0
[44829.805677] ? syscall_exit_to_user_mode+0x35/0x50
[44829.805679] ? do_syscall_64+0x69/0xc0
[44829.805680] ? exc_page_fault+0x89/0x170
[44829.805681] entry_SYSCALL_64_after_hwframe+0x62/0xcc
[44829.805684] RIP: 0033:0x7f48a446692f
[44829.805686] RSP: 002b:00007f48709f4240 EFLAGS: 00000293 ORIG_RAX: 000000000000001a
[44829.805688] RAX: ffffffffffffffda RBX: 00007b4814002014 RCX: 00007f48a446692f
[44829.805689] RDX: 0000000000000004 RSI: 00000087274b9000 RDI: 00007b4814000000
[44829.805690] RBP: 00005584504c70b8 R08: 0000000000000000 R09: 0000000000000000
[44829.805691] R10: 00007ffd7db9d080 R11: 0000000000000293 R12: 00005584504c3360
[44829.805691] R13: 0000000062a80024 R14: 00007f48709f4520 R15: 0000000000000002
[44829.805693] </TASK>
Maybe related to #15283 and #15217 ?
Edit: I switched my server from lts (6.1.61) to the stable kernel (6.5.9) and this problem has been solved. Seems to be a very specific case when using zfs 2.2.0 with the 6.1.61 kernel.
We do have hung_task panics on 6.5.11-4-pve kernel with zfs-kmod-2.2.0-pve3 while trying to work with 1000+ snapshots
System information
Describe the problem you're observing
Linux reports that the
zfs
task is hung. Several processes (local as well as remote via NFS) block trying to access some files. However, not all accesses are blocked.Describe how to reproduce the problem
I don't have steps to reproduce the problem. The hang happened on a production system, with the following dominant actions happening:
Include any warning/errors/backtraces from the system logs