openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

zfs error: cannot OPERATION 'rpool/data/subvol-0815-disk-0': dataset is busy #15422

Open thieleon opened 11 months ago

thieleon commented 11 months ago

System information

Type Version/Name
Distribution Name Proxmox VE
Distribution Version 7.4-17
Kernel Version 5.15.108-1-pve
Architecture x64
OpenZFS Version zfs-2.1.11-pve1 / zfs-kmod-2.1.11-pve1

Describe the problem you're observing

I'm running a Proxmox cluster with a couple of nodes that use ZFS for storage. From time to time, when operating on the containers via the Proxmox UI, a job gets stuck. After some time (hours, a day, ...) I cancel it, but it seems to keep running in the background. Every subsequent operation on that subvol then fails with "zfs error: cannot rollback/remove/... 'rpool/data/subvol-ABC-disk-0': dataset is busy".

I can resolve the issue by rebooting, but that is not a real solution: we run multiple customer services on these nodes, and rebooting a Proxmox node means considerable downtime.
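
For what it's worth, these are the kinds of checks for the usual "dataset is busy" suspects (lingering mounts, snapshot holds, clones); a sketch only, using the dataset name from the issue title:

# is the filesystem still mounted somewhere, and what does ZFS think?
zfs get mounted,mountpoint rpool/data/subvol-0815-disk-0
findmnt -T /rpool/data/subvol-0815-disk-0

# any holds or clones pinning snapshots of the dataset?
zfs list -H -t snapshot -o name -r rpool/data/subvol-0815-disk-0 | xargs -r zfs holds
zfs get -r -o name,value clones rpool/data/subvol-0815-disk-0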

Describe how to reproduce the problem

Pretty hard to reproduce, as it seems to occur randomly. Operations like removing a snapshot or rolling back are done quite often, but only occasionally does one of them run forever.

There are many people in the Proxmox community with the same problem, but the only suggestion there is to "reboot".

Include any warning/errors/backtraces from the system logs

With ps auxf | grep POOL_ID I can see that there are two jobs still running, both in blocked ("D") state:

ps auxf | grep POOL_ID
root      859770  0.0  0.0   8828  4192 ?        D    15:21   0:00 zfs unmount rpool/data/subvol-POOL_ID-disk-0
root      927427  0.0  0.0   8832  4228 pts/5    D    15:27   0:00 zfs mount rpool/data/subvol-POOL_ID-disk-0

I tried killing them, but that does not work. lsof | grep POOL_ID does not show anything either. Here is the output of fuser -mv POOL_PATH:

fuser -mv /rpool/data/subvol-POOL_ID-disk-0/
                     USER        PID ACCESS COMMAND
/rpool/data/subvol-101100041-disk-0:
                     root     kernel mount /rpool/data/subvol-POOL_ID-disk-0
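
The kernel stacks of the stuck processes can also be read directly; a sketch, using the PIDs from the ps output above:

# kernel stacks of the stuck unmount/mount processes
cat /proc/859770/stack
cat /proc/927427/stack

# any other tasks currently in uninterruptible sleep
ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /D/'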

dmesg shows these errors; maybe they help:

[9773158.187745] INFO: task kcompactd0:212 blocked for more than 120 seconds.
[9773158.187752]       Tainted: P           O      5.15.108-1-pve #1
[9773158.187754] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[9773158.187756] task:kcompactd0      state:D stack:    0 pid:  212 ppid:     2 flags:0x00004000
[9773158.187760] Call Trace:
[9773158.187762]  <TASK>
[9773158.187764]  __schedule+0x34e/0x1740
[9773158.187769]  ? release_pages+0x80/0x530
[9773158.187773]  schedule+0x69/0x110
[9773158.187774]  io_schedule+0x46/0x80
[9773158.187776]  wait_on_page_bit_common+0x114/0x3e0
[9773158.187778]  ? filemap_invalidate_unlock_two+0x50/0x50
[9773158.187782]  __lock_page+0x4c/0x60
[9773158.187783]  migrate_pages+0xa63/0xe40
[9773158.187786]  ? move_freelist_tail+0xc0/0xc0
[9773158.187789]  ? isolate_freepages_block+0x420/0x420
[9773158.187791]  ? migrate_vma_check_page+0xc0/0xc0
[9773158.187793]  compact_zone+0x634/0xe80
[9773158.187794]  ? __schedule+0x356/0x1740
[9773158.187796]  ? lock_timer_base+0x3b/0xd0
[9773158.187799]  proactive_compact_node+0x82/0xd0
[9773158.187802]  kcompactd+0x21e/0x4c0
[9773158.187803]  ? wait_woken+0x70/0x70
[9773158.187806]  ? kcompactd_do_work+0x210/0x210
[9773158.187808]  kthread+0x12a/0x150
[9773158.187811]  ? set_kthread_struct+0x50/0x50
[9773158.187813]  ret_from_fork+0x22/0x30
[9773158.187816]  </TASK>
[9773158.188227] INFO: task z_zrele:133464 blocked for more than 120 seconds.
[9773158.188229]       Tainted: P           O      5.15.108-1-pve #1
[9773158.188230] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[9773158.188231] task:z_zrele         state:D stack:    0 pid:133464 ppid:     2 flags:0x00004000
[9773158.188235] Call Trace:
[9773158.188236]  <TASK>
[9773158.188237]  __schedule+0x34e/0x1740
[9773158.188239]  ? __count_memcg_events+0x40/0xa0
[9773158.188242]  schedule+0x69/0x110
[9773158.188243]  io_schedule+0x46/0x80
[9773158.188245]  wait_on_page_bit_common+0x114/0x3e0
[9773158.188247]  ? filemap_invalidate_unlock_two+0x50/0x50
[9773158.188249]  __lock_page+0x4c/0x60
[9773158.188251]  truncate_inode_pages_range+0x352/0x6a0
[9773158.188254]  truncate_pagecache+0x49/0x70
[9773158.188255]  truncate_setsize+0x27/0x60
[9773158.188258]  zpl_evict_inode+0x33/0x60 [zfs]
[9773158.188316]  evict+0xd2/0x190
[9773158.188318]  iput.part.0+0x174/0x1d0
[9773158.188320]  iput+0x1c/0x30
[9773158.188322]  zfs_rele_async_task+0xe/0x20 [zfs]
[9773158.188377]  taskq_thread+0x29f/0x4d0 [spl]
[9773158.188383]  ? wake_up_q+0x90/0x90
[9773158.188386]  ? taskq_thread_spawn+0x60/0x60 [spl]
[9773158.188390]  kthread+0x12a/0x150
[9773158.188392]  ? set_kthread_struct+0x50/0x50
[9773158.188394]  ret_from_fork+0x22/0x30
[9773158.188397]  </TASK>
[9773158.188415] INFO: task zfs:859770 blocked for more than 120 seconds.
[9773158.188417]       Tainted: P           O      5.15.108-1-pve #1
[9773158.188418] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[9773158.188419] task:zfs             state:D stack:    0 pid:859770 ppid:858548 flags:0x00004000
[9773158.188421] Call Trace:
[9773158.188423]  <TASK>
[9773158.188424]  __schedule+0x34e/0x1740
[9773158.188425]  ? wake_up_process+0x15/0x20
[9773158.188428]  ? __queue_work+0x20c/0x490
[9773158.188430]  schedule+0x69/0x110
[9773158.188431]  taskq_wait_outstanding+0x90/0xd0 [spl]
[9773158.188435]  ? wait_woken+0x70/0x70
[9773158.188437]  zfs_preumount+0x4b/0x70 [zfs]
[9773158.188478]  zpl_kill_sb+0x13/0x30 [zfs]
[9773158.188515]  deactivate_locked_super+0x3b/0xb0
[9773158.188517]  deactivate_super+0x44/0x50
[9773158.188519]  cleanup_mnt+0x150/0x1d0
[9773158.188521]  __cleanup_mnt+0x12/0x20
[9773158.188522]  task_work_run+0x6d/0xb0
[9773158.188524]  exit_to_user_mode_prepare+0x1a8/0x1b0
[9773158.188526]  syscall_exit_to_user_mode+0x27/0x50
[9773158.188528]  do_syscall_64+0x69/0xc0
[9773158.188530]  ? syscall_exit_to_user_mode+0x27/0x50
[9773158.188532]  ? __x64_sys_newlstat+0x16/0x20
[9773158.188534]  ? do_syscall_64+0x69/0xc0
[9773158.188535]  ? irqentry_exit+0x1d/0x30
[9773158.188537]  ? exc_page_fault+0x89/0x170
[9773158.188539]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[9773158.188541] RIP: 0033:0x7faf5811da67
[9773158.188543] RSP: 002b:00007ffc05fc74a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[9773158.188546] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007faf5811da67
[9773158.188548] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000056113de2e050
[9773158.188549] RBP: 00007ffc05fc7510 R08: 0000000000000000 R09: 0000000000000000
[9773158.188550] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[9773158.188552] R13: 000056113de2e050 R14: 000056113de1e320 R15: 000056113de20830
[9773158.188553]  </TASK>
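
The next time this happens, a full dump of all blocked tasks can be forced into the kernel log via sysrq; a sketch, assuming sysrq is available on the node:

# enable sysrq if needed, then dump all tasks in uninterruptible (D) state to the kernel log
echo 1 > /proc/sys/kernel/sysrq
echo w > /proc/sysrq-trigger
dmesg | tail -n 300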

Any help is highly appreciated; I have run out of ideas.

GregorKopka commented 10 months ago

Do you have any LVM structures on that subvol? UDEV on the host might see them and activate the volume group, blocking the /dev/zd* device and through that all zfs operations on the dataset...
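
A quick way to check that would be something like this (a sketch; only meaningful if there are any zvols on the node at all):

# anything (partitions, LVM, device-mapper) sitting on top of ZFS zvols?
lsblk -o NAME,TYPE,FSTYPE,MOUNTPOINT /dev/zd* 2>/dev/null
pvs 2>/dev/null
dmsetup ls 2>/dev/null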

thieleon commented 10 months ago

No, it's just a Proxmox container's file system (Debian), without any LVM.

GregorKopka commented 10 months ago

Does lsof have any file handles to report?

thieleon commented 10 months ago

I can't check that right now, as I have already restarted the system. I will check next time. But as far as I can tell, there was no open handle.

thieleon commented 8 months ago

The problem is back, but lsof does not output anything when running:

lsof /rpool/data/subvol-0815-disk-0/

Any other ideas?

BTW, ZFS is now updated to:

$ zfs --version
zfs-2.2.2-pve1
zfs-kmod-2.2.2-pve1

GregorKopka commented 8 months ago

There might be something in the modern voodoo (which isolates the container from the rest of the system) that was not cleaned up properly when the container was torn down. Was a container running from that subvol while the commands failed? Or was one (past tense) running between boot and the time you tried to execute these commands? Have you tried starting and shutting down the related container, to see if that perhaps clears the error?
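
One thing worth checking (just a sketch, with the dataset name taken from the issue title): lsof only looks at the host's mount namespace, so a mount that is still pinned inside a leftover container mount namespace would not show up there.

# look for the subvol in every process's mount namespace, not just the host's
grep -l subvol-0815-disk-0 /proc/*/mountinfo 2>/dev/null

# processes whose mount namespace differs from PID 1 may still pin their own copy of the mount
ls -l /proc/1/ns/mnt
ls -l /proc/[0-9]*/ns/mnt 2>/dev/null | awk '{print $NF}' | sort | uniq -c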

I know, it's a bit like poking around in a dense fog with a short stick...