openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Directory access blocked in D state until reboot #6918

Open tjikkun opened 6 years ago

tjikkun commented 6 years ago

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | CloudLinux Server |
| Distribution Version | 6.9 |
| Linux Kernel | Linux 2.6.32-673.26.1.lve1.4.27.el6.x86_64 #1 SMP Sun May 7 19:22:54 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Architecture | x86_64 |
| ZFS Version | 0.7.1-1 |
| SPL Version | 0.7.1-1 |

Describe the problem you're observing

Occasionally, on a "random" one of our servers, a "random" directory gets locked. From then on, every process that tries to access it ends up in D state and stays there until the server is rebooted.
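For anyone trying to confirm the same symptom, here is a minimal sketch (not part of the original report) that lists tasks currently in D state by reading `/proc/<pid>/stat`. The function names `parse_stat_state` and `find_d_state` are made up for illustration; the only thing assumed about the system is the standard `pid (comm) state ...` layout of the stat file.

```python
import os


def parse_stat_state(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST closing parenthesis rather than on whitespace.
    left, _, right = stat_text.rpartition(")")
    pid_text, _, comm = left.partition("(")
    return int(pid_text), comm, right.split()[0]


def find_d_state(proc_root="/proc"):
    """Yield (pid, comm) for every process in uninterruptible sleep (state D)."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_state(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited in the meantime, or entry unreadable
        if state == "D":
            yield pid, comm


if __name__ == "__main__":
    for pid, comm in find_d_state():
        print(pid, comm)
```

A process that shows up briefly in this list is normal (any task waiting on disk I/O passes through D state); the symptom described here is processes that stay in the list indefinitely.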

Describe how to reproduce the problem

I have no reliable way to reproduce this yet. I just wait until one of our servers hits the issue.

Include any warning/errors/backtraces from the system logs

cat /proc/$pid/stack of the oldest process in D state:

[<ffffffffa0281ef1>] cv_wait_common+0xb1/0x130 [spl]
[<ffffffffa0281f88>] __cv_wait_io+0x18/0x20 [spl]
[<ffffffffa03f85cb>] zio_wait+0xfb/0x180 [zfs]
[<ffffffffa0337ab9>] dbuf_read+0x6e9/0x970 [zfs]
[<ffffffffa0340cc8>] dmu_buf_hold_by_dnode+0x68/0x90 [zfs]
[<ffffffffa03b5474>] zap_get_leaf_byblk+0x94/0x260 [zfs]
[<ffffffffa03b59e8>] zap_deref_leaf+0xc8/0xe0 [zfs]
[<ffffffffa03b5b20>] fzap_cursor_retrieve+0x120/0x270 [zfs]
[<ffffffffa03bc4ab>] zap_cursor_retrieve+0x13b/0x2c0 [zfs]
[<ffffffffa03e34ce>] zfs_readdir+0x16e/0x4d0 [zfs]
[<ffffffffa03fdcd3>] zpl_readdir+0x73/0xb0 [zfs]
[<ffffffff811d1530>] vfs_readdir+0xc0/0xe0
[<ffffffff811d16b9>] sys_getdents+0x89/0xf0
[<ffffffff8100b1a2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

Some other processes in D state:

[<ffffffff811c9ce3>] do_lookup+0x153/0x270
[<ffffffff811cadd0>] __link_path_walk+0x9b0/0x1190
[<ffffffff811cb81a>] path_walk+0x6a/0xe0
[<ffffffff811cbbab>] filename_lookup+0x6b/0xc0
[<ffffffff811cced9>] user_path_at+0x59/0xa0
[<ffffffff811bf520>] vfs_fstatat+0x50/0xb0
[<ffffffff811bf5bb>] vfs_stat+0x1b/0x20
[<ffffffff811bf784>] sys_newstat+0x24/0x50
[<ffffffff8100b1a2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

[<ffffffff811c9ce3>] do_lookup+0x153/0x270
[<ffffffff811cadd0>] __link_path_walk+0x9b0/0x1190
[<ffffffff811cb81a>] path_walk+0x6a/0xe0
[<ffffffff811cbbab>] filename_lookup+0x6b/0xc0
[<ffffffff811cced9>] user_path_at+0x59/0xa0
[<ffffffff811bf520>] vfs_fstatat+0x50/0xb0
[<ffffffff811bf5bb>] vfs_stat+0x1b/0x20
[<ffffffff811bf784>] sys_newstat+0x24/0x50
[<ffffffff8100b1a2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

[<ffffffff811d14f8>] vfs_readdir+0x88/0xe0
[<ffffffff811d16b9>] sys_getdents+0x89/0xf0
[<ffffffff8100b1a2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
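To compare traces like the ones above across many stuck processes, a small parser can help. The following is a rough sketch, assuming the `[<address>] symbol+0xoffset/0xsize [module]` line format seen in this report; `parse_stack` and `modules_in` are hypothetical helper names, and frames without a module tag are labeled `kernel` purely as a convention of this sketch.

```python
import re
from collections import Counter

# One frame per line: [<address>] symbol+0xoffset/0xsize, optionally "[module]".
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<symbol>[^\s+]+)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?:\s+\[(?P<module>[^\]]+)\])?"
)

# A few frames copied from the first trace in this report.
SAMPLE = """\
[<ffffffffa0281ef1>] cv_wait_common+0xb1/0x130 [spl]
[<ffffffffa03f85cb>] zio_wait+0xfb/0x180 [zfs]
[<ffffffff811d1530>] vfs_readdir+0xc0/0xe0
[<ffffffffffffffff>] 0xffffffffffffffff
"""


def parse_stack(text):
    """Return a list of (symbol, module) tuples, innermost frame first."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.search(line)
        if match:  # lines such as the 0xffffffffffffffff terminator are skipped
            frames.append((match.group("symbol"), match.group("module") or "kernel"))
    return frames


def modules_in(frames):
    """Count how many frames belong to each module."""
    return Counter(module for _, module in frames)


if __name__ == "__main__":
    frames = parse_stack(SAMPLE)
    print(frames[0])
    print(dict(modules_in(frames)))
```

Applied to the traces in this report, only the oldest process has `[zfs]` and `[spl]` frames (it is sitting in `zio_wait`), while the others stop in generic VFS frames. That pattern is consistent with the later processes queuing behind the first one, although the traces alone do not prove it.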

I can get other info on request. The next time this happens, I can run any diagnostic commands you would like.

stale[bot] commented 4 years ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

jjb2016 commented 4 years ago

I know this is an old issue that has gone "stale", but I think I currently have the same issue. I've posted it on the "discuss" mailing list here: https://zfsonlinux.topicbox.com/groups/zfs-discuss/T44de0b2672c30cd0-Md52dd2c779aa07c269612421/kernel-watchdog-bug-soft-lockup-cpu6-stuck-for-23s-rm-1121909

I hope this is accessible to whoever sees this.

stale[bot] commented 3 years ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.