Open sanwan opened 2 years ago
Ltp syscalltail failed with setxattr01 case and the phenomena is same . The action is https://github.com/juicedata/test-ci/runs/7549610269?check_suite_focus=true .
The test hangs until canceled after 6 hours. Maybe it's deadlocked between: write, fsync, fallocate, read
It looks like the process was stuck in kernel(5.15.0-1014-azure #17~20.04.1-Ubuntu SMP Thu Jun 23 20:01:51 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux). Suspicious stacks of juicefs threads:
10190
[<0>]__lock_buffer+0x35/0x40
[<0>]__buffer_migrate_page+0x192/0x2b0
[<0>]buffer_migrate_page_norefs+0x14/0x20
[<0>]move_to_new_page+0x70/0x260
[<0>]migrate_pages+0xa67/0xda0
[<0>]compact_zone+0x654/0xe80
[<0>]compact_zone_order+0xbb/0x100
[<0>]try_to_compact_pages+0xbd/0x260
[<0>]__alloc_pages_direct_compact+0x8c/0x1a0
[<0>]__alloc_pages_slowpath.constprop.0+0x44e/0xde0
[<0>]__alloc_pages+0x2c8/0x310
[<0>]alloc_pages_vma+0x204/0x260
[<0>]do_huge_pmd_anonymous_page+0x11d/0x470
[<0>]__handle_mm_fault+0x91f/0x970
[<0>]handle_mm_fault+0xcb/0x2a0
[<0>]do_user_addr_fault+0x1b1/0x640
[<0>]exc_page_fault+0x71/0x150
[<0>]asm_exc_page_fault+0x1e/0x30
10196
[<0>]rwsem_down_write_slowpath+0x21e/0x510
[<0>]vm_mmap_pgoff+0xa3/0x170
[<0>]ksys_mmap_pgoff+0x58/0x290
[<0>]__x64_sys_mmap+0x33/0x40
[<0>]do_syscall_64+0x5c/0xc0
[<0>]entry_SYSCALL_64_after_hwframe+0x44/0xae
Other stacks:
10185
[<0>]futex_wait_queue_me+0xa8/0x110
[<0>]futex_wait+0x105/0x250
[<0>]do_futex+0x1b6/0x880
[<0>]__x64_sys_futex+0x7b/0x1b0
[<0>]do_syscall_64+0x5c/0xc0
[<0>]entry_SYSCALL_64_after_hwframe+0x44/0xae
10186
[<0>]hrtimer_nanosleep+0x9e/0x120
[<0>]__x64_sys_nanosleep+0xa7/0xd0
[<0>]do_syscall_64+0x5c/0xc0
[<0>]entry_SYSCALL_64_after_hwframe+0x44/0xae
10187
[<0>]futex_wait_queue_me+0xa8/0x110
[<0>]futex_wait+0x105/0x250
[<0>]do_futex+0x1b6/0x880
[<0>]__x64_sys_futex+0x7b/0x1b0
[<0>]do_syscall_64+0x5c/0xc0
[<0>]entry_SYSCALL_64_after_hwframe+0x44/0xae
10188
[<0>]futex_wait_queue_me+0xa8/0x110
[<0>]futex_wait+0x105/0x250
[<0>]do_futex+0x1b6/0x880
[<0>]__x64_sys_futex+0x7b/0x1b0
[<0>]do_syscall_64+0x5c/0xc0
[<0>]entry_SYSCALL_64_after_hwframe+0x44/0xae
10189
[<0>]futex_wait_queue_me+0xa8/0x110
[<0>]futex_wait+0x105/0x250
[<0>]do_futex+0x1b6/0x880
[<0>]__x64_sys_futex+0x7b/0x1b0
[<0>]do_syscall_64+0x5c/0xc0
[<0>]entry_SYSCALL_64_after_hwframe+0x44/0xae
10195
[<0>]futex_wait_queue_me+0xa8/0x110
[<0>]futex_wait+0x105/0x250
[<0>]do_futex+0x1b6/0x880
[<0>]__x64_sys_futex+0x7b/0x1b0
[<0>]do_syscall_64+0x5c/0xc0
[<0>]entry_SYSCALL_64_after_hwframe+0x44/0xae
15384
[<0>]futex_wait_queue_me+0xa8/0x110
[<0>]futex_wait+0x105/0x250
[<0>]do_futex+0x1b6/0x880
[<0>]__x64_sys_futex+0x7b/0x1b0
[<0>]do_syscall_64+0x5c/0xc0
[<0>]entry_SYSCALL_64_after_hwframe+0x44/0xae
21565
[<0>]futex_wait_queue_me+0xa8/0x110
[<0>]futex_wait+0x105/0x250
[<0>]do_futex+0x1b6/0x880
[<0>]__x64_sys_futex+0x7b/0x1b0
[<0>]do_syscall_64+0x5c/0xc0
[<0>]entry_SYSCALL_64_after_hwframe+0x44/0xae
When it happens, CPU uses ~100% iowait, and there are several ops waiting on FUSE:
$ cat /sys/fs/fuse/connections/77/waiting
2
Some links that may be related:
This kind of deadlock is also mentioned here: https://www.kernel.org/doc/html/latest/filesystems/fuse.html
The deadlock can be aborted by echo 1 > /sys/fs/fuse/connections/77/abort
Ltp fs test has the same problem . More details may refer to https://github.com/juicedata/test-ci/runs/7609689810?check_suite_focus=true .
<<
What happened: The chdir01 case failed .
What you expected to happen: The chdir01 case passed. How to reproduce it (as minimally and precisely as possible): Follow the action https://github.com/juicedata/test-ci/runs/7536721207?check_suite_focus=true or https://github.com/juicedata/test-ci/runs/7540757053?check_suite_focus=true . Anything else we need to know? No.