ian-kelling closed this issue 7 years ago
Can you do a sysrq-t (echo t > /proc/sysrq-trigger) and give me the output for the duperemove processes (at least)? If you see them in fiemap then it's probably btrfs going crazy over extent references.
Not sure how to find the output of the relevant process in that output, so here is a tarball with 5 complete outputs over a minute or two:
https://iankelling.org/dupremove-kernel-tasks.tar.gz
Here are the relevant pids from pstree -p
├─sshd(2960)─┬─sshd(14914)───bash(14919)───duperemove(20662)─┬─{pool}(20763)
│            │                                               ├─{pool}(20764)
│            │                                               ├─{pool}(20765)
│            │                                               ├─{pool}(20766)
│            │                                               ├─{pool}(20767)
│            │                                               ├─{pool}(20768)
│            │                                               ├─{pool}(20769)
│            │                                               └─{pool}(20770)
Ok go ahead and check out the stack if that's possible, thanks.
I didn't catch anything useful in the logs (lots of missed lines, somehow), and I'm not on my normal computer at the moment.
The main process isn't stopping for gdb, so I tried something else: I looped over all the pids and grabbed stacks from /proc/<pid>/stack, 20 times at 10-second intervals, and dumped them into 20 files:
https://iankelling.org/duperemove-kernel-traces.tar.gz
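For reference, the loop was roughly the following (a sketch, not the exact commands I ran; the function name and output filenames are mine, and the pids are the ones from the pstree output above):

```shell
# sample_stacks ROUNDS INTERVAL PID...
#   Writes one file per round (stacks-1.txt, stacks-2.txt, ...), each
#   containing each pid followed by that task's /proc/<pid>/stack.
sample_stacks() {
    rounds=$1; interval=$2; shift 2
    i=1
    while [ "$i" -le "$rounds" ]; do
        out="stacks-$i.txt"
        : > "$out"
        for p in "$@"; do
            printf '%s\n' "$p" >> "$out"
            # reading /proc/<pid>/stack usually needs root; ignore failures
            cat "/proc/$p/stack" >> "$out" 2>/dev/null || true
        done
        sleep "$interval"
        i=$((i + 1))
    done
}

# e.g. sample_stacks 20 10 20662 20763 20764 20765 20766 20767 20768 20769 20770
```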
It does look like fiemap.
Here is one of the samples:
20662
[<ffffffff81087ea6>] ptrace_stop+0x166/0x2a0
[<ffffffff810896ef>] get_signal+0x34f/0x620
[<ffffffff8102c4e6>] do_signal+0x36/0x6e0
[<ffffffff810a33e3>] try_to_wake_up+0x43/0x370
[<ffffffff81086647>] recalc_sigpending+0x17/0x50
[<ffffffff81003305>] exit_to_usermode_loop+0x85/0xc0
[<ffffffff81003c05>] syscall_return_slowpath+0x45/0x50
[<ffffffff815c67fe>] system_call_fast_compare_end+0x94/0x96
[<ffffffffffffffff>] 0xffffffffffffffff
20763
[<ffffffff811d143e>] cache_grow+0x16e/0x220
[<ffffffffc0370373>] btrfs_get_token_64+0x103/0x120 [btrfs]
[<ffffffffc03b7e11>] __add_prelim_ref.part.9+0x31/0xd0 [btrfs]
[<ffffffffc03b8737>] __add_keyed_refs.isra.12+0x2b7/0x2d0 [btrfs]
[<ffffffffc03b98db>] find_parent_nodes+0x62b/0x740 [btrfs]
[<ffffffffc03b9c9f>] btrfs_check_shared+0xff/0x1c0 [btrfs]
[<ffffffffc037b6a5>] extent_fiemap+0x2b5/0x580 [btrfs]
[<ffffffffc035c200>] btrfs_get_extent_fiemap+0x0/0x1e0 [btrfs]
[<ffffffff81204e35>] do_vfs_ioctl+0x445/0x5c0
[<ffffffff81205024>] SyS_ioctl+0x74/0x80
[<ffffffff815c6776>] system_call_fast_compare_end+0xc/0x96
[<ffffffffffffffff>] 0xffffffffffffffff
20764
[<ffffffff81087ea6>] ptrace_stop+0x166/0x2a0
[<ffffffff810896ef>] get_signal+0x34f/0x620
[<ffffffff8102c4e6>] do_signal+0x36/0x6e0
[<ffffffff81204e59>] do_vfs_ioctl+0x469/0x5c0
[<ffffffff81003305>] exit_to_usermode_loop+0x85/0xc0
[<ffffffff81003c05>] syscall_return_slowpath+0x45/0x50
[<ffffffff815c67fe>] system_call_fast_compare_end+0x94/0x96
[<ffffffffffffffff>] 0xffffffffffffffff
20765
[<ffffffffc0375ad2>] lock_extent_bits+0x192/0x1e0 [btrfs]
[<ffffffff810ba7e0>] autoremove_wake_function+0x0/0x40
[<ffffffffc037b58e>] extent_fiemap+0x19e/0x580 [btrfs]
[<ffffffff8132e239>] list_del+0x9/0x30
[<ffffffff810ba490>] remove_wait_queue+0x20/0x30
[<ffffffffc035c200>] btrfs_get_extent_fiemap+0x0/0x1e0 [btrfs]
[<ffffffff81204e35>] do_vfs_ioctl+0x445/0x5c0
[<ffffffff81205024>] SyS_ioctl+0x74/0x80
[<ffffffff815c6776>] system_call_fast_compare_end+0xc/0x96
[<ffffffffffffffff>] 0xffffffffffffffff
20766
[<ffffffffc0375ad2>] lock_extent_bits+0x192/0x1e0 [btrfs]
[<ffffffff810ba7e0>] autoremove_wake_function+0x0/0x40
[<ffffffffc037b58e>] extent_fiemap+0x19e/0x580 [btrfs]
[<ffffffffc035c200>] btrfs_get_extent_fiemap+0x0/0x1e0 [btrfs]
[<ffffffff81204e35>] do_vfs_ioctl+0x445/0x5c0
[<ffffffff81205024>] SyS_ioctl+0x74/0x80
[<ffffffff815c6776>] system_call_fast_compare_end+0xc/0x96
[<ffffffffffffffff>] 0xffffffffffffffff
20767
[<ffffffff81087ea6>] ptrace_stop+0x166/0x2a0
[<ffffffff810896ef>] get_signal+0x34f/0x620
[<ffffffff8102c4e6>] do_signal+0x36/0x6e0
[<ffffffff811f06be>] vfs_dedupe_file_range+0x13e/0x250
[<ffffffff81204c08>] do_vfs_ioctl+0x218/0x5c0
[<ffffffff81003305>] exit_to_usermode_loop+0x85/0xc0
[<ffffffff81003c05>] syscall_return_slowpath+0x45/0x50
[<ffffffff815c67fe>] system_call_fast_compare_end+0x94/0x96
[<ffffffffffffffff>] 0xffffffffffffffff
20768
[<ffffffffc038f8ee>] btrfs_dedupe_file_range+0x9e/0x640 [btrfs]
[<ffffffff811f0787>] vfs_dedupe_file_range+0x207/0x250
[<ffffffff81204bf6>] do_vfs_ioctl+0x206/0x5c0
[<ffffffff811f1433>] vfs_write+0x143/0x1a0
[<ffffffff81205024>] SyS_ioctl+0x74/0x80
[<ffffffff815c6776>] system_call_fast_compare_end+0xc/0x96
[<ffffffffffffffff>] 0xffffffffffffffff
20769
[<ffffffffc038f8e2>] btrfs_dedupe_file_range+0x92/0x640 [btrfs]
[<ffffffff811f0787>] vfs_dedupe_file_range+0x207/0x250
[<ffffffff81204bf6>] do_vfs_ioctl+0x206/0x5c0
[<ffffffff811f1433>] vfs_write+0x143/0x1a0
[<ffffffff81205024>] SyS_ioctl+0x74/0x80
[<ffffffff815c6776>] system_call_fast_compare_end+0xc/0x96
[<ffffffffffffffff>] 0xffffffffffffffff
20770
[<ffffffffc0375ad2>] lock_extent_bits+0x192/0x1e0 [btrfs]
[<ffffffff810ba7e0>] autoremove_wake_function+0x0/0x40
[<ffffffffc03794b9>] __extent_read_full_page+0x99/0x100 [btrfs]
[<ffffffffc035b770>] btrfs_get_extent+0x0/0xa90 [btrfs]
[<ffffffffc0379987>] extent_read_full_page+0x47/0x80 [btrfs]
[<ffffffffc038c62a>] gather_extent_pages+0x9a/0x1d0 [btrfs]
[<ffffffffc038fa5d>] btrfs_dedupe_file_range+0x20d/0x640 [btrfs]
[<ffffffff811f0787>] vfs_dedupe_file_range+0x207/0x250
[<ffffffff81204bf6>] do_vfs_ioctl+0x206/0x5c0
[<ffffffff811f1433>] vfs_write+0x143/0x1a0
[<ffffffff81205024>] SyS_ioctl+0x74/0x80
[<ffffffff815c6776>] system_call_fast_compare_end+0xc/0x96
[<ffffffffffffffff>] 0xffffffffffffffff
Looking at htop, I see some of the threads alternate between using CPU and idling, but I found the total CPU times interesting:
20662 146h
20763 44:04.15
20764 124h
20765 59:04.16
20766 44:47.21
20767 8h51:42
20768 1h51:54
20769 56:09.49
20770 7h26:53
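A quick way to check how many of the sample files caught a thread inside fiemap is something like this (a sketch; count_fiemap is a hypothetical helper, and the filenames are assumed to match the tarball layout):

```shell
# count_fiemap FILE...
#   Prints the number of the given stack-sample files that contain an
#   extent_fiemap frame (hypothetical helper, not part of duperemove).
count_fiemap() {
    grep -l 'extent_fiemap' "$@" 2>/dev/null | wc -l
}

# e.g. count_fiemap stacks-*.txt
```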
Interesting - are the processes killable right now?
Btw, you can try running with --dedupe-options=nofiemap and see if you still get that hang. Actually, I wonder if running show-shared-extents against those files would show a hang.
EDIT: Just to be clear on where I'm coming from - there's a severe performance issue in btrfs fiemap right now, which is why I jumped right to that. I'm not totally sure though if that's what we're hitting yet.
I assume they will die if I send kill, or certainly with kill -9. I'm in no rush, so just let me know what you want me to do. It's still spitting out log lines every second or so, like the ones in the original report.
Any other advice for debugging this? Otherwise I'll cancel it and rerun with nofiemap. It's still running.
Yeah give it a go with nofiemap.
If that doesn't do it but we're not actually hanging then give --dedupe-options=noblock a shot - that will reduce the # of dedupe requests by calculating extents (how we used to do it if you recall). If neither helps then we'll need some more debugging.
Thanks
Oh lastly, if you just need this deduped and the directories are truly identical then fdupes mode should get you going again. That said, I really appreciate any help debugging the cause of your hang :) :)
So, I reran with nofiemap and it completed in less than a day, maybe a few hours. Thanks for the help.
Thanks for updating this Ian. I think this should have a FAQ entry in the man page so other users aren't quite as surprised. I'll try to work something up soon.
top says duperemove is taking 100% CPU (one core) and has been running for 8607 minutes (about 6 days of CPU time). I can't remember exactly when I started it, but it's been about a week.
duperemove --dedupe-options=block --hashfile=dr-q-a-images -dhrA q-snap/a/images treetowl-btrbk/q.20160917T112834-0700/a/images
There are 8 files which are vm images in the 2 directories to be deduped. They are on read-only snapshots. Unless there was a disk error, they are 100% identical. Here are the file sizes:
The end of the output looks like this:
The hashfile is 300 MB; I've copied it to a server in case it's useful for understanding what's going on: https://iankelling.org/IbOvevFibnujokEu
version info:
I can break into it with gdb and see what the stack trace looks like usually if needed.