markfasheh / duperemove

Tools for deduping file systems
GNU General Public License v2.0

duperemove on 2 identical 242G directories is taking over 6 days #156

Closed. ian-kelling closed this issue 7 years ago.

ian-kelling commented 7 years ago

top says duperemove is taking 100% cpu (1 core), and has been running for 8607 minutes (about 6 days of cpu time). I can't remember exactly when I started it, but it's been about a week.

duperemove --dedupe-options=block --hashfile=dr-q-a-images -dhrA q-snap/a/images treetowl-btrbk/q.20160917T112834-0700/a/images

There are 8 files, all VM images, in the 2 directories to be deduped. They are on read-only snapshots. Unless there was a disk error, they are 100% identical. Here are the file sizes, in bytes:

524288
727
262537728
98304
268173312
56999936
131165782016
128869007872

The end of the output looks like this:

[0xfbae80] Dedupe 6 extents (id: 5a21269f) with target: (2.1G, 128.0K), "/mnt/root/treetowl-btrbk/q.20160917T112834-0700/a/images/121G-image.qcow2"
[0xfb6400] Dedupe 119 extents (id: e47862ea) with target: (120.0G, 128.0K), "/mnt/root/treetowl-btrbk/q.20160917T112834-0700/a/images/121G-image.qcow2"
[0xfb64f0] (000068/392863) Try to dedupe extents with id 90dd2c9d
[0xfb6400] Dedupe 119 extents (id: e47862ea) with target: (120.0G, 128.0K), "/mnt/root/treetowl-btrbk/q.20160917T112834-0700/a/images/121G-image.qcow2"
[0xfb6450] Dedupe 119 extents (id: e47862ea) with target: (81.8G, 128.0K), "/mnt/root/q-snap/a/images/121G-image.qcow2"
[0xfb64f0] Dedupe 6 extents (id: 90dd2c9d) with target: (2.1G, 128.0K), "/mnt/root/q-snap/a/images/121G-image.qcow2"
[0xfbaf70] (000069/392863) Try to dedupe extents with id 86d64f43
[0xfb64a0] (000070/392863) Try to dedupe extents with id 7aa42a91
[0xfbaf70] Dedupe 11 extents (id: 86d64f43) with target: (17.4G, 128.0K), "/mnt/root/treetowl-btrbk/q.20160917T112834-0700/a/images/121G-image.qcow2"

The hashfile is 300 MB; I've copied it to a server in case it's useful for understanding what's going on: https://iankelling.org/IbOvevFibnujokEu

version info:

If needed, I can usually break into it with gdb and see what the stack trace looks like.
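
For the record, attaching would look something like this (the pid is a placeholder for whatever duperemove's main pid turns out to be):

# attach to the running process and dump every thread's backtrace
gdb -p <pid>
(gdb) thread apply all bt
(gdb) detach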

markfasheh commented 7 years ago

Can you do a sysrq-t? (echo t > /proc/sysrq-trigger) and give me the output for the duperemove processes (at least). If you see them in fiemap then it's probably btrfs going crazy over extent references.
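
Something like this should capture it (assuming the task dump lands in the kernel ring buffer, which is the usual setup):

echo 1 > /proc/sys/kernel/sysrq    # make sure sysrq is enabled
echo t > /proc/sysrq-trigger       # dump every task's stack to the kernel log
dmesg > sysrq-t-dump.txt           # save the log so you can attach it here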

ian-kelling commented 7 years ago

Not sure how to pick out the relevant processes in that output, so here is a tarball with 5 complete dumps taken over a minute or two:

https://iankelling.org/dupremove-kernel-tasks.tar.gz

Here are the relevant pids from pstree -p

           ├─sshd(2960)─┬─sshd(14914)───bash(14919)───duperemove(20662)─┬─{pool}(20763)
           │            │                                               ├─{pool}(20764)
           │            │                                               ├─{pool}(20765)
           │            │                                               ├─{pool}(20766)
           │            │                                               ├─{pool}(20767)
           │            │                                               ├─{pool}(20768)
           │            │                                               ├─{pool}(20769)
           │            │                                               └─{pool}(20770)
markfasheh commented 7 years ago

OK, go ahead and check out the stacks if that's possible, thanks.

I didn't catch anything useful in the logs (lots of missed lines somehow) but also I'm not on my normal computer at the moment.

ian-kelling commented 7 years ago

The main process is not stopping for gdb, so I tried something else: I looped over all the pids and grabbed stacks from /proc/pid/stack, 20 times at 10-second intervals, and dumped them into 20 files:

https://iankelling.org/duperemove-kernel-traces.tar.gz
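
The loop was roughly this (reconstructed from memory, so treat it as a sketch; the pids are the ones from pstree above):

# sample the kernel stack of every duperemove thread, 20 rounds, 10s apart
pids="20662 20763 20764 20765 20766 20767 20768 20769 20770"
for i in $(seq 1 20); do
    for p in $pids; do
        echo "$p"
        cat "/proc/$p/stack"
    done > "stacks-$i.txt"
    sleep 10
done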

It does look like fiemap.

Here is one of the samples:

20662
[<ffffffff81087ea6>] ptrace_stop+0x166/0x2a0
[<ffffffff810896ef>] get_signal+0x34f/0x620
[<ffffffff8102c4e6>] do_signal+0x36/0x6e0
[<ffffffff810a33e3>] try_to_wake_up+0x43/0x370
[<ffffffff81086647>] recalc_sigpending+0x17/0x50
[<ffffffff81003305>] exit_to_usermode_loop+0x85/0xc0
[<ffffffff81003c05>] syscall_return_slowpath+0x45/0x50
[<ffffffff815c67fe>] system_call_fast_compare_end+0x94/0x96
[<ffffffffffffffff>] 0xffffffffffffffff
20763
[<ffffffff811d143e>] cache_grow+0x16e/0x220
[<ffffffffc0370373>] btrfs_get_token_64+0x103/0x120 [btrfs]
[<ffffffffc03b7e11>] __add_prelim_ref.part.9+0x31/0xd0 [btrfs]
[<ffffffffc03b8737>] __add_keyed_refs.isra.12+0x2b7/0x2d0 [btrfs]
[<ffffffffc03b98db>] find_parent_nodes+0x62b/0x740 [btrfs]
[<ffffffffc03b9c9f>] btrfs_check_shared+0xff/0x1c0 [btrfs]
[<ffffffffc037b6a5>] extent_fiemap+0x2b5/0x580 [btrfs]
[<ffffffffc035c200>] btrfs_get_extent_fiemap+0x0/0x1e0 [btrfs]
[<ffffffff81204e35>] do_vfs_ioctl+0x445/0x5c0
[<ffffffff81205024>] SyS_ioctl+0x74/0x80
[<ffffffff815c6776>] system_call_fast_compare_end+0xc/0x96
[<ffffffffffffffff>] 0xffffffffffffffff
20764
[<ffffffff81087ea6>] ptrace_stop+0x166/0x2a0
[<ffffffff810896ef>] get_signal+0x34f/0x620
[<ffffffff8102c4e6>] do_signal+0x36/0x6e0
[<ffffffff81204e59>] do_vfs_ioctl+0x469/0x5c0
[<ffffffff81003305>] exit_to_usermode_loop+0x85/0xc0
[<ffffffff81003c05>] syscall_return_slowpath+0x45/0x50
[<ffffffff815c67fe>] system_call_fast_compare_end+0x94/0x96
[<ffffffffffffffff>] 0xffffffffffffffff
20765
[<ffffffffc0375ad2>] lock_extent_bits+0x192/0x1e0 [btrfs]
[<ffffffff810ba7e0>] autoremove_wake_function+0x0/0x40
[<ffffffffc037b58e>] extent_fiemap+0x19e/0x580 [btrfs]
[<ffffffff8132e239>] list_del+0x9/0x30
[<ffffffff810ba490>] remove_wait_queue+0x20/0x30
[<ffffffffc035c200>] btrfs_get_extent_fiemap+0x0/0x1e0 [btrfs]
[<ffffffff81204e35>] do_vfs_ioctl+0x445/0x5c0
[<ffffffff81205024>] SyS_ioctl+0x74/0x80
[<ffffffff815c6776>] system_call_fast_compare_end+0xc/0x96
[<ffffffffffffffff>] 0xffffffffffffffff
20766
[<ffffffffc0375ad2>] lock_extent_bits+0x192/0x1e0 [btrfs]
[<ffffffff810ba7e0>] autoremove_wake_function+0x0/0x40
[<ffffffffc037b58e>] extent_fiemap+0x19e/0x580 [btrfs]
[<ffffffffc035c200>] btrfs_get_extent_fiemap+0x0/0x1e0 [btrfs]
[<ffffffff81204e35>] do_vfs_ioctl+0x445/0x5c0
[<ffffffff81205024>] SyS_ioctl+0x74/0x80
[<ffffffff815c6776>] system_call_fast_compare_end+0xc/0x96
[<ffffffffffffffff>] 0xffffffffffffffff
20767
[<ffffffff81087ea6>] ptrace_stop+0x166/0x2a0
[<ffffffff810896ef>] get_signal+0x34f/0x620
[<ffffffff8102c4e6>] do_signal+0x36/0x6e0
[<ffffffff811f06be>] vfs_dedupe_file_range+0x13e/0x250
[<ffffffff81204c08>] do_vfs_ioctl+0x218/0x5c0
[<ffffffff81003305>] exit_to_usermode_loop+0x85/0xc0
[<ffffffff81003c05>] syscall_return_slowpath+0x45/0x50
[<ffffffff815c67fe>] system_call_fast_compare_end+0x94/0x96
[<ffffffffffffffff>] 0xffffffffffffffff
20768
[<ffffffffc038f8ee>] btrfs_dedupe_file_range+0x9e/0x640 [btrfs]
[<ffffffff811f0787>] vfs_dedupe_file_range+0x207/0x250
[<ffffffff81204bf6>] do_vfs_ioctl+0x206/0x5c0
[<ffffffff811f1433>] vfs_write+0x143/0x1a0
[<ffffffff81205024>] SyS_ioctl+0x74/0x80
[<ffffffff815c6776>] system_call_fast_compare_end+0xc/0x96
[<ffffffffffffffff>] 0xffffffffffffffff
20769
[<ffffffffc038f8e2>] btrfs_dedupe_file_range+0x92/0x640 [btrfs]
[<ffffffff811f0787>] vfs_dedupe_file_range+0x207/0x250
[<ffffffff81204bf6>] do_vfs_ioctl+0x206/0x5c0
[<ffffffff811f1433>] vfs_write+0x143/0x1a0
[<ffffffff81205024>] SyS_ioctl+0x74/0x80
[<ffffffff815c6776>] system_call_fast_compare_end+0xc/0x96
[<ffffffffffffffff>] 0xffffffffffffffff
20770
[<ffffffffc0375ad2>] lock_extent_bits+0x192/0x1e0 [btrfs]
[<ffffffff810ba7e0>] autoremove_wake_function+0x0/0x40
[<ffffffffc03794b9>] __extent_read_full_page+0x99/0x100 [btrfs]
[<ffffffffc035b770>] btrfs_get_extent+0x0/0xa90 [btrfs]
[<ffffffffc0379987>] extent_read_full_page+0x47/0x80 [btrfs]
[<ffffffffc038c62a>] gather_extent_pages+0x9a/0x1d0 [btrfs]
[<ffffffffc038fa5d>] btrfs_dedupe_file_range+0x20d/0x640 [btrfs]
[<ffffffff811f0787>] vfs_dedupe_file_range+0x207/0x250
[<ffffffff81204bf6>] do_vfs_ioctl+0x206/0x5c0
[<ffffffff811f1433>] vfs_write+0x143/0x1a0
[<ffffffff81205024>] SyS_ioctl+0x74/0x80
[<ffffffff815c6776>] system_call_fast_compare_end+0xc/0x96
[<ffffffffffffffff>] 0xffffffffffffffff
ian-kelling commented 7 years ago

Looking at htop, I see some of the threads jump between using CPU and sitting idle, but I found the total CPU times interesting:

20662 146h
20763 44:04.15
20764 124h
20765 59:04.16
20766 44:47.21
20767 8h51:42
20768 1h51:54
20769 56:09.49
20770 7h26:53
markfasheh commented 7 years ago

Interesting - are the processes killable right now?

Btw, you can try running with --dedupe-options=nofiemap and see if you still get that hang. Actually, I wonder if running show-shared-extents against those files would show a hang.

EDIT: Just to be clear on where I'm coming from - there's a severe performance issue in btrfs fiemap right now, which is why I jumped right to that. I'm not totally sure though if that's what we're hitting yet.
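
Concretely, something like this (paths taken from your earlier output, and I'm assuming the show-shared-extents binary from the duperemove build is on your PATH):

# first, see whether fiemap alone spins on one of the big images
show-shared-extents /mnt/root/q-snap/a/images/121G-image.qcow2

# then rerun with fiemap disabled, keeping the rest of your options
duperemove --dedupe-options=block,nofiemap --hashfile=dr-q-a-images -dhrA q-snap/a/images treetowl-btrbk/q.20160917T112834-0700/a/images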

ian-kelling commented 7 years ago

I assume they will die if I send kill, or certainly with kill -9. I'm in no rush, so just let me know what you want me to do. It's still spitting out log lines every second or so, like the ones in the original report.

ian-kelling commented 7 years ago

Any other advice for me to debug this? Otherwise I will cancel it and rerun with nofiemap. It's still running.

markfasheh commented 7 years ago

Yeah give it a go with nofiemap.

If that doesn't do it but we're not actually hanging, then give --dedupe-options=noblock a shot - that will reduce the number of dedupe requests by calculating duplicate extents (how we used to do it, if you recall). If neither helps then we'll need some more debugging.
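
i.e. your original invocation with noblock swapped in (a sketch, reusing your hashfile and flags):

duperemove --dedupe-options=noblock --hashfile=dr-q-a-images -dhrA q-snap/a/images treetowl-btrbk/q.20160917T112834-0700/a/images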

Thanks

markfasheh commented 7 years ago

Oh, lastly: if you just need this deduped and the directories are truly identical, then fdupes mode should get you going again. That said, I really appreciate any help debugging the cause of your hang :) :)
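
For identical trees that would look something like this (assuming fdupes is installed; duperemove reads the fdupes-format duplicate list on stdin):

# let fdupes find the duplicate files, then hand its list to duperemove
fdupes --recurse q-snap/a/images treetowl-btrbk/q.20160917T112834-0700/a/images | duperemove --fdupes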

ian-kelling commented 7 years ago

So, I reran with nofiemap and it completed in less than a day, maybe a few hours. Thanks for the help.

markfasheh commented 7 years ago

Thanks for updating this Ian. I think this should have a FAQ entry in the man page so other users aren't quite as surprised. I'll try to work something up soon.