markfasheh / duperemove

Tools for deduping file systems

ignores tails of files #201

Closed kilobyte closed 11 months ago

kilobyte commented 6 years ago

Turns out duperemove completely skips blocks smaller than 128KB (or whatever is configured), which results in leaving an undeduped and fragmented part of every file whose size is not an exact multiple of 128KB.
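Back-of-envelope, here is what that means for a file below the block size (plain shell arithmetic, not duperemove output; 117336 bytes is the dash size mentioned further down):

size=117336                       # a typical dash binary from this thread
bs=$((128 * 1024))                # duperemove's default block size
echo "full blocks: $((size / bs)), bytes left in the tail: $((size % bs))"
# prints: full blocks: 0, bytes left in the tail: 117336 (the whole file is tail)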

I have a backup server that holds copies of ~50 systems, 24ish snapshots of each. Several months ago I switched to zstd, and I wanted to recompress old backups to zstd as well. As a test, I started with */*/bin/bash: a nice compressible binary, only a few distinct versions in the whole set. Defrag succeeded; after dedupe it took (measured with compsize) slightly more space than before. Uh oh.

Then, I repeated the same with dash:

[/srv/backups]# compsize */*/bin/dash */*/*/bin/dash
Processed 857 files, 70 regular extents (1549 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       84%      3.1M         3.6M          93M
none       100%      2.4M         2.4M          34M
zlib        53%       60K         112K         224K
zstd        56%      680K         1.1M          58M
[/srv/backups]# btrfs fi def -r -czstd */*/bin/dash */*/*/bin/dash
[/srv/backups]# compsize */*/bin/dash */*/*/bin/dash
Processed 857 files, 857 regular extents (857 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       55%       51M          93M          93M
zstd        55%       51M          93M          93M

Not a single shared extent!

Common versions of dash have sizes of 113KB and 117KB, thus are below the 128KB threshold. None are small enough to fit into an inline extent (which would make them ineligible for dedupe), so there's no reason they couldn't be deduped; and indeed, manually calling /usr/bin/btrfs-extent-same does the trick.
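For illustration, such a manual call would look roughly like this (a sketch; paths are placeholders, and it assumes the "len file1 off1 file2 off2" argument order of the btrfs-extent-same helper shipped with duperemove):

# share the whole 117336-byte dash copy in snapshot B with the one in snapshot A
/usr/bin/btrfs-extent-same 117336 snapA/bin/dash 0 snapB/bin/dash 0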

Not only do these tails waste a substantial amount of space (24×50 files, each wasting half of 128KB on average, is a lot), but they also increase fragmentation: almost every file would naturally be written in contiguous space after deduplication is finished, yet now that tail has to be stored separately.
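Rough numbers, as a back-of-envelope shell calculation (857 copies as in the dash output above, and an assumed average tail of half a block):

files=857               # refs counted by compsize above
avg_tail_kb=64          # roughly half of the 128KB block size, on average
echo "~$((files * avg_tail_kb / 1024)) MiB of tail data left undeduplicated"
# prints: ~53 MiB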

Ferroin commented 6 years ago

Seeing similar behavior here, I just hadn't had a chance yet to look into what's up.

kilobyte commented 6 years ago

Obviously the pasted log is missing the duperemove call, which I did on a different terminal to avoid cluttering the pasted output. It's not obvious here, but it was called for these files, and it did nothing for them. For bash earlier, it deduped all extents but the tails; 857 tails together take more space than the bodies.

Ferroin commented 6 years ago

Hmm, the space usage increase may have something to do with how BTRFS does extent splitting. If the file itself is a single extent, then it's likely that all of that extent is staying allocated (instead of just the very end of it).

kilobyte commented 6 years ago

All files here have been fully defragmented and recompressed; duperemove's block size is 128KB, same as btrfs' compressed extents. For dash you can see that every file consists of exactly one extent — there are 857 files and 857 regular extents, each referenced exactly once. For bash, if any old extent from before defragmentation were left, I'd see something uncompressed or zlib-compressed, yet everything uses zstd now — so I have a few very efficiently stored bodies and many, many identical (a few versions) yet unshared tails.

I see no way an extent could be split here.

markfasheh commented 6 years ago

What version did you run? Over here (master, v0.11), the following works (I make a file that's 1 megabyte plus 5 bytes):

fstest1:/btrfs # dd if=/dev/zero of=file1 bs=1 count=1048581
1048581+0 records in
1048581+0 records out
1048581 bytes (1.0 MB, 1.0 MiB) copied, 2.6081 s, 402 kB/s
fstest1:/btrfs # dd if=/dev/zero of=file2 bs=1 count=1048581
1048581+0 records in
1048581+0 records out
1048581 bytes (1.0 MB, 1.0 MiB) copied, 2.60217 s, 403 kB/s
fstest1:/btrfs # sync;sync;sync;
fstest1:/btrfs # /build/mfasheh/duperemove.git/duperemove -d file*
Using 128K blocks
Using hash: murmur3
Gathering file list...
Using 2 threads for file hashing phase
[1/2] (50.00%) csum: /btrfs/file1
[2/2] (100.00%) csum: /btrfs/file2
Total files:  2
Total hashes: 18
Loading only duplicated hashes from hashfile.
Using 2 threads for dedupe phase
[0x13844a0] (2/2) Try to dedupe extents with id 3aa1e90b
[0x1384400] (1/2) Try to dedupe extents with id e47862ea
[0x13844a0] Dedupe 1 extents (id: 3aa1e90b) with target: (1048576, 5), "/btrfs/file1"
[0x13844a0] (2/2) Try to dedupe extents with id 3aa1e90b
[0x1384400] Dedupe 8 extents (id: e47862ea) with target: (0, 131072), "/btrfs/file1"
[0x13844a0] Dedupe 1 extents (id: 3aa1e90b) with target: (1048576, 5), "/btrfs/file2"
[0x1384400] (1/2) Try to dedupe extents with id e47862ea
[0x1384400] Dedupe 8 extents (id: e47862ea) with target: (0, 131072), "/btrfs/file2"
Kernel processed data (excludes target files): 2097162
Comparison of extent info shows a net change in shared extents of: 1179658

Btw, filefrag -e shows the files as having been deduped:

fstest1:/btrfs # filefrag -e file1 
Filesystem type is: 9123683e
File size of file1 is 1048581 (257 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..      31:       3456..      3487:     32:             shared
   1:       32..      63:       3456..      3487:     32:       3488: shared
   2:       64..      95:       3456..      3487:     32:       3488: shared
   3:       96..     127:       3456..      3487:     32:       3488: shared
   4:      128..     159:       3456..      3487:     32:       3488: shared
   5:      160..     191:       3456..      3487:     32:       3488: shared
   6:      192..     223:       3456..      3487:     32:       3488: shared
   7:      224..     255:       3456..      3487:     32:       3488: shared
   8:      256..     256:       3712..      3712:      1:       3488: last,shared,eof
file1: 9 extents found
fstest1:/btrfs # filefrag -e file2
Filesystem type is: 9123683e
File size of file2 is 1048581 (257 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..      31:       3456..      3487:     32:             shared
   1:       32..      63:       3456..      3487:     32:       3488: shared
   2:       64..      95:       3456..      3487:     32:       3488: shared
   3:       96..     127:       3456..      3487:     32:       3488: shared
   4:      128..     159:       3456..      3487:     32:       3488: shared
   5:      160..     191:       3456..      3487:     32:       3488: shared
   6:      192..     223:       3456..      3487:     32:       3488: shared
   7:      224..     255:       3456..      3487:     32:       3488: shared
   8:      256..     256:       3712..      3712:      1:       3488: last,shared,eof
file2: 9 extents found

So we're going to have to dig deeper.

markfasheh commented 6 years ago

It didn't occur to me until just now - have you tried with --dedupe-options=noblock? The block dedupe code creates many more extent refs than the extent-finding code.
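For reference, such an invocation would look something like this (a sketch; the path is a placeholder):

duperemove -dr --dedupe-options=noblock /srv/backups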

kilobyte commented 6 years ago

--dedupe-options=noblock doesn't seem to help.

Your test case dedupes correctly for me; I found another oddity:

cp -p /bin/dash dash1
cp -p /bin/dash dash2
cp -p /bin/bash bash1
cp -p /bin/bash bash2
sync
btrfs fi def -czstd *

dash is 117336 bytes long (almost one block), bash is 1111240 (8.4 blocks). Before deduplication, there are 20 on-disk extents: 2×1 for dash and 2×9 for bash. But afterwards, there are 11 (instead of 10 if all were OK, or 12 if the bug were fully reproducible): dash stays in two unique extents but bash gets completely deduped, both full and partial blocks. Something on the busy backup server must be different from this isolated testcase.
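The extent counts above can be re-checked with the same tools used earlier in the thread, e.g. (run in the test directory):

compsize dash1 dash2 bash1 bash2   # "regular extents" vs "refs" shows how much got shared
filefrag -e bash1                  # per-extent view; deduped extents carry the "shared" flag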

markfasheh commented 6 years ago

Thanks for testing that. Another question - are you using a hashfile in either case?

kilobyte commented 6 years ago

All invocations were stand-alone, IIRC with -Ardh in every case. Tested on amd64 and armhf, kernels 4.15, 4.16 and 4.17; the only relevant patch allows -A as non-root.

JackSlateur commented 11 months ago

Hello @kilobyte

Tails smaller than the blocksize are now processed. Whole files are processed too, regardless of their size.
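A quick way to check this on a current build (a sketch; the mount point is a placeholder):

cp -p /bin/dash /mnt/btrfs/dash1
cp -p /bin/dash /mnt/btrfs/dash2
sync
duperemove -d /mnt/btrfs/dash1 /mnt/btrfs/dash2
filefrag -e /mnt/btrfs/dash1   # extents should now show the "shared" flag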

I believe this issue is now fixed, feel free to reopen it if not