Seeing similar behavior here; I just haven't had a chance yet to look into what's up.
Obviously the pasted log is missing the duperemove call, which I ran in a different terminal to avoid spamming the output. It's not obvious here, but it was called on these files and did nothing for them. For bash earlier, it deduped all extents except the tails; the 857 tails together take more space than the bodies.
Hmm, the space usage increase may have something to do with how BTRFS does extent splitting. If the file itself is a single extent, then it's likely that all of that extent is staying allocated (instead of just the very end of it).
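If that's the cause, it should be observable in isolation. A minimal sketch of the bookend-extent effect (file name and sizes are arbitrary; the compsize result is what I'd expect, not a captured run):
dd if=/dev/urandom of=bookend bs=1M count=8 conv=fsync   # one large extent
truncate -s 4096 bookend      # keep a reference to only the first 4KiB
sync
compsize bookend              # Disk Usage can still show ~8MiB while Referenced is only 4KiB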
All files here have been fully defragmented and recompressed; duperemove's block size is 128KB, the same as btrfs' compressed extents. For dash you can see that every file consists of exactly one extent: there are 857 files and 857 regular extents, each referenced exactly once. For bash, if any old extent from before defragmentation had been left, I'd see something uncompressed or zlib-compressed, yet everything uses zstd now. So I have a few very efficiently stored bodies and many, many identical (a few versions) yet unshared tails.
I see no way an extent could be split here.
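For reference, those file/extent counts come from compsize's summary line; illustratively:
compsize */*/bin/dash
# Processed 857 files, 857 regular extents (857 refs), 0 inline.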
What version did you run? Over here (master, v0.11), the following works (I make a file that's 1MiB plus 5 bytes):
fstest1:/btrfs # dd if=/dev/zero of=file1 bs=1 count=1048581
1048581+0 records in
1048581+0 records out
1048581 bytes (1.0 MB, 1.0 MiB) copied, 2.6081 s, 402 kB/s
fstest1:/btrfs # dd if=/dev/zero of=file2 bs=1 count=1048581
1048581+0 records in
1048581+0 records out
1048581 bytes (1.0 MB, 1.0 MiB) copied, 2.60217 s, 403 kB/s
fstest1:/btrfs # sync;sync;sync;
fstest1:/btrfs # /build/mfasheh/duperemove.git/duperemove -d file*
Using 128K blocks
Using hash: murmur3
Gathering file list...
Using 2 threads for file hashing phase
[1/2] (50.00%) csum: /btrfs/file1
[2/2] (100.00%) csum: /btrfs/file2
Total files: 2
Total hashes: 18
Loading only duplicated hashes from hashfile.
Using 2 threads for dedupe phase
[0x13844a0] (2/2) Try to dedupe extents with id 3aa1e90b
[0x1384400] (1/2) Try to dedupe extents with id e47862ea
[0x13844a0] Dedupe 1 extents (id: 3aa1e90b) with target: (1048576, 5), "/btrfs/file1"
[0x13844a0] (2/2) Try to dedupe extents with id 3aa1e90b
[0x1384400] Dedupe 8 extents (id: e47862ea) with target: (0, 131072), "/btrfs/file1"
[0x13844a0] Dedupe 1 extents (id: 3aa1e90b) with target: (1048576, 5), "/btrfs/file2"
[0x1384400] (1/2) Try to dedupe extents with id e47862ea
[0x1384400] Dedupe 8 extents (id: e47862ea) with target: (0, 131072), "/btrfs/file2"
Kernel processed data (excludes target files): 2097162
Comparison of extent info shows a net change in shared extents of: 1179658
Btw, filefrag -e shows the files as having been deduped:
fstest1:/btrfs # filefrag -e file1
Filesystem type is: 9123683e
File size of file1 is 1048581 (257 blocks of 4096 bytes)
ext: logical_offset: physical_offset: length: expected: flags:
0: 0.. 31: 3456.. 3487: 32: shared
1: 32.. 63: 3456.. 3487: 32: 3488: shared
2: 64.. 95: 3456.. 3487: 32: 3488: shared
3: 96.. 127: 3456.. 3487: 32: 3488: shared
4: 128.. 159: 3456.. 3487: 32: 3488: shared
5: 160.. 191: 3456.. 3487: 32: 3488: shared
6: 192.. 223: 3456.. 3487: 32: 3488: shared
7: 224.. 255: 3456.. 3487: 32: 3488: shared
8: 256.. 256: 3712.. 3712: 1: 3488: last,shared,eof
file1: 9 extents found
fstest1:/btrfs # filefrag -e file2
Filesystem type is: 9123683e
File size of file2 is 1048581 (257 blocks of 4096 bytes)
ext: logical_offset: physical_offset: length: expected: flags:
0: 0.. 31: 3456.. 3487: 32: shared
1: 32.. 63: 3456.. 3487: 32: 3488: shared
2: 64.. 95: 3456.. 3487: 32: 3488: shared
3: 96.. 127: 3456.. 3487: 32: 3488: shared
4: 128.. 159: 3456.. 3487: 32: 3488: shared
5: 160.. 191: 3456.. 3487: 32: 3488: shared
6: 192.. 223: 3456.. 3487: 32: 3488: shared
7: 224.. 255: 3456.. 3487: 32: 3488: shared
8: 256.. 256: 3712.. 3712: 1: 3488: last,shared,eof
file2: 9 extents found
So we're going to have to dig deeper.
It didn't occur to me until just now: have you tried with --dedupe-options=noblock? The block dedupe code creates many more extent refs than the extent-finding code.
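For clarity, that would be the same run with the block-granular pass disabled, e.g.:
duperemove -d --dedupe-options=noblock file1 file2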
--dedupe-options=noblock doesn't seem to help.
Your test case dedupes correctly for me; I found another oddity:
cp -p /bin/dash dash1
cp -p /bin/dash dash2
cp -p /bin/bash bash1
cp -p /bin/bash bash2
sync
btrfs fi def -czstd *
dash is 117336 bytes long (almost one block), bash is 1111240 (8.4 blocks). Before deduplication there are 20 on-disk extents: 2×1 for dash and 2×9 for bash. Afterwards there are 11 (instead of 10 if all were OK, or 12 if the bug were fully reproducible): dash stays in two unique extents, but bash gets completely deduped, both full and partial blocks. Something on the busy backup server must be different from this isolated test case.
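The extent counts can be taken roughly like this (a sketch: counting unique physical start offsets from filefrag -v, which approximates the number of distinct on-disk extents):
for f in dash1 dash2 bash1 bash2; do filefrag -v "$f"; done \
  | awk '$1 ~ /^[0-9]+:$/ { print $4 }' | sort -u | wc -l
# 20 before dedupe, 11 after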
Thanks for testing that. Another question - are you using a hashfile in either case?
All invocations were stand-alone, IIRC with -Ardh in every case. Tested on amd64 and armhf, kernels 4.15, 4.16, and 4.17; the only relevant patch allows -A as non-root.
Hello @kilobyte
Tails smaller than the blocksize are now processed. Also, whole files are processed too, regardless of their size.
I believe this issue is now fixed; feel free to reopen it if not.
Turns out duperemove completely skips blocks smaller than 128KB (or whatever is configured), which results in leaving an undeduped and fragmented part of every file whose size is not an exact multiple of 128KB.
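Concretely, with a 128KB (131072-byte) blocksize and the file sizes measured above:
echo $(( 117336 / 131072 ))        # dash: 0 full blocks, so the whole file is skipped
echo $(( 1111240 / 131072 ))       # bash: 8 full blocks get hashed
echo $(( 1111240 - 8 * 131072 ))   # bash: a 62664-byte tail is left undeduped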
I have a backup server that holds copies of ~50 systems, 24ish snapshots of each. Several months ago I switched to zstd, and I wanted to recompress old backups to zstd as well. As a test, I started with */*/bin/bash: a nice compressible binary, with only a few distinct versions in the whole set. Defrag succeeded, but after dedupe it (measured with compsize) took slightly more space than before. Uh oh. Then I repeated the same with dash:
Not a single shared extent!
Common versions of dash are 113KB and 117KB, thus below the 128KB threshold. None are small enough to fit into an inline extent (which would make them ineligible for dedupe), so there's no reason they couldn't be deduped; indeed, manually calling /usr/bin/btrfs-extent-same does the trick.
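For the record, the manual call looks roughly like this (argument order as I recall btrfs-extent-same's usage, a length followed by file/offset pairs; the copy names here are illustrative):
# dedupe two identical 117336-byte dash copies in full
btrfs-extent-same 117336 dash1 0 dash2 0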
Not only do these tails waste a substantial amount of space (half of 128KB on average per file, times 24×50 copies, is a lot), but they also increase fragmentation: almost every file would naturally be written in contiguous space once deduplication finished, yet now that tail has to be stored separately.
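Back-of-the-envelope, assuming the numbers above (~50 systems × 24ish snapshots ≈ 1200 copies of each binary, ~64KiB wasted per tail on average):
echo $(( 1200 * 64 / 1024 )) MiB   # ≈ 75 MiB wasted for a single binary alone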