Closed matthiaskrgr closed 1 year ago
Hi Matthias, did you try the '-A' option? That should force duperemove to open readonly.
I agree that it might not make sense to open read-write. I mean, it did to me at some point obviously but maybe it's a decision to revisit :)
Hm, it seems that adding -A causes additional "had status (0) "[unknown status]"." messages in some cases.
Do you remember what stage that happened at? Sounds like dedupe stage.
Never mind, that's in process_dedupe_results() as I can see now.
Sorry to flood the bug with messages but I'm looking at the code and that you see this message is kind of weird to me. The variable being printed is 'target_status' and we check against it being 0 immediately before the print:
https://github.com/markfasheh/duperemove/blob/master/run_dedupe.c#L110
Did you compile from source? Could you check that line in your version of the code?
Interestingly enough, I hit this just now testing some dedupe code on xfs:
[0x1ba98a0] Dedupe for file "/xfs/boot/System.map-3.12.53" had status (0) "[unknown status]". [0x1ba9540] Dedupe for file "/xfs/boot/System.map-3.12.53" had status (0) "[unknown status]".
So never mind that last comment obviously this happens.
Ok that ugly print was fixed in master branch, thanks for reporting. It turned out to be a missing '='!
Hmm, I still have problems when for example deduping packfiles of git repos. For testing I just cloned duperemove two times:
Using 128K blocks
Using hash: murmur3
Using 4 threads for file hashing phase
csum: /home/matthias/tmp/repotest/duperemove1/.git/objects/pack/pack-863c44a704cd9a490ea880a8711474915248905c.pack [1/2] (50.00%)
csum: /home/matthias/tmp/repotest/duperemove2/.git/objects/pack/pack-863c44a704cd9a490ea880a8711474915248905c.pack [2/2] (100.00%)
Total files: 2
Total hashes: 12
Loading only duplicated hashes from hashfile.
Hashing completed. Calculating duplicate extents - this may take some time.
[########################################]
Search completed with no errors.
Simple read and compare of file data found 1 instances of extents that might benefit from deduplication.
Showing 2 identical extents with id 0b8ee870
Start Length Filename
0.0 764.3K "/home/matthias/tmp/repotest/duperemove1/.git/objects/pack/pack-863c44a704cd9a490ea880a8711474915248905c.pack"
0.0 764.3K "/home/matthias/tmp/repotest/duperemove2/.git/objects/pack/pack-863c44a704cd9a490ea880a8711474915248905c.pack"
Using 4 threads for dedupe phase
[0x5b7ca0] Try to dedupe extents with id 0b8ee870
[0x5b7ca0] Dedupe 1 extents (id: 0b8ee870) with target: (0.0, 764.3K), "/home/matthias/tmp/repotest/duperemove1/.git/objects/pack/pack-863c44a704cd9a490ea880a8711474915248905c.pack"
[0x5b7ca0] Dedupe for file "/home/matthias/tmp/repotest/duperemove2/.git/objects/pack/pack-863c44a704cd9a490ea880a8711474915248905c.pack" had status (-22) "Invalid argument".
Kernel processed data (excludes target files): 0.0
Comparison of extent info shows a net change in shared extents of: 0.0
Huh, doing the same thing on my end (from master):
Using 128K blocks Using hash: murmur3 Using 2 threads for file hashing phase csum: /btrfs/duperemove.git/.git/objects/pack/pack-285b20f427a11446d0a071d1c5a33c39e023492d.pack 1/2 csum: /btrfs/duperemove.git-1/.git/objects/pack/pack-285b20f427a11446d0a071d1c5a33c39e023492d.pack 2/2 Total files: 2 Total hashes: 14 Loading only duplicated hashes from hashfile. Hashing completed. Calculating duplicate extents - this may take some time. Simple read and compare of file data found 1 instances of extents that might benefit from deduplication. Showing 2 identical extents with id 83f8d379 Start Length Filename 0.0 771.9K "/btrfs/duperemove.git/.git/objects/pack/pack-285b20f427a11446d0a071d1c5a33c39e023492d.pack" 0.0 771.9K "/btrfs/duperemove.git-1/.git/objects/pack/pack-285b20f427a11446d0a071d1c5a33c39e023492d.pack" Using 2 threads for dedupe phase [0xecb050] Try to dedupe extents with id 83f8d379 [0xecb050] Dedupe 1 extents (id: 83f8d379) with target: (0.0, 771.9K), "/btrfs/duperemove.git/.git/objects/pack/pack-285b20f427a11446d0a071d1c5a33c39e023492d.pack" Kernel processed data (excludes target files): 771.9K Comparison of extent info shows a net change in shared extents of: 0.0
So I wonder what the difference between our setups is. This was on kernel v4.5.0. What's 'uname -r' give you on the machine you tested on?
What is the size of '/home/matthias/tmp/repotest/duperemove1/.git/objects/pack/pack-863c44a704cd9a490ea880a8711474915248905c.pack' ?
du -sb : 782648 /home/matthias/tmp/repotest/duperemove1/.git/objects/pack/pack-863c44a704cd9a490ea880a8711474915248905c.pack 782648 /home/matthias/tmp/repotest/duperemove2/.git/objects/pack/pack-863c44a704cd9a490ea880a8711474915248905c.pack
size, md5, sha1 and sh224 sums of both pack seem to match.
kernel is: 4.6.4-301.fc24.x86_64
EDIT: Hmm, I can dedupe the files when I launch duperemove, as root, however that should not really be necessary because they are both owned by me as nonroot :/
testcase:
* copy some sufficiently large file to test directory*
cat file1 > file2 # make a non-COW copy
chmod 400 file1 # make files readonly
chmod 400 file2
duperemove -dA file1 file2
=> " had status (-22) "Invalid argument"."
Thanks for the testcase btw.
Unless there's a real reason to open in read/write mode, I say it's a useless security risk and maybe even a resource abuse. why is this not readonly everywhere?
Yeah let me look over all the relevant code and patches... I'm reasonably sure that the kernel is doing the check of read/write mode in which case there won't be much to do in duperemove until we get that changed.
Ok, there's a detailed description of the problem in the wiki under Kernel Tasks in the Development Tasks wiki page. There's one potential solution there too. I'll paste it here to save you all the time:
[ ] We have a very messy story with respect to open modes for deduping files. For example, see
https://github.com/markfasheh/duperemove/issues/86 and
https://github.com/markfasheh/duperemove/issues/129. We're opening dedupe targets O_RDWR because the kernel insists on them being open for write, (unless the user is root). The relevant line is in vfs_dedupe_file_range()
:
} else if (!(is_admin || (dst_file->f_mode & FMODE_WRITE))) {
info->status = -EINVAL;
This seemed like a good idea when I implemented the ioctl - we were worried about the implication of one user being able to dedupe another users files. The check is too coarse though and as a result we have many complaints about not being able to dedupe readonly files as a non-root user. The most obvious fix here would be to simply delete the check. Users could then use -A (and we need to update the documentation) to force read-only opens. Once the fix is in the wild long enough we can revert to always opening readonly.
In #86 , @markfasheh said:
Hi this is because the kernel will only allow dedupe of files open for read if you are the root user, so by default we open for write. On a readonly snapshot though, this can be a problem so -A forces duperemove to always open for read
If the user is root
, why not set -A
by default? Are there any cases where we would not want this to happen? (It would also reduce instances of #176).
Hello, I believe this issue is fixed:
0.04 [jack:/mnt/trash/test] dd if=/dev/urandom bs=1M count=25 of=file1
25+0 records in
25+0 records out
26214400 bytes (26 MB, 25 MiB) copied, 0.0655023 s, 400 MB/s
0.04 [jack:/mnt/trash/test] cp --reflink=never file1 file2
0.12 [jack:/mnt/trash/test] chmod 400 *
0.12 [jack:/mnt/trash/test] duperemove -dA file1 file2
Gathering file list...
Using 12 threads for file hashing phase
[1/2] (50.00%) csum: /mnt/trash/test/file1
[2/2] (100.00%) csum: /mnt/trash/test/file2
Total files: 2
Total extent hashes: 2
Loading only duplicated hashes from hashfile.
Found 2 identical extents.
Simple read and compare of file data found 1 instances of extents that might benefit from deduplication.
Showing 2 identical extents of length 26214400 with id 80c390cb
Start Filename
0 "/mnt/trash/test/file2"
0 "/mnt/trash/test/file1"
Using 12 threads for dedupe phase
[0x557c42b88120] (1/1) Try to dedupe extents with id 80c390cb
[0x557c42b88120] Dedupe 1 extents (id: 80c390cb) with target: (0, 26214400), "/mnt/trash/test/file2"
Comparison of extent info shows a net change in shared extents of: 52428800
The -A
option is indeed required
It seems that duperemove (executed as normal user) cannnot dedupe read-only files owned by said user. Error 13: Permission denied while opening "[...]" (write=1)
This is weird because it should not modify the files but just read the underlying extents.