Open endolith opened 3 years ago
> rmlint --dedupe ErFLQmkWMAMTWFu.jpeg ErFLQmkWMAMTWFu.jpg
ERROR: lib/session.c:331: FIDEDUPERANGE returned error: (22): Invalid argument
Just checking, the files are on a reflinkable fs (btrfs or xfs)?
Ohhh, you know what it probably is? I already merged a bunch of files as hardlinks first. These unreflinkable files have the same inode:
Shouldn't matter
Yes, btrfs. I removed the hardlinks and it works fine now
The underlying problem is that hardlinks share metadata, while reflinks don't, so it's not possible to do an in-place dedupe without first creating a new inode.
Via shell it needs two commands:
$ echo data > original
$ ln original hardlink
$ cp --reflink=always original hardlink
cp: 'original' and 'hardlink' are the same file
$ cp --reflink=always original temp_clone
$ mv temp_clone hardlink
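To check up front whether two paths are hardlinks of the same file (the case where cp refuses with "are the same file"), something like the following might help. This is only a sketch: `same_inode` and `demo_dir` are made-up names, and it assumes GNU coreutils `stat` with the `-c` format option.

```shell
#!/bin/sh
# Hypothetical helper (not part of rmlint): succeeds when both paths refer to
# the same device and inode, i.e. they are hardlinks of one file.
same_inode() {
    [ "$(stat -c '%d:%i' -- "$1")" = "$(stat -c '%d:%i' -- "$2")" ]
}

# Small demo in a scratch directory.
demo_dir=$(mktemp -d)
echo data > "$demo_dir/original"
ln "$demo_dir/original" "$demo_dir/hardlink"   # same inode as original
cp "$demo_dir/original" "$demo_dir/copy"       # separate inode

if same_inode "$demo_dir/original" "$demo_dir/hardlink"; then
    echo "hardlink shares an inode with original"
fi
if ! same_inode "$demo_dir/original" "$demo_dir/copy"; then
    echo "copy has its own inode"
fi
```

Comparing the device number as well as the inode number matters because inode numbers are only unique within one filesystem.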
But it probably should be possible to convert hardlinks to reflinks if you want to... I can put in a fix so that this can be done with a single command:
$ rmlint --dedupe <original> <hardlink>
Unless someone can think of a reason why this might be a bad idea?
Alternatively I could change the reported error to something like "Can't convert hardlinks to reflinks" instead of the unhelpful "FIDEDUPERANGE returned error: (22): Invalid argument".
Well mostly my report is about the unhelpful error message, but yes, being able to convert hardlinks into reflinks would be helpful: https://superuser.com/q/1618201/13889
Not as trivial as I expected but ready for testing if you or @sahib are interested: https://github.com/SeeSpotRun/rmlint/tree/clone_hardlinks
I haven't gone any further with this because it can't be done atomically so there is some risk that the hardlink gets deleted or renamed. I could do a bit more work to make it more robust but will wait to see if there is any more interest here.
All I cared about was the confusing error message. I'm fine with it skipping hardlinks. It would be better if it just detected them and didn't bother putting them in the list, though? Maybe there's an option for that that I missed.
> Maybe there's an option for that that I missed.
Yes there is!
$ man rmlint | grep -A 6 "\-L"
-l --hardlinked (default) / --keep-hardlinked / -L --no-hardlinked
Hardlinked files are treated as duplicates by default (--hardlinked). If --keep-hardlinked is given, rmlint will not delete any files that are hardlinked to an original in their respective group. Such files will be
displayed like originals, i.e. for the default output with a "ls" in front. The reasoning here is to maximize the number of kept files, while maximizing the number of freed space: Removing hardlinks to originals
will not allocate any free space.
If --no-hardlinked is given, only one file (of a set of hardlinked files) is considered, all the others are ignored; this means, they are not deleted and also not even shown in the output. The "highest ranked" of the
set is the one that is considered.
Example:
$ mkdir test
$ dd if=/dev/urandom of=test/orig bs=1k count=8
$ cp test/orig test/copy
$ cp --reflink=always test/orig test/reflink
$ ln test/orig test/hardlink
$ rmlint test -o pretty -S m
ls '/home/foo/Git/rmlint/test/orig'
rm '/home/foo/Git/rmlint/test/hardlink'
rm '/home/foo/Git/rmlint/test/copy'
rm '/home/foo/Git/rmlint/test/reflink'
$ rmlint test -o pretty -S m --keep-hardlinked
ls '/home/foo/Git/rmlint/test/orig'
ls '/home/foo/Git/rmlint/test/hardlink'
rm '/home/foo/Git/rmlint/test/copy'
rm '/home/foo/Git/rmlint/test/reflink'
$ rmlint test -o pretty -S m --no-hardlinked
ls '/home/foo/Git/rmlint/test/orig'
rm '/home/foo/Git/rmlint/test/copy'
rm '/home/foo/Git/rmlint/test/reflink'
So I did manage to write a reasonably atomic way to convert a hardlink to a reflink. Not elegant, but it works.
Basically, rmlint --dedupe original hardlink will clone original to a tempfile hardlink.XXXXXX, then atomically rename hardlink.XXXXXX to hardlink. So in the worst case, a crash would lead to an extra hardlink.XXXXXX file floating around.
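For illustration, the clone-to-tempfile-then-rename idea can be approximated in plain shell. This is a rough sketch and not rmlint's actual code: the function name `clone_over_hardlink` is made up, and `--reflink=auto` is used so the demo also runs on filesystems without reflink support (there it silently falls back to an ordinary copy). Swap in `--reflink=always` to insist on shared extents.

```shell
#!/bin/sh
# Hypothetical helper (not rmlint's implementation): replace LINK, a hardlink
# of ORIG, with a clone of ORIG that has its own inode.
clone_over_hardlink() {
    orig=$1
    link=$2
    # Create the temp file next to the target so the final rename stays on
    # the same filesystem and can be done atomically.
    tmp=$(mktemp "$link.XXXXXX") || return 1
    # -p keeps mode, ownership and timestamps (an assumption about what is wanted).
    if cp -p --reflink=auto -- "$orig" "$tmp" && mv -f -- "$tmp" "$link"; then
        return 0
    fi
    # On failure, remove the leftover temp file and report the error.
    rm -f -- "$tmp"
    return 1
}

# Small demo in a scratch directory.
demo_dir=$(mktemp -d)
echo data > "$demo_dir/original"
ln "$demo_dir/original" "$demo_dir/hardlink"
clone_over_hardlink "$demo_dir/original" "$demo_dir/hardlink"
# Print inode number, link count and name for both paths.
stat -c '%i %h %n' -- "$demo_dir/original" "$demo_dir/hardlink"
```

After the call the two paths should show different inode numbers with a link count of 1 each. If the process dies between the copy and the rename, the only leftover is the hardlink.XXXXXX temp file, and the hardlink itself is untouched.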
Merged into https://github.com/sahib/rmlint/tree/develop and closing issue.
cp --reflink can't convert hardlinks to reflinks either. We should implement atomic un-hardlinking in rmlint.sh and make it clear in the documentation that this is something rmlint will do (users might assume vanilla cp --reflink / FICLONE behavior and want to keep their hardlinks).
$ mkdir testdir
$ echo xxx >testdir/a
$ ln testdir/a testdir/b
$ rmlint -o sh:rmlint.sh -c sh:handler=reflink testdir
$ ./rmlint.sh -dxq
Keeping: /tmp/testdir/a
Reflinking to original: /tmp/testdir/b
cp: '/tmp/testdir/a' and '/tmp/testdir/b' are the same file
Done!
I still encounter the same problem when trying to reflink a hardlink, I don't know why. I am running rmlint version 2.10.2
I ran [...] and then ran the rmlint.sh, and it successfully combines a bunch of files, but also fails with this error on a bunch of others. I see no obvious difference between the file names that would throw off the command line. For example, these two files are the same: [...]
But rmlint can't combine them: [...]
Happens in fish or bash.