Closed james-cook closed 1 year ago
AFAIK you need to pass --keep-hardlinked for rmlint to explicitly mark hardlinks as originals. skip_hardlink
is specific to files that were marked as hardlinks in the preprocessing step, maybe there is a false negative in rm_pp_bundle_hardlinks. Are you asking for an enhancment to cp_hardlink like if original_check "$1" "$2" && ! [ "$1" -ef "$2" ]; then
?
In my command I use -S dma to select the hardlink inode to preserve, and this works fine.
My "issue" is that having looked any very many rmlint.sh files I have often seen skip_hardlink after the "original" line. This makes absolute sense to me. I had assumed this is generated instead of cp_hardlink when the inode is already identical. So I was surprised in this case to see cp_hardlinks generated instead. (I have never used --keep-hardlinked explicitly)
If you then run your "final" rmlint.sh on "critical files" in PARANOID mode (as I do since https://github.com/sahib/rmlint/issues/436) then the cp_hardlink does a cmp all over again even though the inodes are already identical.
My suggestion would be to have an extra test in cp_hardlink to compare inode values (in fact in original_check). This is the existing code:
original_check() {
if [ ! -e "$2" ]; then
printf "${COL_RED}^^^^^^ Error: original has disappeared - cancelling.....${COL_RESET}\n"
return 1
fi
if [ ! -e "$1" ]; then
printf "${COL_RED}^^^^^^ Error: duplicate has disappeared - cancelling.....${COL_RESET}\n"
return 1
fi
# Check they are not the exact same file (hardlinks allowed):
if [ "$1" = "$2" ]; then
printf "${COL_RED}^^^^^^ Error: original and duplicate point to the *same* path - cancelling.....${COL_
RESET}\n"
return 1
fi
# Do double-check if requested:
if [ -z "$DO_PARANOID_CHECK" ]; then
return 0
else
if ! check_for_equality "$1" "$2"; then
printf "${COL_RED}^^^^^^ Error: files no longer identical - cancelling.....${COL_RESET}\n"
return 1
fi
fi
}
In the above, the test they are not the exact same file contains the comment "hardlinks allowed" - interesting... what does it mean?
As can be seen, no inode check is made at any point. If not at the "top level" one could at least be placed before the call to check_for_equality. Or more universally in check_for_equality itself (not sure of ripple effects here though)
Existing code, check_for_equality:
check_for_equality() {
if [ -f "$1" ]; then
# Use the more lightweight builtin `cmp` for regular files:
cmp -s -- "$1" "$2"
else
# Fallback to `rmlint --equal` for directories:
"$RMLINT_BINARY" -p --equal -- "$1" "$2"
fi
}
An inode test could be placed before the cmp command
Which version of rmlint are you using? The develop branch had some changes last year including 5ffc2a99b6f3a9200f68079b8ca296f720331995 that affect the hardlink preprocessing code, maybe one branch works as expected and the other doesn't?
I have been able to reproduce this on both master and develop. It's a problem with --merge-directories:
$ mkdir -p hardlink-test/a
$ echo xxx >hardlink-test/a/1
$ cp -rl hardlink-test/a hardlink-test/b
$ echo yyy >hardlink-test/a/2
$ rmlint -o sh -c sh:hardlink hardlink-test -S a | grep -B1 '_hardlink '
original_cmd '/tmp/hardlink-test/a/1' # original
skip_hardlink '/tmp/hardlink-test/b/1' '/tmp/hardlink-test/a/1' # duplicate
$ rmlint -D -o sh -c sh:hardlink hardlink-test -S a | grep -B1 '_hardlink '
original_cmd '/tmp/hardlink-test/b/1' # original
cp_hardlink '/tmp/hardlink-test/a/1' '/tmp/hardlink-test/b/1' # duplicate
edit: btw, the files in your example have do in fact have different inodes, 895822 vs 809730, but their sizes match. ls -lhi
might make this more obvious (or just use stat
).
I have already run rmlint over some directories on a drive. Now I am running rmlint on the whole drive to see what remains to be linted. I'm running with "sh:hardlink".
Strangely already hardlinked files (same inode) have "cp_hardlink" instead of "skip_hardlink" generated in rmlint.sh. cp_hardlink itself does no sanity check (check to see whether inodes already the same) and does a "rm" and "ln" all over again.
For example: These files share the same inode:
I run the command:
This actually reports 69 dupes but correctly notices that these dupes have zero size (because they are hardlinks of each other):
But the generated rmlint.sh file contains:
I'm not sure if this is potentially a source of problems down the line but it does seems strange. Any ideas why skip_hardlink is not generated? Or, in fact, why ANY cp_hardlink lines are generated for these 69 files in the first place.