sahib / rmlint

Extremely fast tool to remove duplicates and other lint from your filesystem
http://rmlint.rtfd.org
GNU General Public License v3.0

Feature suggestion: Paranoid mode - automatically delete or update xattrs if files found to be different #542

Open james-cook opened 2 years ago

james-cook commented 2 years ago

Feature request :)

IMHO it would be useful if a failed paranoid check (rmlint.sh run with "-p") resulted in either:

  1. the deletion of the xattrs for each file (cheaper in the short term for the current run)
  2. the regeneration of the xattrs for each file

Why is this needed? Aren't the xattr values reliable anyway? Not in my case. The context for this request is xattrs giving strange results (https://github.com/sahib/rmlint/issues/436 and https://github.com/sahib/rmlint/issues/439): I have xattr values which cannot be relied upon.

I.e. in the function original_check() of the generated rmlint.sh:

# Do double-check if requested:
    if [ -z "$DO_PARANOID_CHECK" ]; then
        return 0
    else
        if [ "$(check_for_equality "$1" "$2")" -ne "0" ]; then
            echo "${COL_RED}^^^^^^ Error: files no longer identical - cancelling.....${COL_RESET}"
            # pseudo code - delete xattrs for both files here... or regenerate them
            #  xattr -d user.rmlint.blake2b.mtime "$1"
            #  xattr -d user.rmlint.blake2b.cksum "$1"
            #  xattr -d user.rmlint.blake2b.mtime "$2"
            #  xattr -d user.rmlint.blake2b.cksum "$2"
            return 1
        fi
    fi

Main reason: it seems to me that a later run of rmlint will still use the old xattrs, so the generated scripts will continue to contain false matches. If these newer rmlint.sh scripts are not run with "-p", the hard work of the previous "-p" run will be lost, and so will the differing files, since they would be removed as supposed duplicates. A failed paranoid check is a good opportunity to repair the incorrect xattrs.

I favour just deleting the xattrs concerned here (on both sides, as shown in the pseudo code), and leaving regeneration to the rmlint command (not rmlint.sh). No need to duplicate code.
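
As a rough sketch of that deletion step (an illustration only, not tested against rmlint itself): the helper name clear_rmlint_xattrs is made up, it assumes Linux setfattr from the attr package rather than the macOS-style xattr -d used in the pseudo code, and it only covers the blake2b attribute names shown above.

```shell
# Hypothetical helper, not part of the generated rmlint.sh.
# Removes rmlint's cached checksum xattrs from every file passed in,
# so that the cached values can no longer be trusted by a later run.
# Assumption: Linux `setfattr` is available; the attribute names are the
# blake2b ones from the pseudo code and would differ for other checksum types.
clear_rmlint_xattrs() {
    for file in "$@"; do
        for attr in user.rmlint.blake2b.cksum user.rmlint.blake2b.mtime; do
            # Ignore failures: the attribute may simply not be set on this file.
            setfattr -x "$attr" "$file" 2>/dev/null || true
        done
    done
}
```

It would be called as clear_rmlint_xattrs "$1" "$2" just before the "return 1" in original_check(). The next rmlint run should then have to re-hash both files instead of trusting the stale values.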

Why not just delete ALL existing xattrs and start again? Well, in my setup at least - with terabytes of already "xattr"ed files, which act as a kind of mostly accurate database - deletion alone is VERY slow and consumes a lot of resources. Most xattrs will be correct.