sahib / rmlint

Extremely fast tool to remove duplicates and other lint from your filesystem
http://rmlint.rtfd.org
GNU General Public License v3.0

rmlint does not consider files with bad UID/GID for duplicate detection #649

Open bphd opened 4 months ago

bphd commented 4 months ago

How can I skip that check and concentrate on duplicates instead? There are a lot of duplicates, and it seems this error skips the duplicate check, so rmlint finds no duplicates among the files flagged with that error.

Workaround being:

This is strange. rmlint uses the getpwent() call to list all possible users (including LDAP according to man page) and checks the user ids from that.

Can you please paste the output of this oneliner (filtering sensitive and unneeded info where possible):

$ python3 -c 'import pwd, pprint; pprint.pprint(pwd.getpwall())'

I want to see if 10166 is in there (which would indicate a bug in rmlint somewhere).
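To make the check described above concrete, here is a rough Python sketch of the same idea: collect every UID and GID the system's user database reports, then test whether a file's owner and group appear in those sets. This is not rmlint's actual code (rmlint is written in C); the helper names `known_ids` and `has_bad_ids` are made up for illustration, and the group lookup via `grp.getgrall()` is an assumption based on the issue title mentioning GIDs.

```python
import grp
import os
import pwd


def known_ids():
    """Return the sets of UIDs and GIDs that the user/group databases list."""
    # pwd.getpwall() enumerates the passwd database, similar in spirit
    # to looping over getpwent() in C.
    uids = {entry.pw_uid for entry in pwd.getpwall()}
    # Assumption: groups are enumerated the same way through grp.getgrall().
    gids = {entry.gr_gid for entry in grp.getgrall()}
    return uids, gids


def has_bad_ids(path, uids, gids):
    """Return True if the file's owner or group is missing from the given sets."""
    st = os.lstat(path)  # lstat: inspect the link itself, do not follow it
    return st.st_uid not in uids or st.st_gid not in gids


if __name__ == "__main__":
    uids, gids = known_ids()
    # 10166 is the UID in question from this thread.
    print("10166 known:", 10166 in uids)
```

If `10166` shows up as known here while rmlint still reports the file as having a bad UID, that would point at a bug in rmlint rather than at the user database.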

As a workaround, you can run without any UID check for now:

$ rmlint -T 'all -badids'

Originally posted by @sahib in https://github.com/sahib/rmlint/issues/433#issuecomment-691317436

But a proper solution would be for the analysis to run both the UID:GID check and the duplicate check on the same file, and then complain at script execution if the ownership is still wrong (which shouldn't be a problem if the script corrects it). Ideally there would also be a more "normal" option to deactivate that check.

cebtenzzre commented 4 months ago

This seems like a really subjective preference - whether files with bad UID/GID are a more important concern than when files are duplicated. Especially because of the many ways the output of rmlint can be generated and consumed.

If you only want to use rmlint to detect duplicate files, you should always run it with -T df - that's usually what I do. The default mode is designed to be a more general way to find files with various potential issues.

bphd commented 4 months ago

> This seems like a really subjective preference

Well it's called rmlint, not rmid

cebtenzzre commented 4 months ago

> Well it's called rmlint, not rmid

It's also not called rmdupes. "lint" is supposed to refer to anything you might not want on your filesystem, such as files that are owned by a user/group that has since been deleted and whose ownership needs to be updated.