pixelb / fslint

Linux file system lint checker/cleaner

Don't hardlink files which differ in ownership/permissions. #98

Open pixelb opened 9 years ago

pixelb commented 9 years ago

Original issue 99 created by pixelb on 2014-12-09T20:44:57.000Z:

fslint: Handle identical files with differing UID/GID/permissions.

This commit slightly modifies the logic in 'findup' and 'md5sum_approx' so that identical files with different ownership and/or permissions are no longer hardlinked together, which previously produced surprising results. Files will now only be hardlinked if their MD5 and SHA1 sums match and they have exactly the same UID, GID and permissions.
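The rule described above can be sketched as a single predicate. This is a minimal Python illustration, not the code from the commit: safe_to_hardlink and _digests are hypothetical names, and the ordering (cheap stat() comparisons before any hashing) is an assumption about how such a check would sensibly be arranged.

```python
import hashlib
import os
import stat


def _digests(path, chunk_size=1 << 16):
    """Return the (md5, sha1) hex digests of a file's contents."""
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


def safe_to_hardlink(a, b):
    """Hypothetical helper: True only if a and b have identical content
    AND identical owner, group and permission bits."""
    sa, sb = os.stat(a), os.stat(b)
    # Cheap metadata checks first; any mismatch means the files must
    # stay separate, so no hashing is needed.
    if (sa.st_uid, sa.st_gid) != (sb.st_uid, sb.st_gid):
        return False
    if stat.S_IMODE(sa.st_mode) != stat.S_IMODE(sb.st_mode):
        return False
    if sa.st_size != sb.st_size:
        return False
    # Only now compare content, using both MD5 and SHA1 as described.
    return _digests(a) == _digests(b)
```

Two byte-identical files with modes 0644 and 0600 are rejected by this check even though their checksums match.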

This was an issue when using fslint to help dedupe a backup disk that included development toolchains installed both as root:root in /opt and as user:group in /home/user.

pixelb commented 9 years ago

Comment #1 originally posted by pixelb on 2014-12-09T21:18:25.000Z:

Yes, this is a good point. This should probably be an option, though. Also, perhaps only uid/gid should matter?

It would also be good not to rely on md5sum_approx, which is currently optional; i.e., the other parts of findup that just use find(1) should be augmented as well.

Thanks!
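The question of whether only uid/gid should matter can be framed as a choice of which metadata must match. A minimal sketch, assuming a hypothetical policy setting (fslint has no such option; the policy names and metadata_key are made up for illustration):

```python
import os
import stat

# Hypothetical policy names, not an existing fslint option.
POLICIES = ("content", "owner", "owner+mode")


def metadata_key(path, policy="owner+mode"):
    """Return the metadata that must match (in addition to content)
    before two files may be hardlinked under the given policy."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")
    st = os.stat(path)
    if policy == "content":
        return ()  # pre-patch behaviour: metadata is ignored
    if policy == "owner":
        return (st.st_uid, st.st_gid)  # only uid/gid matter
    return (st.st_uid, st.st_gid, stat.S_IMODE(st.st_mode))
```

Two files are hardlink candidates only if their keys are equal and their content matches; with "owner", files differing only in mode would still be merged.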

pixelb commented 9 years ago

Comment #2 originally posted by pixelb on 2014-12-09T21:20:07.000Z:

Actually, it kind of already is an option, as you can pass a user ID filter to find, which would probably handle most cases, at least for non-root runs.
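Restricting the scan to a single owner means files with different owners never meet in the same duplicate set. As a rough Python equivalent of what find(1)'s `-type f -uid N` selects (files_owned_by is a hypothetical name; note this does nothing for files that share an owner but differ in permissions):

```python
import os
import stat


def files_owned_by(root, uid):
    """Hypothetical helper: yield regular files under root owned by uid,
    roughly what `find root -type f -uid UID` would list."""
    for dirpath, _dirnames, filenames in os.walk(root):  # does not follow symlinked dirs
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # like find, do not follow symlinks
            if stat.S_ISREG(st.st_mode) and st.st_uid == uid:
                yield path
```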

pixelb commented 9 years ago

Comment #3 originally posted by pixelb on 2014-12-10T04:51:14.000Z:

As background, the use case that prompted this involves backing up (most of) an entire hard drive using a variant of:

http://www.mikerubel.org/computers/rsync_snapshots/
http://www.pointsoftware.ch/en/howto-local-and-remote-snapshot-backup-using-rsync-with-hard-links/

Once the initial rsync (as root) is complete, I'd like to be able to dedupe the backup, which does contain files that are identical apart from ownership/permissions. I need to treat these as different files and not hardlink them, lest I end up with root-owned files in places in the snapshot image where they don't belong.
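The reason this matters is that ownership and permissions belong to the inode, not to the directory entry, so once two paths are hardlinked they necessarily share one set of metadata. The sketch below demonstrates this with permission bits, since changing ownership would need root; the same applies to uid/gid. The file names and replace_with_hardlink are hypothetical, and the link-then-rename step is one common way a dedupe tool might swap in a link, not necessarily how findup does it.

```python
import os
import stat
import tempfile


def replace_with_hardlink(keep, dup):
    """Hypothetical naive dedupe step: replace dup with a hard link to keep."""
    tmp = dup + ".tmp-link"
    os.link(keep, tmp)
    os.replace(tmp, dup)  # rename over the duplicate


def demo():
    with tempfile.TemporaryDirectory() as d:
        opt_copy = os.path.join(d, "opt_tool")    # stands in for the /opt copy
        home_copy = os.path.join(d, "home_tool")  # stands in for the /home copy
        for p in (opt_copy, home_copy):
            with open(p, "wb") as f:
                f.write(b"identical bytes")
        os.chmod(opt_copy, 0o755)
        os.chmod(home_copy, 0o700)

        replace_with_hardlink(opt_copy, home_copy)

        st_opt, st_home = os.stat(opt_copy), os.stat(home_copy)
        # One inode now, so the home copy has silently taken the /opt mode.
        return st_opt.st_ino == st_home.st_ino, stat.S_IMODE(st_home.st_mode)


same_inode, mode = demo()
print(same_inode, oct(mode))  # True 0o755
```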

I'll take a second look to see if a similar short-circuit is possible elsewhere. I'm not sure this should be a command-line option, though; I'd consider changing ownership/permissions as a side effect of a scan & hardlink to be a BadThing(tm).
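One way such a short-circuit could look across a whole scan: bucket files by metadata that stat() already provides, and only hash files whose bucket holds more than one entry, so files differing in owner or mode are never hashed against each other. This is a sketch under that assumption, not findup's actual pipeline; duplicate_groups is a hypothetical name, and whole files are read into memory for brevity.

```python
import hashlib
import os
import stat
from collections import defaultdict


def duplicate_groups(paths):
    """Hypothetical helper: group byte-identical files without ever
    mixing different owners, groups or permission bits."""
    by_meta = defaultdict(list)
    for p in paths:
        st = os.stat(p)
        key = (st.st_size, st.st_uid, st.st_gid, stat.S_IMODE(st.st_mode))
        by_meta[key].append(p)

    groups = []
    for candidates in by_meta.values():
        if len(candidates) < 2:
            continue  # short-circuit: a lone file needs no hashing
        by_hash = defaultdict(list)
        for p in candidates:
            with open(p, "rb") as f:
                data = f.read()  # whole file read kept for brevity
            digest = (hashlib.md5(data).hexdigest(), hashlib.sha1(data).hexdigest())
            by_hash[digest].append(p)
        groups.extend(g for g in by_hash.values() if len(g) > 1)
    return groups
```

Given three identical files where only one has mode 0600, only the two 0644 files are reported as a duplicate group.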

Thanks!