pixelb / fslint

Linux file system lint checker/cleaner

It takes too long to run #58

Open pixelb opened 9 years ago

pixelb commented 9 years ago

Original issue 59 created by pixelb on 2010-11-20T18:42:37.000Z:

I run FSlint often on my backup drive, and it takes a very long time because it recalculates the hashes every time.

Maybe it should have a feature to store hashes for known files to save time.

I use FSlint 2.40 on Ubuntu 10.04.

pixelb commented 9 years ago

Comment #1 originally posted by pixelb on 2010-11-20T20:59:25.000Z:

Perhaps I could do this with extended attributes, though I would really like to be able to mark the xattr to be automatically removed if the data is modified.

Does your backup drive have many hardlinked files (snapshots)? Otherwise, only files of the same size are checksummed. There is a related issue where hardlinks are checksummed multiple times.
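
For illustration only (nothing like this exists in FSlint, and the attribute name is made up for the sketch), caching a checksum in a user.* xattr could look roughly like the Python below. Since Linux does not automatically remove user.* attributes when file data changes, the sketch stores the mtime and size alongside the hash and treats a mismatch as "stale":

```python
import hashlib
import os

XATTR = "user.fslint.sha1"  # hypothetical attribute name, not used by FSlint

def cached_sha1(path):
    """Return the file's SHA-1, reusing a value cached in an xattr when
    the file's mtime and size are unchanged since it was last hashed."""
    st = os.stat(path)
    tag = "%d:%d" % (st.st_mtime_ns, st.st_size)
    try:
        saved_tag, saved_hash = os.getxattr(path, XATTR).decode().split(" ", 1)
        if saved_tag == tag:            # data apparently unmodified
            return saved_hash
    except (OSError, ValueError):
        pass                            # no usable cached value (or no xattr support)
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    digest = h.hexdigest()
    try:
        os.setxattr(path, XATTR, ("%s %s" % (tag, digest)).encode())
    except OSError:
        pass                            # e.g. read-only filesystem; just recompute next time
    return digest
```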

pixelb commented 9 years ago

Comment #2 originally posted by pixelb on 2010-11-21T01:24:55.000Z:

Thanks for the quick response.

Yes, I use hardlinks a lot. It's a backup drive, so whenever I move something in my /home, it's copied by rsync as a separate file. Also, I use the same backup folder for two different computers, and many files are common between the two.

Then I use FSlint to clean up the drive. As you can imagine, deleting the new files isn't practical. As a result, I have many hardlinks.

pixelb commented 9 years ago

Comment #4 originally posted by pixelb on 2011-03-11T22:16:28.000Z:

For hard links, can't you stat the file and use the inode number?

pixelb commented 9 years ago

Comment #5 originally posted by pixelb on 2011-03-11T23:35:53.000Z:

The issue here is that I would like to skip checksumming an inode we've already processed. But when reporting and merging, we must also deal with separate duplicate groups. Consider the following 4 duplicate files:

name   inode
file1  1
file2  1
file3  2
file4  2

This is a little tricky to do scalably and requires a rewrite of the base findup script.
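
As a sketch of the two-level grouping described above (this is not findup's implementation, which is a shell script; Python is used here only for illustration, and in practice you would still pre-filter by file size):

```python
import hashlib
import os
from collections import defaultdict

def sha1(path):
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def duplicate_groups(paths):
    # 1. Bucket every path by (device, inode) so each inode is hashed only
    #    once, no matter how many hard links point at it.
    by_inode = defaultdict(list)
    for p in paths:
        st = os.stat(p)
        by_inode[(st.st_dev, st.st_ino)].append(p)

    # 2. Hash one representative per inode, then bucket inodes by digest.
    by_digest = defaultdict(list)
    for names in by_inode.values():
        by_digest[sha1(names[0])].append(names)

    # 3. A digest shared by more than one inode is a duplicate group; each
    #    inner list is a set of names that are already hard links to each other.
    return [groups for groups in by_digest.values() if len(groups) > 1]

# With file1/file2 on inode 1 and file3/file4 on inode 2 (identical data),
# this yields one group: [['file1', 'file2'], ['file3', 'file4']], and the
# reporting/merging step then only has to link the two inodes together.
```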

pixelb commented 9 years ago

Comment #6 originally posted by pixelb on 2012-08-18T18:17:42.000Z:

Perhaps store md5sum- and sha1sum-compatible checksum files in each directory; then the cached checksums remain available if the directory is moved. I think additional information could be put as comments on the lines.
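
A hypothetical sketch of that per-directory layout follows. The cache file name is invented here, and whether sha1sum -c tolerates '#' comment lines depends on the coreutils version, so treat that as an assumption:

```python
import hashlib
import os

CACHE_NAME = ".fslint-sha1sums"   # hypothetical file name, not used by FSlint

def write_dir_cache(dirpath):
    """Write a sha1sum-compatible checksum file for the regular files in
    dirpath, with per-file metadata carried on preceding comment lines."""
    lines = []
    for name in sorted(os.listdir(dirpath)):
        path = os.path.join(dirpath, name)
        if name == CACHE_NAME or not os.path.isfile(path):
            continue
        st = os.stat(path)
        h = hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        # Metadata needed to detect staleness goes on a comment line.
        lines.append("# size=%d mtime=%d\n" % (st.st_size, st.st_mtime_ns))
        lines.append("%s  %s\n" % (h.hexdigest(), name))  # sha1sum-style line
    with open(os.path.join(dirpath, CACHE_NAME), "w") as f:
        f.writelines(lines)
```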