pixelb / fslint

Linux file system lint checker/cleaner

Improve duplicate process time - Safe vs Fast #141

Open didierga opened 6 years ago

didierga commented 6 years ago

As I understand it, fslint currently double-checks duplicates using md5sum and then sha1sum, to avoid md5sum collisions.

This double check is time consuming, and in some cases, depending on the number and the "value" of the files, I would prefer a faster single-check mode with no sha1sum pass.

So I suggest implementing two modes: "Safe", the default, with the double check, and "Fast" with a single check.
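To illustrate the proposal, here is a minimal sketch of the two modes in Python. This is hypothetical code, not fslint's actual implementation (`find_dupes` and the `safe` flag are made-up names): "Fast" groups files by md5 alone, while "Safe" re-hashes each md5-matched group with sha1 to rule out md5 collisions.

```python
import hashlib
from collections import defaultdict

def find_dupes(paths, safe=True):
    """Group files with identical content.

    Fast mode (safe=False): a single md5 pass.
    Safe mode (safe=True): re-check each md5 group with sha1.
    Hypothetical sketch; real code would hash in chunks, not read
    whole files into memory.
    """
    by_md5 = defaultdict(list)
    for p in paths:
        with open(p, 'rb') as f:
            by_md5[hashlib.md5(f.read()).hexdigest()].append(p)

    groups = [g for g in by_md5.values() if len(g) > 1]
    if not safe:
        return groups  # Fast: trust the single md5 pass

    # Safe: confirm each candidate group with a second, sha1 pass
    confirmed = []
    for g in groups:
        by_sha1 = defaultdict(list)
        for p in g:
            with open(p, 'rb') as f:
                by_sha1[hashlib.sha1(f.read()).hexdigest()].append(p)
        confirmed.extend(s for s in by_sha1.values() if len(s) > 1)
    return confirmed
```

The "Fast" path skips one full read and hash of every candidate file, which is where the time saving comes from on large data sets.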

pixelb commented 6 years ago

Agreed, the double checking could probably be done more cleverly
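One way the confirmation pass could be made cheaper (a hypothetical alternative, not fslint's code): instead of a second full hashing pass, files that already share an md5 could be compared byte-for-byte, which can stop at the first differing byte rather than always reading every file to the end.

```python
import filecmp

def confirm_group(group):
    """Split an md5-matched group into sub-groups of truly
    identical files by direct byte comparison.

    Hypothetical sketch: each file is compared against one
    representative of every sub-group found so far.
    """
    confirmed = []
    for path in group:
        for sub in confirmed:
            # shallow=False forces a content comparison, not just stat()
            if filecmp.cmp(path, sub[0], shallow=False):
                sub.append(path)
                break
        else:
            confirmed.append([path])
    # Only sub-groups with at least two members are duplicates
    return [g for g in confirmed if len(g) > 1]
```

In the common case where an md5 match really is a duplicate, this still reads both files fully, but a false match (a collision) is rejected as soon as the contents diverge.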

emergie commented 6 years ago

I have the same problem.

Right now I'm in the process of sorting about 40T of data on spinning rust. fslint is a great help, but for my needs md5 & sha1 verification is overkill.

I've created PR https://github.com/pixelb/fslint/pull/145 with a change that allows the user to tune accuracy/safeness of duplicate verification to suitable level.