jvirkki / dupd

CLI utility to find duplicate files
http://www.virkki.com/dupd
GNU General Public License v3.0

Feature request: refresh database #10

Closed: tspivey closed this issue 8 years ago

tspivey commented 8 years ago

I like the output of dupd report. However, if I'm using it to delete files, there's no way to refresh the database without doing a full scan.

If I could do that, ideally with a switch that skips the file hashing and just stats the files to see whether they still exist (deleting them from the database if they don't), I could then run dupd report again to get an updated list of groups.

jvirkki commented 8 years ago

Thanks for the interest. I've toyed with the idea of adding something along these lines a few times. To be useful it will need to be much faster than a scan; otherwise one might as well just rescan.

What I might do is allow running it only on specified directories, so it doesn't need to check every file (e.g. if I'm deduping pictures under /album, there's no need to rescan /movies, /src and everything else).

jvirkki commented 8 years ago

I added a 'refresh' command to do this. It will remove from the sqlite database references to files which no longer exist on disk.
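For illustration only, here is a minimal sketch of that stat-and-delete idea in C, assuming a hypothetical table named files with a path column; this is not dupd's actual schema or implementation.

```c
/* Hypothetical refresh pass: drop database rows whose files are gone.
 * Assumes a table "files" with a "path" column, which is NOT dupd's
 * real schema; it only illustrates the idea described above. */
#include <sqlite3.h>
#include <sys/stat.h>

static void refresh_db(sqlite3 *db)
{
  sqlite3_stmt *select = NULL;
  sqlite3_stmt *del = NULL;
  struct stat info;

  sqlite3_prepare_v2(db, "SELECT path FROM files", -1, &select, NULL);
  sqlite3_prepare_v2(db, "DELETE FROM files WHERE path = ?", -1, &del, NULL);

  while (sqlite3_step(select) == SQLITE_ROW) {
    const char *path = (const char *)sqlite3_column_text(select, 0);

    /* stat() only, no hashing: just check whether the file still exists */
    if (stat(path, &info) != 0) {
      sqlite3_bind_text(del, 1, path, -1, SQLITE_TRANSIENT);
      sqlite3_step(del);
      sqlite3_reset(del);
    }
  }

  /* A real implementation would check return codes and likely batch the
   * deletes inside a single transaction. */
  sqlite3_finalize(select);
  sqlite3_finalize(del);
}
```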

I haven't done exhaustive performance testing (and the initial implementation is not as efficient as it could be, doing excessive strcpying), but it is already much faster than a scan. On my largest data set, where a 'dupd scan' takes over 4 hours, 'dupd refresh' takes only a minute or two.

See USAGE for caveats on using 'dupd refresh'. It is still necessary to do a rescan to discover new dups, moved or modified files, etc. But for the common case of just deleting duplicates after a scan, refresh does make that workflow much faster.
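As a rough usage example, the workflow described above might look like this (scan, report and refresh are the commands mentioned in this thread; the rm path is just a hypothetical duplicate being deleted by hand, outside dupd):

```
dupd scan            # full scan, builds the sqlite database (slow)
dupd report          # list duplicate groups from the database
rm /album/copy.jpg   # delete some duplicates manually
dupd refresh         # drop database entries for files that no longer exist
dupd report          # updated groups, without another full scan
```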