pauldreik/rdfind: find duplicate files utility

results.txt processing #110

parkerlreed commented 2 years ago

I have a results.txt from a dry run. That was a day or two ago. Is it possible to reprocess just the txt file to see how much space would be freed if I wanted to remove the dups?
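
Until such a feature exists, the question can be answered by parsing results.txt directly. Here is a minimal Python sketch; it assumes the column layout announced in the file's own header comment (# duptype id depth size device inode priority name) and counts every line whose type is not DUPTYPE_FIRST_OCCURRENCE as a removable duplicate. The sizes are those recorded at dry-run time, and hardlinked duplicates already share storage, so the total is an upper bound:

```python
#!/usr/bin/env python3
"""Estimate the space a stale rdfind results.txt would free (a sketch)."""
import sys

total = 0
count = 0
with open(sys.argv[1] if len(sys.argv) > 1 else "results.txt") as fh:
    for line in fh:
        if line.startswith("#") or not line.strip():
            continue  # skip the comment header and blank lines
        # The name is the last column and may contain spaces,
        # so split into at most 8 fields.
        fields = line.split(None, 7)
        duptype, size = fields[0], int(fields[3])
        if duptype != "DUPTYPE_FIRST_OCCURRENCE":
            total += size
            count += 1

print(f"{count} duplicate files, up to {total} bytes reclaimable")
```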

rschenk0 commented 1 year ago

I would love this, too...

nonlinearsugar commented 1 year ago

I can think of a couple of things to do with a results file:

  1. By default, recheck the files and update results.txt (reuse the existing flag for updating results).
  2. Maybe provide an abbreviated recheck process that looks at size and date modified, then does quick hashing (see the sketch after this list)? Might be too complicated.
  3. Optionally, take action (reuse the existing delete/softlink/hardlink options) only on the files in results.
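
On option 2, a rough illustration follows. results.txt does not record modification times, so this sketch screens each recorded duplicate against its recorded size, then against a hash of its first 4 KiB compared with the group's first occurrence. It assumes the header layout (# duptype id depth size device inode priority name) and that duplicate lines reference their first occurrence by the same id, possibly negated; neither is a documented guarantee:

```python
#!/usr/bin/env python3
"""Cheap recheck of a stale rdfind results.txt (sketch of option 2)."""
import hashlib
import os
import sys

def quick_hash(path, nbytes=4096):
    # Hash only the leading bytes: a cheap screen, not a full comparison.
    with open(path, "rb") as fh:
        return hashlib.sha1(fh.read(nbytes)).hexdigest()

firsts = {}      # group id -> path of the first occurrence
still_dups = []  # recorded duplicates that still pass the cheap checks
with open(sys.argv[1] if len(sys.argv) > 1 else "results.txt") as fh:
    for line in fh:
        if line.startswith("#") or not line.strip():
            continue
        duptype, gid, _, size, _, _, _, name = line.rstrip("\n").split(None, 7)
        if duptype == "DUPTYPE_FIRST_OCCURRENCE":
            firsts[gid] = name
            continue
        orig = firsts.get(gid.lstrip("-"))  # assumed: duplicates carry a negated id
        if orig is None or not os.path.isfile(orig) or not os.path.isfile(name):
            continue  # one side is gone; the entry is stale
        if os.path.getsize(name) != int(size):
            continue  # size changed since the dry run; do not trust the entry
        if quick_hash(name) == quick_hash(orig):
            still_dups.append(name)

print("\n".join(still_dups))
```
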
bordenc commented 1 year ago

A helpful thing might be documentation (rather than code in rdfind itself) on parsing a multi-megabyte results file with regular expressions or other software. In your case, instead of rdfind "reprocessing" a results.txt file, you would run a few regex patterns over it, or import it into spreadsheet software, and turn it into, say, a file list to be piped into rm.

On the latter, if it can be guaranteed that results.txt only uses certain characters as delimiters, then it could be converted into a character-separated file for import into a spreadsheet. People not proficient in regular expressions could use filters instead to whittle the file down until they get the delete list they want (or, for example, change the ranking algorithm).

Obviously more hands-on, but it may be easier to implement and would avoid bugs and feature creep.
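
To make the delimiter point concrete: the first seven columns are whitespace-separated, but the trailing name may itself contain spaces (and the format cannot represent newlines in names at all), so splitting the whole line on whitespace is unsafe. A small sketch that splits into at most eight fields and emits a NUL-terminated delete list:

```python
#!/usr/bin/env python3
"""Turn an rdfind results.txt into a NUL-terminated delete list (a sketch)."""
import sys

for line in open(sys.argv[1] if len(sys.argv) > 1 else "results.txt"):
    if line.startswith("#") or not line.strip():
        continue
    fields = line.rstrip("\n").split(None, 7)
    if fields[0] != "DUPTYPE_FIRST_OCCURRENCE":
        # NUL-terminate so names containing spaces survive the pipe.
        sys.stdout.write(fields[7] + "\0")
```

Something like `python3 dups_to_list.py results.txt | xargs -0 rm --` (the script name is hypothetical) would then remove the listed duplicates; review the list before piping it anywhere destructive.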