sahib / rmlint

Extremely fast tool to remove duplicates and other lint from your filesystem
http://rmlint.rtfd.org
GNU General Public License v3.0

[Questions] How to replace the original file with the duplicate with the oldest modification time? #634

Open amalgame21 opened 8 months ago

amalgame21 commented 8 months ago

Thank you for creating this utility; it is great. I use it to keep all my files (mostly videos, audio and images) centralised in one place. I usually run rmlint -pprg RANDOM_DIR_1 RANDOM_DIR_N // CENTRALISED_DIR -km to keep CENTRALISED_DIR untouched. If any files are left over in RANDOM_DIR afterwards, I move them into CENTRALISED_DIR manually and keep it well organized.

However, I found that some of the files in my CENTRALISED_DIR are duplicates with a newer modification time than their copies elsewhere. I want the oldest copy to be the one in this directory. I know that the option -S m ranks the duplicate with the oldest modification time as the original, but I do not know how to combine it with the above command. If I untag CENTRALISED_DIR and apply -S m, some files inside this directory may be deleted because they have a newer modification time. Afterwards I would not know where each of them was originally located inside CENTRALISED_DIR, because it has many subdirectories, so it would be very hard to manually move the leftover files from RANDOM_DIR back into CENTRALISED_DIR.
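For reference, the ranking that -S m applies within each group of duplicates amounts to picking the path with the smallest mtime. A minimal shell sketch of that selection, assuming GNU coreutils stat; oldest_of is a hypothetical helper for illustration, not something rmlint provides:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of rmlint): given paths that are duplicates
# of each other, print the one with the oldest modification time.
# Assumes GNU coreutils stat, where -c %Y prints the mtime in epoch seconds.
oldest_of() {
    local best="" best_mtime="" path mtime
    for path in "$@"; do
        mtime=$(stat -c %Y -- "$path") || return 1
        # Strict '<' keeps the first path given when two mtimes are equal.
        if [ -z "$best" ] || [ "$mtime" -lt "$best_mtime" ]; then
            best=$path
            best_mtime=$mtime
        fi
    done
    printf '%s\n' "$best"
}
```

For example, oldest_of CENTRALISED_DIR/videos/clip.mp4 RANDOM_DIR_1/clip.mp4 (hypothetical paths) would print whichever of the two copies is older.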

What should I do to solve this problem? Thanks!

cebtenzzre commented 8 months ago

Did you figure this out?

amalgame21 commented 8 months ago

Yes. I now reflink the duplicates to the oldest copy before deleting anything, which should be safer:

1. Run rmlint -pp -r -g -T df -S ma RANDOM_DIR_1 RANDOM_DIR_N CENTRALISED_DIR so that the oldest copy in each group of duplicates is ranked as the original.
2. Manually edit the generated rmlint.sh: replace every remove_cmd call with cp_reflink, and comment out the lines with the touch command inside the cp_reflink function. Then run the script.
3. Finally, run rmlint -pp -r -g -T df -S ma RANDOM_DIR_1 RANDOM_DIR_N // CENTRALISED_DIR -km to delete all the duplicates in the untagged directories.
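The manual edit in step 2 could also be scripted. The sketch below relies on a guess at the layout of the generated rmlint.sh, not on a documented format: it assumes duplicate lines start with remove_cmd followed by whitespace, and that cp_reflink is defined as cp_reflink() { and closed by a } in column 0. patch_rmlint_sh is a hypothetical helper name, and GNU sed is assumed. Check the result against your own script before running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite a generated rmlint.sh as described in step 2.
# ASSUMED layout (not confirmed by rmlint docs):
#   - duplicate lines look like: remove_cmd '/dup' '/orig' # duplicate
#   - the function is defined as "cp_reflink() {" and ends with "}" in column 0
# Requires GNU sed for -i.
patch_rmlint_sh() {
    local script=${1:-rmlint.sh}
    cp -- "$script" "$script.bak"   # keep the untouched script for comparison
    sed -i \
        -e 's/^\([[:space:]]*\)remove_cmd\([[:space:]]\)/\1cp_reflink\2/' \
        -e '/^cp_reflink()/,/^}/ s/^\([[:space:]]*\)touch /\1# touch /' \
        -- "$script"
}
```

Usage would be patch_rmlint_sh rmlint.sh, leaving the original in rmlint.sh.bak. Requiring whitespace after remove_cmd is what keeps the remove_cmd() definition itself from being renamed, which a blind search-and-replace of every remove_cmd would do.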

I had expected that appending -c sh:reflink to the above command would do this without manually modifying the shell script, but it does not seem to take the modification time into account, so the generated script may end up with the original file carrying the newer modification time.

cebtenzzre commented 8 months ago

> I had expected that appending -c sh:reflink to the above command would do this without manually modifying the shell script,

The order of arguments to reflink does not matter. Once two files are reflinked, they can only be told apart by their path. And the touch command is necessary to preserve the modification time, otherwise cp --reflink just sets the mtime to the current date.
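The mtime behaviour discussed here can be checked without rmlint. This sketch assumes GNU coreutils and uses --reflink=auto, which falls back to a regular copy where reflinks are unsupported, so it also runs on ext4 and similar. It shows three outcomes for the destination $1: a plain cp gives it the current time, cp --archive gives it the source's mtime, and saving and restoring via touch -r keeps its own mtime. The function names are hypothetical, and the save/restore variant only mirrors the pattern described for cp_reflink; the exact lines in rmlint.sh may differ.

```shell
#!/usr/bin/env bash
# Hypothetical demo functions; argument order follows cp_reflink: $1 = dup, $2 = orig.
# Assumes GNU coreutils cp, touch and stat.
mtime() { stat -c %Y -- "$1"; }

# Plain copy: $1 is rewritten, so its mtime becomes the current time.
copy_plain() { cp --reflink=auto -- "$2" "$1"; }

# --archive preserves timestamps: $1 gets the mtime of $2.
copy_archive() { cp --archive --reflink=auto -- "$2" "$1"; }

# Save $1's mtime on a temporary stamp file, copy, then restore it onto $1.
copy_keep_dest_mtime() {
    local stamp
    stamp=$(mktemp) || return 1
    touch -m -r "$1" "$stamp"      # remember $1's own mtime
    cp --archive --reflink=auto -- "$2" "$1"
    touch -m -r "$stamp" "$1"      # put $1's old mtime back
    rm -f -- "$stamp"
}
```

The third variant is why argument order matters once the touch lines are involved: whichever file is passed as $1 keeps its own mtime, even if it is the newer one.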

amalgame21 commented 8 months ago

> The order of arguments to reflink does not matter. Once two files are reflinked, they can only be told apart by their path. And the touch command is necessary to preserve the modification time, otherwise cp --reflink just sets the mtime to the current date.

In the cp_reflink function, cp --archive --reflink=always "$2" "$1" sets the mtime of $1 to the mtime of $2, which is what I want. However, the touch commands before and after the cp command preserve the original mtime of $1. I don't want that; I want $1 to get the mtime of the earliest file.
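One possible shape for the behaviour requested here, written as a hypothetical cp_reflink_oldest function that is not part of rmlint: it clones $2 over $1 as cp_reflink does, but then gives $1 whichever of the two mtimes is older, so the result no longer depends on which file was ranked as the original. REFLINK_MODE is an invented variable: always matches the generated script, while auto lets the sketch run on filesystems without reflink support. GNU coreutils is assumed.

```shell
#!/usr/bin/env bash
# Hypothetical variant of cp_reflink; NOT part of rmlint's generated script.
# $1 = duplicate to replace, $2 = original to clone from (same order as cp_reflink).
# REFLINK_MODE is an invented knob: "always" as in rmlint.sh, "auto" for testing.
REFLINK_MODE=${REFLINK_MODE:-always}

cp_reflink_oldest() {
    local dup=$1 orig=$2 older stamp
    # -ot is true when the first file's mtime is strictly earlier than the second's.
    if [ "$dup" -ot "$orig" ]; then
        older=$dup
    else
        older=$orig
    fi
    stamp=$(mktemp) || return 1
    touch -m -r "$older" "$stamp"   # save the oldest mtime before anything changes
    if ! cp --archive --reflink="$REFLINK_MODE" -- "$orig" "$dup"; then
        rm -f -- "$stamp"
        return 1
    fi
    touch -m -r "$stamp" "$dup"     # the duplicate now carries the oldest mtime
    rm -f -- "$stamp"
}
```

When -S m has already ranked the oldest file as $2, this gives the same result as commenting out the touch lines; it only differs when the duplicate happens to be the older file.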

cebtenzzre commented 8 months ago

Ah, I see. You care about the order because of the way the touch command is run. I'll have to look into this.