pkolaczk / fclones

Efficient Duplicate File Finder
MIT License

Detect changes after `fclones group` to avoid copying the wrong data #84

Open th1000s opened 2 years ago

th1000s commented 2 years ago

If a file's content changes after an `fclones group` run while its size and modification timestamp stay the same, then `fclones link` (and the other commands that act on the group report) can lead to data loss:

$ mkdir z; cd z; echo same > 1; echo same > 2; echo abcd > Z
$ cat ?
same
same
abcd
$ fclones group . -o log 
$ cp -a Z 1   # timestamp is kept
$ fclones link < log
$ cat ?
abcd
abcd
abcd

This could be avoided by also checking that the ctime of each file is older than the start of the `group` run and, if it is not, re-checking the file or aborting.
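To illustrate the ctime idea, here is a minimal sketch in Python rather than fclones' own code. The function name `changed_since_group` and the `group_start_ns` parameter are made up for this example, and it assumes the start time of the group run was recorded somewhere (for instance in the report). It relies on the fact that rewriting a file or resetting its mtime still updates the ctime on Unix.

```python
import os


def changed_since_group(paths, group_start_ns):
    """Return the paths whose inode change time (ctime) is not older than
    the recorded start of the group run.

    Hypothetical helper: `group_start_ns` is assumed to be a wall-clock
    timestamp in nanoseconds taken when grouping started.
    """
    suspicious = []
    for path in paths:
        ctime_ns = os.stat(path).st_ctime_ns
        # ">=" errs on the side of caution: a file touched in the same
        # instant as the start of the run is treated as changed.
        if ctime_ns >= group_start_ns:
            suspicious.append(path)
    return suspicious
```

Any path returned here would need to be re-hashed or skipped before linking. File timestamps can be coarser than the system clock, so a real implementation would probably want a small safety margin around the start time.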

Maybe even add a `--paranoid` option to compare the content byte-by-byte before acting on it. But even in this case I am not aware of any (Unix) way to guarantee exclusive write access to a file, so maybe the documentation should mention that the checked data is expected not to change while fclones is working on it.
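The byte-by-byte comparison I have in mind for such a paranoid mode could look roughly like the sketch below. It is Python and not fclones code; `same_content` and `CHUNK` are names invented for the example. It only covers the comparison step, so a file can still be modified between this check and the actual link.

```python
import os

CHUNK = 1 << 16  # read size in bytes; an arbitrary choice for this sketch


def same_content(path_a, path_b):
    """Compare two regular files byte-by-byte (hypothetical helper)."""
    # Cheap early exit: files of different size cannot be identical.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            block_a = a.read(CHUNK)
            block_b = b.read(CHUNK)
            if block_a != block_b:
                return False
            if not block_a:
                # Both files reached EOF without any difference.
                return True
```

In the example from this report, `same_content("1", "2")` would be checked right before replacing `2` with a link to `1`, and the link would be skipped if it returned `False`.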