Open Wikinaut opened 5 years ago
Hello, can we talk about such a new feature? If you wish, I can explain again why rsync is not a solution.
It's something like https://askubuntu.com/a/767988
fdupes is an excellent program to find the duplicate files but it does not list the non-duplicate files, which is what you are looking for. However, we can list the files that are not in the fdupes output using a combination of find and grep.
OK, an rsync solution should work if the structure in the dest were similar to that in the source.
I.e. something like rsync -rl --dry-run --out-format="%f" --checksum Z/ X/
So I presume the structure of your source Z is different from that in dest X. I.e. you want to list files not backed up, no matter where they are in Z, so that you can copy them to the appropriate location in X etc.
So you want the equivalent of the following, but with more efficient handling of unique file sizes etc:
$ SRC=Z/; DST=X/
$ find "$SRC" "$DST" -type f -print0 | xargs -0 md5sum | sed "\| $DST|p" |
sort | uniq -w32 -u | cut -d' ' -f3-
One could avoid the overhead of scanning and checksumming $DST if it was not updated between fslint dedupe runs. In that case fslint could write an index of size,checksum,name which could be used directly in the process above.
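The index idea above could look roughly like the following sketch. This is not something fslint does today: the function names build_index and list_missing and the index layout are hypothetical, the index keeps only checksum and name (size is left out for brevity), GNU md5sum is assumed, and file names are assumed to contain no newlines or backslashes.

```shell
# Hypothetical helpers; names and index format are assumptions for illustration.

# build_index DIR INDEX
# Write one "md5  path" line (md5sum format) per regular file under DIR.
build_index() {
    find "$1" -type f -exec md5sum {} + > "$2"
}

# list_missing DIR INDEX
# Print every file under DIR whose checksum does not appear in INDEX.
list_missing() {
    find "$1" -type f -exec md5sum {} + |
    awk -v idx="$2" '
        BEGIN {
            # Load the cached checksums (first 32 characters of each line).
            while ((getline line < idx) > 0)
                seen[substr(line, 1, 32)] = 1
        }
        # md5sum prints 32 hex digits plus a 2-character separator,
        # so the file name starts at column 35.
        !(substr($0, 1, 32) in seen) { print substr($0, 35) }
    '
}
```

One would run build_index on X once (or whenever X changes) and then list_missing on Z each time the smaller drive is connected, so only Z gets checksummed. The index file should live outside X so that it is not checksummed as part of the tree.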
Yes, the structure is different, or may be different, so we have to "search" for the file hash.
I also found this proposal for "fdupes" https://github.com/adrianlopezroche/fdupes/issues/19
It would be good to save the hash/parse/analyze information of a specific fdupes run, in order to compare this "virtual" file tree with a real file tree later.
Currently I run the suggested sequence from https://askubuntu.com/a/767988 (see above) to list the files which are unique to backup (Z in my example), i.e. which are in backup but not in documents. [My use case is vice versa: to look for files which are not yet somewhere in the "backup"]
fdupes -r backup/ documents/ > dup.txt
find backup/ -type f | grep -Fxvf dup.txt
I wish to have a feature which makes intelligent use of the checksums/hashes of the huge "backup" drive X so that, when I connect a smaller drive Z to my computer, I can quickly list all those files which are
This is a "one-way" check. I don't want to have the huge list of differences. I only want to know those files from Z which for one reason or another have not been copied (or later moved) to drive X, in any directory there. So basically, it's a checksum/hash issue.
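For illustration, such a one-way check could be sketched as below, with file sizes used as a cheap pre-filter so that only files whose size occurs on both sides get checksummed. This is just a sketch under assumptions, not an existing fslint or fdupes feature: the function name not_backed_up is hypothetical, GNU find, xargs and md5sum are assumed, and file names are assumed to contain no tabs, newlines or backslashes.

```shell
# not_backed_up SRC DST  (hypothetical helper)
# Print files under SRC that have no identical copy anywhere under DST.
# The body runs in a subshell, so its variables do not leak to the caller.
not_backed_up() (
    src=$1; dst=$2
    work=$(mktemp -d) || exit 1

    # One "size<TAB>path" line per file in each tree (GNU find -printf).
    find "$src" -type f -printf '%s\t%p\n' > "$work/src"
    find "$dst" -type f -printf '%s\t%p\n' > "$work/dst"

    # A SRC file whose size never occurs in DST cannot have a copy there,
    # so it is reported without being checksummed.
    awk -F'\t' 'FILENAME == ARGV[1] { size[$1] = 1; next }
                !($1 in size) { print $2 }' "$work/dst" "$work/src"

    # Only files whose size occurs on both sides need a checksum.
    awk -F'\t' 'FILENAME == ARGV[1] { size[$1] = 1; next }
                ($1 in size) { print $2 }' "$work/dst" "$work/src" > "$work/src_cand"
    awk -F'\t' 'FILENAME == ARGV[1] { size[$1] = 1; next }
                ($1 in size) { print $2 }' "$work/src" "$work/dst" > "$work/dst_cand"

    tr '\n' '\0' < "$work/dst_cand" | xargs -0 -r md5sum > "$work/dst_md5"

    # Report SRC candidates whose checksum is not among the DST checksums.
    tr '\n' '\0' < "$work/src_cand" | xargs -0 -r md5sum |
    awk -v idx="$work/dst_md5" '
        BEGIN { while ((getline l < idx) > 0) seen[substr(l, 1, 32)] = 1 }
        !(substr($0, 1, 32) in seen) { print substr($0, 35) }'

    rm -rf "$work"
)
```

Calling not_backed_up Z/ X/ would then print only the files from Z that are missing on X, regardless of where a copy sits in X. The output is not sorted: files ruled out by size come first, followed by those ruled out by checksum.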