jhnc / findimagedupes

Finds visually similar or duplicate images
GNU General Public License v3.0

How to process the output files which paths contains spaces? The output can not be customized by the findimagedupes ? #20

Closed: slrslr closed this issue 2 months ago

slrslr commented 5 months ago

Hello,

1)

findimagedupes outputs:

/path/ file name뷰á _ .jpg /path/ file뷰š -name .jpeg

so when i want to: A) remove first file B) open all paths inside viewer "gwenview"

i can not if i am not an awk/sed poweruser. Can you please suggest how to do it? Problem is the lack of unique separator and spaces in paths. I have been asking ChatGPT, but no luck.

2)

It would be handy if there is a switch where user can define prefix, suffix and the path separator.

For example --outprefix 'gwenview "' --outpathseparator '" "' --outsuffix '"'

Btw, I would expect output paths to be wrapped in quotation marks by default and separated by newlines, one path per line.

I am sorry if this is stupid or too demanding. You have mentioned that showing usage case examples incl. more complex commands may be handy inside manual / under -h switch.

jhnc commented 5 months ago

Investigate the --program, --script and -i options.


The --program option takes the full path to a program as an argument. So, to call gwenview on each group of files, you can do something like:

findimagedupes ... -p "$(type -p gwenview)" ...

Alternatively, you can use the --script option to write the output to a file. Then edit the VIEW shell function to do more complicated actions. The default definition is:

VIEW(){
        echo "$@"
}

You could edit that to become:

VIEW(){
        # remove first file
        echo rm "$1"
        shift

        # open the rest
        gwenview "$@"
}

You can also supply overrides using -i to have the customised script run automatically without having to save and edit manually:

findimagedupes ... -i 'VIEW(){ gwenview "$@"; }' ...

The VIEW function can perform arbitrary actions on the group of files, including deletion. But be careful! It is easy to accidentally delete something you did not intend to. I suggest moving or renaming files as a first step, rather than immediately deleting anything.
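As a minimal sketch of the "move first, delete later" advice above: the VIEW function below moves the first file of each group into a review directory and prints the remaining paths. The QUARANTINE variable and its default directory name are assumptions made for illustration; they are not part of findimagedupes.

```sh
#!/bin/sh
# Hypothetical review directory (an assumption, not a findimagedupes setting).
QUARANTINE="${QUARANTINE:-./_duplicates_to_review}"

VIEW() {
    # A group needs at least two files before anything is moved.
    [ "$#" -ge 2 ] || return 0

    mkdir -p "$QUARANTINE" || return 1

    # Move (do not delete) the first file of the group.
    mv -- "$1" "$QUARANTINE"/ || return 1
    shift

    # Print the remaining paths, one per line.
    printf '%s\n' "$@"
}
```

Because every path is passed as a separate quoted argument, spaces in file names need no special handling. Note that two files with the same base name would overwrite each other in the review directory, so this sketch would need a renaming step for that case.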

slrslr commented 5 months ago

Thank you for the detailed explanation. It helped me achieve the outcome I wanted. I placed the -p or -i option BEFORE the trailing switches (-a):

run all duplicates in a program: -p "$(type -p gwenview)"

run duplicates in a program echo (so it just prints the paths): -p "$(type -p echo)"

echo 1st of the duplicates and open the rest in gwenview: -i 'VIEW(){ echo "rm $1"; gwenview "$@"; }'

remove first of the duplicate files and view the rest: -i 'VIEW(){ rm "$1"; gwenview "$@"; }'

remove first of the duplicate files (risky, read below): -i 'VIEW(){ echo "Removing first of the duplicates."; rm "$1"; }'

report that there is a duplicate file and place 1st of the duplicates into a certain folder: -i 'VIEW(){ echo "Similar file found. Check _duplicates_to_delete folder."; mv "$1" "/_duplicates_to_delete/" 2>/dev/null; }'

I am using $1 since the first of the duplicate paths seems to be always the one from the folder that I am supplying to the findimagedupes command using switch -a -- "$folderwithduplicates" and this folder is the only one in which I want to remove duplicates. So I assume that it is safe to always remove $1 without prompt in my case (I am using -t 100% switch to match really same or VERY/too similar images!).

My full command:

findimagedupes -R -q -f $HOME/findimagedupes.index -t 100% -i 'VIEW(){ echo "Removing first of the duplicates."; rm "$1"; }' -a -- "/folderwithduplicatesthatmaybedeleted"

-R recursively search that folder

-q quiet (do not be verbose about incompatible files)

-f use signature index, that I am building right before executing previous command. The indexing command is:

findimagedupes -R -q -f $HOME/findimagedupes.index --prune -n -- "/main-images-folder/"

-t 100% same or too similar images

-a is ensuring that only duplicates that are also in /folderwithduplicatesthatmaybedeleted are printed. Not duplicates that are only within index database (i have already dealt with these using findimagedupes -q -f $HOME/findimagedupes.index -t 100%).

jhnc commented 5 months ago

I'm glad it was helpful.

Note that it is not guaranteed that files will be listed in any particular order. Personally, I would always rename the files, as you do in your 6th example, rather than deleting them immediately as you do in your 4th and 5th examples.

Note also that "$@" refers to all the arguments, including "$1". If you wish to exclude "$1", you need the shift from my example. I suppose you already know that rm deletes an image from the filesystem entirely, not just from the output list.
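The difference between "$1" and "$@" can be seen with a small sketch that only prints, so it is safe to run. The `first:` and `rest:` labels are illustrative only.

```sh
#!/bin/sh
VIEW() {
    # Guard: shift fails when there are no arguments.
    [ "$#" -gt 0 ] || return 0

    first=$1    # remember the first path
    shift       # drop it, so "$@" now holds only the remaining paths

    printf 'first: %s\n' "$first"
    for f in "$@"; do
        printf 'rest: %s\n' "$f"
    done
}
```

Without the shift, the loop over "$@" would also visit the first path, which is why a viewer started after `rm "$1"` would still be handed the deleted file's name.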

If you just want to echo the filenames, you don't need to use your second command (-p "$(type -p echo)"). Echoing is already the default behaviour.

-t 100% means that there are no bit differences in the fingerprints. It does not guarantee that the images are similar (although obviously that is what is hoped for). It is possible for very dissimilar images to have the same fingerprint. As a simple example, an entirely blue image will exactly match an entirely red image. And an image of text could easily match another image of text even when the words are completely different. Also, because of the way that matches are grouped, dissimilar images can end up together (see https://github.com/jhnc/findimagedupes/issues/12#issuecomment-1610905081)
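Since a fingerprint match does not prove that two images are the same, one possible extra check before acting on a group is a byte-for-byte comparison with cmp. This is a sketch of an additional safeguard, not something findimagedupes does itself. It only reports and changes nothing.

```sh
#!/bin/sh
VIEW() {
    # Nothing to compare unless the group has at least two files.
    [ "$#" -ge 2 ] || return 0

    keep=$1
    shift

    # Compare every other file in the group against the first one.
    for f in "$@"; do
        if cmp -s -- "$keep" "$f"; then
            printf 'identical: %s\n' "$f"
        else
            printf 'differs: %s\n' "$f"
        fi
    done
}
```

cmp only detects exact copies. Resized or re-encoded duplicates are reported as `differs` and still need a visual check.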

slrslr commented 5 months ago

not guaranteed that files will be listed in any particular order

that is unfortunate and it does not seem that the VIEW variable accepts a condition and semicolons used in it: -i 'VIEW(){ ...condition.. then rm "$1"; }' so does not seem to be possible to do like: if folder contains xy, then we have right folder to delete from

UPDATE: this condition may work:

-i 'VIEW(){ if echo "$1" | grep -q "$fwd"; then echo "Removing duplicate from a folder fwd."; rm "$1"; fi }'

Needs to define that variable $fwd before the command: fwd=/folderwithduplicatesthatmaybedeleted

jhnc commented 5 months ago

In case it is not clear, VIEW is an ordinary shell function. It is nothing specific to findimagedupes. It can contain any commands that you can write in a POSIX shell script. if and test syntax are both available.
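For example, a condition on the folder can be written with an ordinary case statement, and looping over all arguments avoids relying on the order in which files are listed. The sketch below is a dry run that only prints. The FWD variable is a hypothetical name for the folder that duplicates may be removed from; defining it inside the script avoids depending on variables from the calling shell.

```sh
#!/bin/sh
# Hypothetical: the only folder that duplicates may be removed from.
FWD="${FWD:-/folderwithduplicatesthatmaybedeleted}"

VIEW() {
    # First pass: is there at least one copy outside $FWD to keep?
    keeper=no
    for f in "$@"; do
        case $f in
            "$FWD"/*) ;;
            *) keeper=yes ;;
        esac
    done

    if [ "$keeper" != yes ]; then
        printf 'skip group (no copy outside %s)\n' "$FWD"
        return 0
    fi

    # Second pass: report the files that live under $FWD.
    for f in "$@"; do
        case $f in
            "$FWD"/*) printf 'would move: %s\n' "$f" ;;
        esac
    done
}
```

Once the printed list looks right, the printf in the second pass could be replaced by an mv into a review folder.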

Did you look at the output file produced with the --script option? The file is a very simple "skeleton" shell script that is designed to be customised if you need to do more complicated tasks. (If you edit this file, you don't need the extra quoting that the argument to the -i option requires and you don't need to try to fit things onto a single line.) I wrote findimagedupes to use shell scripts for customisation so that the user did not need to learn a new programming language. (If you want to use more powerful bash or zsh functionality such as arrays or regular expressions, you can edit the first "shebang" line to use that shell instead of the default /bin/sh.)

Of course, this does mean you already have to know how to write shell scripts.

For general help on writing shell scripts, the findimagedupes issue queue isn't the best place to ask. I'm only one person but there is a large community of helpful people on forums such as the various Stack Exchange Q&A sites ( https://superuser.com, https://unix.stackexchange.com, https://stackoverflow.com, etc), and those sites already contain answers to many questions (and lots of example code). There are also many books and tutorials and courses if you want to learn in a more structured way.