jhnc / findimagedupes

Finds visually similar or duplicate images
GNU General Public License v3.0

SIGABRT from GraphicsMagick when analyzing old WMF files #13

Open ilario opened 1 year ago

ilario commented 1 year ago

With some types of WMF files, findimagedupes gets a SIGABRT via GraphicsMagick at this line:

https://github.com/jhnc/findimagedupes/blob/a787e23576b3abcecda26a36507d256652d21841/findimagedupes#L491

For example, if you run findimagedupes on a directory containing all the files downloaded from the following page, you get a core dump:

https://telparia.com/fileFormatSamples/image/wmf/

and this is caused specifically by this file:

https://telparia.com/fileFormatSamples/image/wmf/MINN.XK4

with the description of: MINN.XK4: Windows metafile, size 23398 words, 5 objects, largest record size 0x12

jhnc commented 1 year ago

Do you mean that findimagedupes terminates? (eg. if you feed it that file followed by two others, it does not reach the other two?)

It looks like newer versions of graphicsmagick do not have this issue (eg. Ubuntu 22.04, debian sid) but I'll see if I can emulate it and improve findimagedupes' behaviour in this situation.

Thank you for the report.

ilario commented 1 year ago

Yes, it terminates uncleanly as soon as it tries to read that old WMF file.

I am not sure whether this should be fixed in GraphicsMagick or just handled here, so I also reported it on the GM issue tracker: https://sourceforge.net/p/graphicsmagick/bugs/724/

Thanks!

jhnc commented 1 year ago

Given that Ubuntu and debian don't crash (1.4+really1.3.38-1ubuntu0.1 and 1.4+really1.3.41-1 respectively), I would guess the graphicsmagick end is already fixed in more recent versions.

The initial point of the try code was to catch misbehaviour of imagemagick (which was used before the switch to graphicsmagick), so if it isn't doing the job it was intended for, it certainly needs to be sorted. Belt and braces, and all that.

jhnc commented 1 year ago

please test for me:

  1. copy findimagedupes somewhere (say fid)
  2. edit the new program and immediately after the eval line add:
    use sigtrap qw( die any );
  3. create a copy of the bad wmf file under a new name (say bad1.wmf and bad2.wmf)
  4. create a copy of a good file (say good1.jpg and good2.jpg)
  5. run the new program as ./fid bad1.wmf good1.jpg bad2.wmf good2.jpg

Does the program report that good1 and good2 match, as well as warning about bad1 and bad2?

ilario commented 1 year ago

I just checked the latest code from GraphicsMagick and it does indeed shut down far more gracefully than before. See the comment here: https://sourceforge.net/p/graphicsmagick/bugs/724/#ca22

Trying what you suggested with the installed GraphicsMagick version (the one that still exits abruptly when it detects an unavailable font), the fid file looks like:

my $result = eval {
                use sigtrap qw( die any );
                if ((mimetype($file)||'') =~ /^(audio|video)/) {

and the result, using images from https://telparia.com/fileFormatSamples/image/wmf/, in Bash is:

$ ./fid MINN.XK4 MOUNTAIN.WMF MINN2.XK4 MOUNTAIN2.WMF
Aborted (core dumped)

and in Zsh is:

$ ./fid MINN.XK4 MOUNTAIN.WMF MINN2.XK4 MOUNTAIN2.WMF
[1]    20832 IOT instruction (core dumped)  ./fid MINN.XK4 MOUNTAIN.WMF MINN2.XK4 MOUNTAIN2.WMF

Anyway, the version of GraphicsMagick I have installed is also 1.3.41, the same upstream version as the Debian package you mentioned. The libwmf I have installed is 0.2.13, while on Ubuntu you should have 0.2.12. So maybe it fails for me and not for you because you have the needed font (Times-Roman) installed and I don't?

jhnc commented 1 year ago

Most peculiar.

I see from your other thread that you are using archlinux.

I set up an archlinux container and can now at least partially replicate.

I'll see if I can work out what's going on.

ilario commented 1 year ago

Thanks! It is actually possible that the issue is triggered by some compilation flags used in the packaging on Arch.

jhnc commented 1 year ago

Yes, it seems highly likely there are differences.

If I disable stderr redirection, then for findimagedupes foo MINN.XK4 foo I see on archlinux:

imagemagick problem: Exception 430: Unable to open file (foo)
perl: magick/draw.c:1777: DrawSetFont: Assertion `font_name != (const char *) NULL' failed.
Aborted

but on Ubuntu, I see:

imagemagick problem: Exception 430: Unable to open file (foo)
imagemagick problem: Exception 405: Unable to read font (/usr/share/fonts/type1/gsfonts/n021003l.pfb)
imagemagick problem: Exception 430: Unable to open file (foo)

It's the assertion failure that's causing the problem. Having not thought about it very hard, I hadn't realised that graphicsmagick is actually part of the perl process rather than being something called by it. Once graphicsmagick sends the abort signal, I don't think there's anything that can be done. When I wrote the code, I assumed the eval protected the program but it doesn't in this situation. I'll see if there's any good workaround.
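A minimal sketch of the situation described above, written in Python rather than Perl purely for illustration. The `try`/`except` stands in for the `eval` block and `os.abort()` stands in for the failed C assertion inside the library; `CHILD` and `run_child` are hypothetical names, not findimagedupes code. In C, `abort()` terminates the process even when a SIGABRT handler is installed and returns, which would be consistent with `use sigtrap` not helping in the test above. The only place the failure can be observed is a parent process.

```python
import signal
import subprocess
import sys

# Child program: the try/except plays the role of Perl's eval block, and
# os.abort() plays the role of the failed assertion in the C library.
CHILD = """
import os
try:
    os.abort()
except BaseException:
    print("recovered")
"""


def run_child():
    """Run CHILD in a separate interpreter; return (returncode, stdout)."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    code, out = run_child()
    # On POSIX, a negative return code means "killed by that signal number".
    print("killed by SIGABRT:", code == -signal.SIGABRT)
    print("handler output:", repr(out))
```

The exception handler in the child never runs, so nothing is printed by it; the parent only learns about the abort from the child's exit status.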

jhnc commented 1 year ago

I haven't found a simple fix for this although one may well exist.

An alternative to trying to find a way to block the signal would be to move graphicsmagick processing into a separate worker process. This would make it harder for graphicsmagick to inadvertently kill findimagedupes.

I'll see if I can implement this in a way that is not too expensive. A potential free bonus if it works out is parallelising fingerprinting (#9)
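The worker-process idea could look roughly like the sketch below, again in Python for illustration only and not a statement of how findimagedupes implements or will implement it. `WORKER` and `fingerprint_all` are hypothetical; the "fingerprint" is just the length of the file name, and names starting with `bad` simulate the library aborting.

```python
import signal
import subprocess
import sys

# Hypothetical worker: prints a stand-in "fingerprint" for one file name.
# Names starting with "bad" simulate the image library calling abort().
WORKER = """
import os, sys
name = sys.argv[1]
if name.startswith("bad"):
    os.abort()
print(len(name))
"""


def fingerprint_all(names):
    """Fingerprint each name in its own process; return (results, failures)."""
    fingerprints, failures = {}, {}
    for name in names:
        proc = subprocess.run(
            [sys.executable, "-c", WORKER, name],
            capture_output=True,
            text=True,
        )
        if proc.returncode == 0:
            fingerprints[name] = proc.stdout.strip()
        elif proc.returncode < 0:
            # Worker was killed by a signal; record its name and carry on.
            failures[name] = signal.Signals(-proc.returncode).name
        else:
            failures[name] = f"exit status {proc.returncode}"
    return fingerprints, failures


if __name__ == "__main__":
    ok, bad = fingerprint_all(
        ["bad1.wmf", "good1.jpg", "bad2.wmf", "good2.jpg"]
    )
    print(ok)
    print(bad)
```

Starting one process per file is the simplest form of isolation but also the most expensive. A long-lived worker that is restarted only after it dies would spread the startup cost over many files, and several such workers could run side by side to parallelise fingerprinting.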

ilario commented 1 year ago

Nice! In the process, did you work out why Ubuntu behaves differently from Arch? If we understand this, maybe we can fix the Arch PKGBUILD script or the graphicsmagick source...