wikey closed this issue 9 years ago
Yes, please send me the test files that you have. strider1551@gmail.com
Thanks for sending the files. The ones you sent look exactly as you say, with black backgrounds. However, using your source images I could not reproduce your output. Let's walk through some things manually:
convert "mixed.tif" "temp.ppm"
and check that temp.ppm is visually the same as the original image.
c44 -dpi 600 "temp.ppm" "temp.djvu"
and check to see what the djvu file looks like.
Looks like the problem only occurs with the default csepdjvu encoder set. Switching to c44 works perfectly, as does directly running the two commands you give above.
Alright. The csepdjvu route involves a few more steps, so walk through the following.
convert "mixed.tif" -opaque black "temp_graphics.tif"
This image should contain just the graphics (the dots), with the text removed.
convert "mixed.tif" +opaque black -monochrome "temp_textual.tif"
This image should contain just the text.
cjb2 -dpi 600 "temp_textual.tif" "temp_textual.djvu"
This should be a djvu of just the text.
ddjvu -format=rle -v "temp_textual.djvu" "temp_textual.rle"
convert temp_graphics.tif temp_graphics.ppm
cat temp_graphics.ppm >> temp_textual.rle
csepdjvu -d 600 "temp_textual.rle" "out.djvu"
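The manual csepdjvu walkthrough above can be collected into one script for repeat testing. This is only a minimal Python sketch, not djvubind's actual implementation: the names csepdjvu_steps, append_file and run_walkthrough are hypothetical, and it assumes ImageMagick's convert and the DjVuLibre tools (cjb2, ddjvu, csepdjvu) are on PATH.

```python
import shutil
import subprocess


def csepdjvu_steps(src="mixed.tif", dpi=600):
    """Return the external commands from the walkthrough, in order.

    The file append (cat temp_graphics.ppm >> temp_textual.rle) is not an
    external command here; run_walkthrough() does it before the last step.
    """
    dpi = str(dpi)
    return [
        # Graphics layer: black pixels are replaced by the fill color.
        ["convert", src, "-opaque", "black", "temp_graphics.tif"],
        # Text layer: everything that is not black is replaced.
        ["convert", src, "+opaque", "black", "-monochrome", "temp_textual.tif"],
        ["cjb2", "-dpi", dpi, "temp_textual.tif", "temp_textual.djvu"],
        ["ddjvu", "-format=rle", "-v", "temp_textual.djvu", "temp_textual.rle"],
        ["convert", "temp_graphics.tif", "temp_graphics.ppm"],
        ["csepdjvu", "-d", dpi, "temp_textual.rle", "out.djvu"],
    ]


def append_file(src, dst):
    """Append the bytes of src to dst, like `cat src >> dst`."""
    with open(src, "rb") as fin, open(dst, "ab") as fout:
        shutil.copyfileobj(fin, fout)


def run_walkthrough(src="mixed.tif", dpi=600):
    """Run every step; raises CalledProcessError if a tool fails."""
    *prepare, encode = csepdjvu_steps(src, dpi)
    for cmd in prepare:
        subprocess.run(cmd, check=True)
    append_file("temp_graphics.ppm", "temp_textual.rle")
    subprocess.run(encode, check=True)
```

Calling run_walkthrough("mixed.tif") leaves the intermediate temp_* files in the working directory, so each stage can be inspected by eye as described above.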
Ok, following those steps re-creates the broken djvu on my end. Running
convert "mixed.tif" -opaque black "temp_graphics.tif"
appropriately generates temp_graphics.tif with only the graphics portions of the page, but the colors are inverted. This looks like the stage where things are going wrong. The temp_textual.tif image has a normal white background, black text, and appropriately removes the graphical portions.
-opaque color tells convert to replace the specified color with the fill color. So rather than assuming that white is the fill color, how about we make the fill color explicit?
convert "mixed.tif" -fill white -opaque black "temp_graphics.tif"
Also, what version of ImageMagick do you have installed?
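To make the -opaque / +opaque behavior described above concrete, here is a small pure-Python model of it. This is a sketch for illustration only, not ImageMagick's code: the function name opaque is hypothetical, pixels are plain (r, g, b) tuples, and matching is exact (ImageMagick's -fuzz tolerance is not modeled).

```python
def opaque(pixels, target, fill, invert=False):
    """Model -opaque (invert=False) or +opaque (invert=True).

    pixels is a list of rows of (r, g, b) tuples. With invert=False every
    pixel equal to target becomes fill; with invert=True every pixel that
    is NOT equal to target becomes fill.
    """
    return [
        [fill if (px == target) != invert else px for px in row]
        for row in pixels
    ]


BLACK, WHITE, RED = (0, 0, 0), (255, 255, 255), (255, 0, 0)
page = [[BLACK, RED, WHITE]]

# Like: convert page -fill white -opaque black  (text removed, graphics kept)
graphics = opaque(page, BLACK, WHITE)
# Like: convert page -fill white +opaque black  (only the black text kept)
textual = opaque(page, BLACK, WHITE, invert=True)

print(graphics)  # [[(255, 255, 255), (255, 0, 0), (255, 255, 255)]]
print(textual)   # [[(0, 0, 0), (255, 255, 255), (255, 255, 255)]]
```

With an explicit white fill, the graphics layer should end up on a white background; a black background there means the fill color is not what was expected.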
I'm using ImageMagick 6.8.9-9 Q16 x86_64 2015-01-05 on Debian Jessie.
As far as I can tell the -fill <color> option is being completely ignored, since none of the colors I tried there changed the black fill in the output. I tried a couple of the related options for convert, and what seems to work is setting -colors 256, so a full invocation is
convert "mixed.tif" -opaque black -colors 256 "temp_graphics-256colors.tif"
That produces a file with just the image portions surrounded by a white background.
I did some additional tests, and it looks like the problem lies somewhere between imagemagick and scantailor. I took another page image from a public domain book and ran the same imagemagick commands against the tiff as exported from GIMP, and then against the same image after scantailor had processed it. For the scantailor-output image, imagemagick defaults to a black fill and completely ignores the -fill option when I set it. For the GIMP source image, imagemagick defaults to a white fill and I can appropriately change that to other colors with -fill. I'm not sure what difference between the two images is causing this problem, and nothing jumped out at me from a quick look through the imagemagick image properties for each image, but this is not my area of expertise. My instinct is to file a bug against the Debian package for imagemagick, but I can also send you my new test files if you want to see if you can narrow down the problem.
Either way, at this point this feels like a non-djvubind bug. Your suggestion of switching to c44 for my color encoder works around the problem for me until the actual bug is resolved upstream, so I am going to move to chasing this one with the imagemagick team unless you have other suggestions.
Test images are listed with the Debian bug if you want to take a look at them. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=797107
We cannot reproduce the issue that you reported with the latest version of ImageMagick (6.9.2-0). Your example files also don't contain any full black pixels so nothing should happen when you run convert with bookpageGimpSource.tif or with bookpageGimpSource-ScantailorProcessed.tif. It looks like the version that you are using contains a bug. I would advise you to upgrade to the latest version of ImageMagick.
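One way to check the claim about fully black pixels independently of ImageMagick's color replacement is to convert the TIFF to a binary PPM (as in the earlier convert "mixed.tif" "temp.ppm" step) and count the (0, 0, 0) pixels directly. The following is a minimal sketch under stated assumptions: count_black_pixels is a hypothetical helper, and it only handles binary P6 files with a maxval of 255 or less, which is what an 8-bit image would normally produce.

```python
def count_black_pixels(ppm_bytes):
    """Count pure-black (0, 0, 0) pixels in a binary PPM (P6) image."""
    fields = []
    pos = 0
    # Header: magic, width, height, maxval, separated by whitespace,
    # with optional '#' comments that run to the end of the line.
    while len(fields) < 4:
        while ppm_bytes[pos:pos + 1].isspace():
            pos += 1
        if ppm_bytes[pos:pos + 1] == b"#":
            pos = ppm_bytes.index(b"\n", pos) + 1
            continue
        start = pos
        while pos < len(ppm_bytes) and not ppm_bytes[pos:pos + 1].isspace():
            pos += 1
        fields.append(ppm_bytes[start:pos])

    magic = fields[0]
    width, height, maxval = (int(f) for f in fields[1:])
    if magic != b"P6" or maxval > 255:
        raise ValueError("only 8-bit binary PPM (P6) is supported")

    # Exactly one whitespace byte separates maxval from the raster data.
    data = ppm_bytes[pos + 1:pos + 1 + 3 * width * height]
    return sum(
        1 for i in range(0, len(data), 3) if data[i:i + 3] == b"\x00\x00\x00"
    )


# Tiny 2x1 image: one pure-black pixel and one almost-black pixel.
sample = b"P6\n# demo\n2 1\n255\n" + bytes([0, 0, 0, 0, 0, 1])
print(count_black_pixels(sample))  # 1
```

For a real file, something like count_black_pixels(open("temp.ppm", "rb").read()) would apply. A result of 0 would be consistent with -opaque black having nothing to replace.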
Hi dlemstra, thanks for checking into this bug!
I had more details in the Debian bug report, including how attempting to use -opaque white or -opaque white -fill green reliably works on the source image but fails on the scantailor-processed images, but if this is already fixed upstream then that's probably moot.
For reference, I'm getting this bug on versions of ImageMagick from 6.8.9-9 (current Debian "stable" package) to 6.9.1-2 (currently the most recent version in Debian, including the "experimental" package repository). I guess I'll let the Debian team know this bug is fixed upstream, and hopefully we'll get one of the more recent versions into the package archive.
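The version numbers in this thread can be turned into a quick check on the version string that convert -version prints. This is a sketch only: the function names are hypothetical, and treating everything older than 6.9.2-0 as possibly affected is an assumption, since only 6.8.9-9 through 6.9.1-2 were reported as failing here and the cause may lie outside ImageMagick.

```python
import re

# Upstream could not reproduce the problem on 6.9.2-0.
FIRST_UNAFFECTED = (6, 9, 2, 0)


def parse_im_version(text):
    """Extract (major, minor, patch, release) from an ImageMagick banner."""
    match = re.search(r"ImageMagick (\d+)\.(\d+)\.(\d+)-(\d+)", text)
    if match is None:
        raise ValueError("no ImageMagick version found")
    return tuple(int(group) for group in match.groups())


def may_be_affected(text):
    """True if the version is older than the first one reported as working."""
    return parse_im_version(text) < FIRST_UNAFFECTED


print(may_be_affected("ImageMagick 6.8.9-9 Q16 x86_64 2015-01-05"))  # True
print(may_be_affected("ImageMagick 6.9.2-0 Q16 x86_64"))             # False
```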
The -fill green -opaque white issue with bookpageGimpSource-ScantailorProcessed.tif looks to be resolved in the latest version of ImageMagick.
I just realized that it is also possible that this is a libtiff bug. Are you using the latest version of libtiff?
Debian stable ships version 4.0.3-12.3 of libtiff5 and the version in experimental is 4.0.3-13. I get the same misbehavior with either version.
Because djvubind deliberately tries to do as little modification to the images as possible, unless this issue becomes more widespread I am not going to implement the -colors workaround.
Lines 144-145 of djvubind/encode.py are the relevant commands that you can change locally if you wish to use csepdjvu encoding while waiting for an upstream fix.
Running stock debian Jessie on two different machines I get the following results:
Pages that have mixed or color mode set in scantailor look normal when viewing the tif files but after djvubind processes them they end up almost completely black in the final djvu file as if the colors have been inverted. Pages marked as "color" result in fully inverted colors, including white text on a black background, while "mixed" mode ones have just the black background with no text visible. OCR seems unaffected; tesseract can run on both types of pages and produces OCR output normally.
This occurs even when re-using archived source tif files from previous projects that djvubind previously encoded without trouble. I've set/unset all the c44, cpaldjvu, and csepdjvu options in my ~/.djvubind/config file that I could find in the relevant man pages. I also tried moving the config file aside entirely in case one of my standard options was causing it, though I only modify minidjvu settings so that seemed unlikely. Nothing has resolved the problem. Bitonal pages continue to work flawlessly.
I have some single-page test and output files I can send you, though GitHub prevents me from uploading either to this ticket. Just let me know if you need them and I'll email them directly.