Closed GoogleCodeExporter closed 9 years ago
Tested with Paintbrush. When I tried to save, Paintbrush crashed!
When checked with infanview, the resolution was abnormal. With the help of a
scanner, the present tif could be rescanned and the output then tested.
Original comment by withbles...@gmail.com
on 4 Sep 2007 at 6:00
Maybe there is another way to process multipage compressed tiffs with tesseract? I prefer applications which I can run from the command line, so I can run everything in batch.
About the bug itself:
First of all, did you mean "IrfanView" instead of "infanview"? I've installed IrfanView 4.0.
I've opened the image with:
- Successful with IrfanView 4.00 (no messages whatsoever)
- Successful with Paint Version 5.0 build 2195: Service pack 4 (only the background is green; I could save as bmp and tiff, and the background in the saved tif is also green)
- "Imaging for Windows Preview" with no problems
- Gimp 2.2.12 (a warning that the resolution was meaningless)
With Gimp I saved the document with compression = "None" as test2.tif.
This new image (test2.tif) also gives me this problem:
- Crash in tesseract
- Gimp opens test2.tif successfully
- "Imaging for Windows Preview" successful
- Successful with Paint Version 5.0 build 2195: Service pack 4 (now the background is white :D )
Original comment by eywitteveen
on 5 Sep 2007 at 11:20
Attachments:
Tesseract uses 16 bits internally for pixel coordinates, so your image at 42900 pixels high is too big. While a fix is unlikely to be forthcoming soon, I might make it reject such images more gracefully.
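To make the size limit concrete, here is a minimal sketch of the kind of check that could be run before handing an image to tesseract. It assumes the 16-bit coordinates are signed (which would explain why 42900 does not fit, since an unsigned 16-bit value could hold it); the function name and the example width are hypothetical and not taken from the tesseract source.

```python
# Largest value a signed 16-bit integer can hold.
# Assumption: tesseract's 16-bit pixel coordinates are signed.
INT16_MAX = 2**15 - 1  # 32767


def fits_in_int16(width, height):
    """Return True if both image dimensions fit in a signed 16-bit coordinate."""
    return 0 <= width <= INT16_MAX and 0 <= height <= INT16_MAX


# The height of 42900 comes from the comment above; the width of 2480 is a
# made-up value used only for illustration.
too_tall = fits_in_int16(2480, 42900)   # False: 42900 > 32767
one_page = fits_in_int16(2480, 3508)    # True: a single page fits
```

A check like this could be run on each image in a batch so that oversized files are skipped or split instead of crashing the run.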
You have 3 possibilities:
- Convert your multipage tiff to multiple single-page tiffs and process them separately.
- Change the code to cope properly with multipage tiffs and send a patch.
- Wait for someone else to make the change. (It will happen eventually.)
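For the first possibility, a batch run over already-split pages could be sketched as below. It relies only on the basic `tesseract imagename outputbase` command-line form; the helper name and the file names are hypothetical, and the commands are only built here, not executed.

```python
from pathlib import Path


def tesseract_commands(page_paths):
    """Build one `tesseract <image> <outputbase>` command per single-page tiff."""
    commands = []
    for page in sorted(Path(p) for p in page_paths):
        # Use the image name without its extension as the output base,
        # so page_000.tif produces page_000.txt.
        output_base = page.with_suffix("")
        commands.append(["tesseract", str(page), str(output_base)])
    return commands


# Hypothetical file names, for illustration only.
commands = tesseract_commands(["page_001.tif", "page_000.tif"])
```

Each entry can then be passed to `subprocess.run(command, check=True)` so that a failure on one page stops the batch with a clear error.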
Original comment by theraysm...@gmail.com
on 6 Sep 2007 at 12:07
I was interested in the project and wanted to get my feet wet, so I thought this might be an interesting and (at least conceptually) easy problem to get a feel for the code. The attached diff file appears to have no significant negative impact on the tests provided (compare initial.summary vs change.summary) and, as far as I know, didn't cause tesseract to crash with the first test.tif provided. I say "as far as I know" because after 8 hours of running I killed it.
A brief debugging session leads me to believe the problem is that you have too many blobs on one image.
If that's the case, then my feeling is that I should see about adding my current changes to the main source and going from there. I'm new to OSS and I didn't see any instructions on where to put the changes, though, so if you could point me in the right direction I'd appreciate it.
Original comment by ianh...@gmail.com
on 20 Sep 2007 at 3:30
Attachments:
I found the bug, and it works for this image, though not terribly well.
Original comment by ianh...@gmail.com
on 23 Sep 2007 at 6:23
Attachments:
Thank you for the great job, it looked like quite some work to find all the references! (I haven't compiled it yet, I'm working on windoze here.) I assume that this will be put upstream?
We still need to convert encoded multipage tiffs. Are there currently people interested in this functionality, or is it recommended to use ImageMagick to do some conversion before using tesseract?
Things I currently do with ImageMagick:
- remove the compression from the tiff image
- break the image down from multiple pages to a single page
Original comment by eywitteveen
on 24 Sep 2007 at 6:31
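The two ImageMagick steps listed above can be combined into a single call. The sketch below builds such a command, assuming the classic `convert` executable (newer ImageMagick releases name it `magick`); the helper name and file names are hypothetical, and this is not necessarily the exact command used by the commenter.

```python
def imagemagick_split_command(source, output_pattern="page_%03d.tif"):
    """Build a convert command that decompresses and splits a multipage tiff."""
    return [
        "convert",
        source,
        "-compress", "None",   # write the output without compression
        "+adjoin",             # write each page to its own file
        output_pattern,        # %03d is replaced by the page index
    ]


# Hypothetical input name, for illustration only.
command = imagemagick_split_command("multipage.tif")
```

Running it with `subprocess.run(command, check=True)` would write `page_000.tif`, `page_001.tif` and so on, which can then be processed one at a time.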
Tesseract now (3.00) supports multipage tiffs with libtiff or leptonica.
Original comment by theraysm...@gmail.com
on 20 May 2010 at 6:57
Original issue reported on code.google.com by
eywitteveen
on 4 Sep 2007 at 6:15