kcobra / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

Issues with images with different x and y resolutions #1485

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

1. Make a TIFF file by scanning a text page at a resolution of 200x100 dpi and save it as doc.tiff
2. Run tesseract doc.tiff doc pdf
3. Check the result

What is the expected output? What do you see instead?

The result should be a proper PDF file with text. Instead, the PDF page has the 
wrong height, because Tesseract used only the x resolution information.
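
To illustrate the size problem, here is a minimal sketch of the arithmetic, not Tesseract's actual code. It assumes a hypothetical US Letter page (8.5 x 11 in) scanned at 200x100 dpi, and the helper name `page_size_points` is made up for this example. A PDF point is 1/72 inch, so each axis has to be divided by its own resolution.

```python
POINTS_PER_INCH = 72.0  # PDF default user space unit is 1/72 inch


def page_size_points(width_px, height_px, x_dpi, y_dpi):
    """Return (width, height) in PDF points, using a separate resolution per axis."""
    return (width_px * POINTS_PER_INCH / x_dpi,
            height_px * POINTS_PER_INCH / y_dpi)


# Hypothetical scan: 8.5 in * 200 dpi = 1700 px wide, 11 in * 100 dpi = 1100 px tall.
width_px, height_px = 1700, 1100

# Each axis scaled by its own resolution: a Letter-sized page.
correct = page_size_points(width_px, height_px, x_dpi=200, y_dpi=100)

# The x resolution applied to both axes: the page comes out half as tall.
single = page_size_points(width_px, height_px, x_dpi=200, y_dpi=200)

print(correct)  # (612.0, 792.0)
print(single)   # (612.0, 396.0)
```

With one resolution applied to both axes, the page height is off by the ratio of the two resolutions (here a factor of 2).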

What version of the product are you using? On what operating system?

- Latest 3.03 RC on Linux

Please provide any additional information below.

Being able to use unequal resolutions such as 200x100 is important, as almost 
all documents scanned by fax machines have different x and y resolutions.

Original issue reported on code.google.com by sergio.c...@gmail.com on 10 Jun 2015 at 9:39

GoogleCodeExporter commented 9 years ago
Can you please provide an example image for testing?

Original comment by zde...@gmail.com on 10 Jun 2015 at 5:51

GoogleCodeExporter commented 9 years ago
Ah yes, our old friend, the 204 x 96 dpi standard-mode fax resolution.

I didn't realize that Tesseract accepted images with differing x/y
resolutions. The PDF generation module is calling api->GetSourceYResolution() 
and there is no equivalent call to get the X resolution of the source 
material.

I might be able to work around this (no guarantees), but I need to understand 
very clearly what Tesseract is doing with bounding boxes when fed such images. 
Please attach before and after examples to this bug for my inspection.

https://code.google.com/p/tesseract-ocr/source/browse/ccmain/thresholder.h#90
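
For reference, a rough sketch of what a per-axis bounding box conversion could look like. This is not what Tesseract currently does. It assumes boxes are reported in source-image pixels with a top-left origin, which is exactly the part that needs confirming, and the function name `box_to_pdf_points` is hypothetical.

```python
POINTS_PER_INCH = 72.0


def box_to_pdf_points(box, image_height_px, x_dpi, y_dpi):
    """Convert a pixel box (left, top, right, bottom), origin top-left,
    into PDF points (x0, y0, x1, y1), origin bottom-left.

    Assumption: the box is in source-image pixel coordinates.
    """
    left, top, right, bottom = box
    x0 = left * POINTS_PER_INCH / x_dpi
    x1 = right * POINTS_PER_INCH / x_dpi
    # Flip the y axis, then scale by the y resolution only.
    y0 = (image_height_px - bottom) * POINTS_PER_INCH / y_dpi
    y1 = (image_height_px - top) * POINTS_PER_INCH / y_dpi
    return (x0, y0, x1, y1)


# Hypothetical 11 in tall fax page at 204 x 96 dpi: 11 * 96 = 1056 px high.
# The box covers one inch square, one inch in from the top-left corner.
pdf_box = box_to_pdf_points((204, 96, 408, 192), image_height_px=1056,
                            x_dpi=204, y_dpi=96)
print(pdf_box)  # (72.0, 648.0, 144.0, 720.0)
```

The pixel box is 204 x 96 px, yet it maps to a 72 x 72 pt square, which is what one would expect if the per-axis scaling is right.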

Original comment by breidenb...@gmail.com on 10 Jun 2015 at 6:15

GoogleCodeExporter commented 9 years ago

Original comment by breidenb...@gmail.com on 10 Jun 2015 at 6:45