patcharats / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

Result Not The Same #40

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Run Tesseract.exe on a TIFF image.
2. Run dlltest.exe on the same TIFF image.
3. The results from steps 1 and 2 are different.

The procedure above was tested on my machine running Windows XP SP2.
I expected to get the same result from both, but the output and its
accuracy differ. Tesseract.exe seems to be much more reliable than
dlltest.

Original issue reported on code.google.com by slch2...@gmail.com on 12 Jul 2007 at 7:54

GoogleCodeExporter commented 9 years ago
The only difference that I know of that there should be between the outputs
(apart from the obvious bounding boxes in the dlltest output and plain text
from tesseract.exe) is that the dlltest version translates unrecognized
characters to '|' while tesseract.exe leaves them as a space. If you see
anything else, please attach some sample files and we can investigate.
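
To check whether two outputs differ only in that known way, the '|' placeholder can be mapped back to a space before comparing. The following is a minimal sketch, not part of Tesseract: `NormalizePlaceholders` and `FirstDifference` are hypothetical helper names, and it assumes the recognized text has already been extracted from the dlltest output (that is, the bounding-box data has been stripped).

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: replace the '|' that dlltest emits for unrecognized
// characters with the space that tesseract.exe emits.
// Caveat: a genuine '|' in the recognized text is replaced as well.
std::string NormalizePlaceholders(std::string text) {
    std::replace(text.begin(), text.end(), '|', ' ');
    return text;
}

// Hypothetical helper: index of the first character at which the two outputs
// still differ after normalization, or std::string::npos if they match.
std::size_t FirstDifference(const std::string& exe_text,
                            const std::string& dll_text) {
    const std::string a = NormalizePlaceholders(exe_text);
    const std::string b = NormalizePlaceholders(dll_text);
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] != b[i]) return i;
    }
    // One output is a prefix of the other: they differ where the shorter ends.
    return a.size() == b.size() ? std::string::npos : n;
}
```

If `FirstDifference` returns `std::string::npos`, the two runs differ only in the placeholder character. Any other return value points at a real recognition difference worth attaching as a sample.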

Original comment by theraysm...@gmail.com on 13 Jul 2007 at 7:17

GoogleCodeExporter commented 9 years ago
Thanks for your reply. Here is the attachment for your investigation. There
are 3 files in the archive: the image itself and the two different results
from running Tesseract.exe and dlltest.exe. From the results, I can see that
the output from Tesseract.exe is much more accurate. Thanks for your help.

Original comment by slch2...@gmail.com on 14 Jul 2007 at 1:35

GoogleCodeExporter commented 9 years ago
I'm also having this problem. (A detailed version of my problem can be found
on the forum.)

Here are the images used for extraction.

Original comment by eri...@videotron.ca on 6 Mar 2008 at 6:37

GoogleCodeExporter commented 9 years ago
I'm also having the same problem. Is there any fix or workaround for this
problem?
Thanks for your help.

Original comment by tul...@gmail.com on 25 Jan 2010 at 4:38

GoogleCodeExporter commented 9 years ago
Add me to this, exact same problem. This is with the SVN version only,
though. 2.03 doesn't look broken.
It looks to me like Tesseract is being rewritten to use Leptonica (when using
Tesseract.exe, it is clearly Leptonica that parses the pictures), but when
using the DLL, or custom code that passes an IMAGE (declared in imgs.h), it
doesn't work as expected.

This is a big problem, because code written against the IMAGE class breaks
when the tesseract source is updated to v3xx svn.

Thanks,
Pierre.

Original comment by hicksc...@gmail.com on 3 Apr 2010 at 10:35

GoogleCodeExporter commented 9 years ago
Hello,

I've run several tests on this issue and have partly figured out what's going
on. I compared v2.04 to this SVN v3.19, and the most obvious changes are the
use of Leptonica and the apparent remodeling of the Base API object.
I found that api::SetPicture fails when using the old-style method (the one
that takes the buffer, image width / height, bpp / scanline width), but the
new one taking only the PIX* works.

Rewriting the DLL and TestDLL code a bit worked for me (using
SetPicture(PIX*) instead of the old-style one).

Pierre.

Original comment by hicksc...@gmail.com on 4 Apr 2010 at 11:01

GoogleCodeExporter commented 9 years ago
dlltest/dllapi is deprecated, partly due to this problem.
Use TessBaseAPI instead.

Original comment by theraysm...@gmail.com on 20 May 2010 at 6:47
