WeOCR server/Tesseract works better than Tesseract 2.00 standalone version

patcharats / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

WeOCR server/Tesseract works better than Tesseract 2.00 standalone version #56

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?

I compared the results of the Tesseract 2.00 standalone version with those of the WeOCR server/Tesseract, which uses the same engine (Tesseract 2.00) but a different image processor.

What is the expected output? What do you see instead?

The results of the WeOCR server (http://asv.aso.ecei.tohoku.ac.jp/tesseract/) are much better than the results of the standalone version of Tesseract.

What version of the product are you using? On what operating system?

Version 2.00

Provide any additional information below.

Please find attached the file of my test.

Original issue reported on code.google.com by pepey...@gmail.com on 11 Aug 2007 at 6:45

Attachments:

[Test Results.doc](https://storage.googleapis.com/google-code-attachments/tesseract-ocr/issue-56/comment-0/Test Results.doc)

GoogleCodeExporter commented 9 years ago

pepeyola,
This is VERY interesting. Can you explain what you mean by "with a different image processor"? What's the difference between this and the tess engine? Are you saying they glued together their own (or another) IP (image processor) & the tess engine (whatever that is)?
Where did you learn about this? Maybe the hacker's guide?
KB

Original comment by beaumon...@gmail.com on 15 Aug 2007 at 9:17

GoogleCodeExporter commented 9 years ago

beaumont.k

You can see the details of the WeOCR server at http://asv.aso.ecei.tohoku.ac.jp/tesseract.

I think they use their own image processor together with the character recognition engine of Tesseract.

For more details, you can ask the author of the project, Professor Hideaki GOTO (http://www.sc.isc.tohoku.ac.jp/~hgot/).

Original comment by pepey...@gmail.com on 15 Aug 2007 at 11:05

GoogleCodeExporter commented 9 years ago

The image processing that goes before an OCR engine is always going to be critical to its accuracy. The thresholding algorithm in tesseract 2.00 is very basic. It was the best published algorithm out of those that I tested in the mid-1990s (see http://www.hpl.hp.com/techreports/93/HPL-93-22.pdf for more information). Unfortunately, the adaptive thresholding algorithm that was developed alongside tesseract, which was significantly better, was not part of the open source release, due to its commercial utility. There could easily be other open source or published algorithms available by now, and some day one of these may find its way into tesseract.

Original comment by theraysm...@gmail.com on 17 Aug 2007 at 8:58

Changed state: Accepted
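To illustrate why the thresholding step discussed above matters so much, here is a minimal Python sketch that compares a global threshold with a simple adaptive one on a tiny synthetic page whose right half is shaded. Otsu's method is used as a representative global algorithm, and a local-mean rule as a representative adaptive one; neither is claimed to be the code in Tesseract 2.00, the unreleased adaptive algorithm, or WeOCR's image processor. The function names, the `radius`/`offset` parameters and the sample image are all hypothetical.

```python
from typing import List

# Assumption: a grayscale image as a list of rows of ints in 0..255,
# with dark text on a light background. Output: 1 = ink, 0 = background.
Image = List[List[int]]


def otsu_threshold(img: Image) -> int:
    """Global threshold t maximizing between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0     # number of pixels with value <= t
    sum0 = 0   # sum of those pixel values
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty; variance undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize_global(img: Image) -> Image:
    """One threshold for the whole page: ink if value <= Otsu threshold."""
    t = otsu_threshold(img)
    return [[1 if v <= t else 0 for v in row] for row in img]


def binarize_local_mean(img: Image, radius: int = 1, offset: int = 30) -> Image:
    """Adaptive threshold: ink if darker than the local window mean minus offset.

    The window is clipped at the image border; radius and offset are
    illustrative values, not tuned ones.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            window = [img[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            mean = sum(window) / len(window)
            out[y][x] = 1 if img[y][x] < mean - offset else 0
    return out


# Left half: bright paper (200) with one ink pixel (120).
# Right half: shaded paper (100) with one ink pixel (20).
page = [
    [200, 200, 200, 100, 100, 100],
    [200, 120, 200, 100, 20, 100],
    [200, 200, 200, 100, 100, 100],
]

print(otsu_threshold(page))                        # 120
print(binarize_global(page))                       # shaded half all marked as ink
print(binarize_local_mean(page, radius=1, offset=30))  # only the two ink pixels
```

On this input the single global threshold (120) marks the entire shaded half as ink, while the local-mean rule recovers just the two ink pixels. The trade-off is cost and tuning: the adaptive version scans a window per pixel and depends on `radius` and `offset`, which is one reason the choice of preprocessing can change OCR results so much even with the same recognition engine.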

GoogleCodeExporter commented 9 years ago

Issue 172 has been merged into this issue.

Original comment by theraysm...@gmail.com on 30 Dec 2008 at 4:29

GoogleCodeExporter commented 9 years ago

Fixed in 3.01

Original comment by theraysm...@gmail.com on 20 May 2010 at 6:55

Changed state: Started

GoogleCodeExporter commented 9 years ago

Is this fix available in the public svn? I checked out the svn today but 
couldn't find anything, and issue 172 still outputs "white" for me.

Original comment by iainmel...@gmail.com on 11 Jul 2010 at 10:53

GoogleCodeExporter commented 9 years ago

I believe Ray intended to close this one.

Original comment by joregan on 23 Feb 2012 at 11:11

Changed state: Fixed