What steps will reproduce the problem?
1. Use Tesseract 3.01 (3.00 should also work).
2. Run mftraining and cntraining on the two attached files.
3. Try to recognize any image containing a short expression such as: 32.453 - 67.3266
What is the expected output? What do you see instead?
The expected output is the correctly recognized expression.
Instead, tesseract systematically confuses the "." with "-" or "0". Since we are
reading math expressions (even simple ones), a dictionary does not help much:
32.4 + 5.6 and 32-4 + 506 are both valid expressions.
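To make the ambiguity concrete, here is a minimal sketch that checks both readings against an expression grammar. It uses Python's own parser as a stand-in for "is this a valid arithmetic expression"; that choice and the helper name is_valid_expression are assumptions for illustration and have nothing to do with how tesseract itself works.

```python
import ast


def is_valid_expression(text: str) -> bool:
    """Return True if text parses as a single expression.

    Python's grammar is used here only as a stand-in for a simple
    arithmetic grammar (an assumption made for this illustration).
    """
    try:
        ast.parse(text, mode="eval")
    except SyntaxError:
        return False
    return True


intended = "32.4 + 5.6"   # what is printed in the image
misread = "32-4 + 506"    # "." read as "-" and as "0"

# Both strings are accepted, so a grammar or dictionary check
# cannot tell the correct reading from the misread one.
print(is_valid_expression(intended), is_valid_expression(misread))
```

Since both readings pass, the distinction has to come from the character classifier itself rather than from any post-processing of the recognized text.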
What version of the product are you using? On what operating system?
Tesseract 3.01
Ubuntu 11.04 64bit
Please provide any additional information below.
The problem already shows up with simple expressions (only + - / *). However, if we
include functions like log() or sin(), it becomes even more evident
because of the letters involved (although in that case a dictionary would help
to recognize log() or sin()).
Moreover, we would eventually like to include currency symbols (like $) and
measurement units (like km, m, kg, ").
I know that tesseract is optimized for English, and after that for languages
whose structure differs from that of math expressions (leaving aside multi-line
operators and radicals, which I know are even more complex; there are few
commercial OCRs that deal with them).
Could you give me a methodology for building the sample images
for the expressions? I followed the guidelines proposed in the tesseract
documentation, but for math expressions I'm definitely missing something.
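To illustrate the kind of sample text I have in mind, here is a minimal sketch that generates random expression lines which could then be rendered into training images. The function names (random_number, random_expression, training_lines), the number ranges, and the share of decimal numbers are all hypothetical choices of mine, not something taken from the tesseract documentation.

```python
import random

# Operators from the simple case described above.
OPERATORS = ["+", "-", "*", "/"]


def random_number(rng: random.Random) -> str:
    """Return an integer or decimal literal; decimals are favored so that
    the "." character appears often in the sample text (assumed ratio)."""
    whole = rng.randint(0, 999)
    if rng.random() < 0.7:
        frac_digits = rng.randint(1, 4)
        frac = "".join(rng.choice("0123456789") for _ in range(frac_digits))
        return f"{whole}.{frac}"
    return str(whole)


def random_expression(rng: random.Random, max_terms: int = 3) -> str:
    """Return numbers joined by operators, e.g. 'number op number'."""
    term_count = rng.randint(2, max_terms)
    parts = [random_number(rng)]
    for _ in range(term_count - 1):
        parts.append(rng.choice(OPERATORS))
        parts.append(random_number(rng))
    return " ".join(parts)


def training_lines(count: int, seed: int = 0) -> list:
    """Return `count` expression lines; a fixed seed keeps runs repeatable."""
    rng = random.Random(seed)
    return [random_expression(rng) for _ in range(count)]


for line in training_lines(5):
    print(line)
```

Whether text like this, rendered in the target font, is an appropriate basis for the training images is exactly the part I am unsure about.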
Thanks a lot in advance for your help,
Original issue reported on code.google.com by luis...@gmail.com on 25 Sep 2011 at 10:06