Open GoogleCodeExporter opened 8 years ago
Sorry. After looking at the hOCR.html produced by the tesseract 3.02.02 command, I
understand why.
When there are spaces between two characters, hOCR sometimes treats the space as a
separator, sometimes as a space, and sometimes as an empty word. So it is very hard
to know which word corresponds to which line and which bounding box.
It seems it would be better for tesseract-android-tool to use an API for output,
so that we could know which words each line contains, and which confidence value
and bounding box correspond to each word.
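To illustrate the kind of per-line structure I mean, here is a rough sketch in Python that groups hOCR words under their line, with a bounding box and confidence for each word. This is not what tesseract-android-tool does. The class name HocrLines, the function parse_hocr, and the SAMPLE markup are made up for illustration. It assumes the usual hOCR class names ocr_line and ocrx_word, assumes that whitespace-only words can simply be dropped, and treats the x_wconf confidence as optional because the hOCR output may not include it.

```python
from html.parser import HTMLParser


class HocrLines(HTMLParser):
    """Group ocrx_word spans under the ocr_line span that encloses them."""

    def __init__(self):
        super().__init__()
        self.lines = []    # one dict per ocr_line: {"bbox": ..., "words": [...]}
        self._stack = []   # class attribute of every currently open <span>
        self._word = None  # word being collected, if any

    @staticmethod
    def _props(title):
        # A title looks like "bbox 10 10 50 30; x_wconf 91".
        out = {}
        for part in title.split(";"):
            fields = part.split()
            if fields:
                out[fields[0]] = fields[1:]
        return out

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        a = dict(attrs)
        cls = a.get("class") or ""
        self._stack.append(cls)
        props = self._props(a.get("title") or "")
        bbox = tuple(int(v) for v in props.get("bbox", []))
        if cls == "ocr_line":
            self.lines.append({"bbox": bbox, "words": []})
        elif cls == "ocrx_word" and self.lines:
            conf = props.get("x_wconf")  # may be missing from the hOCR
            self._word = {
                "text": "",
                "bbox": bbox,
                "conf": float(conf[0]) if conf else None,
            }

    def handle_data(self, data):
        if self._word is not None:
            self._word["text"] += data

    def handle_endtag(self, tag):
        if tag != "span" or not self._stack:
            return
        cls = self._stack.pop()
        if cls == "ocrx_word" and self._word is not None:
            text = self._word["text"].strip()
            if text:  # assumption: drop empty / whitespace-only words
                self._word["text"] = text
                self.lines[-1]["words"].append(self._word)
            self._word = None


def parse_hocr(text):
    parser = HocrLines()
    parser.feed(text)
    parser.close()
    return parser.lines


# Hypothetical hOCR fragment; the second word is only a space.
SAMPLE = """
<span class='ocr_line' id='line_1' title='bbox 10 10 200 30'>
<span class='ocrx_word' id='word_1' title='bbox 10 10 50 30; x_wconf 91'>12</span>
<span class='ocrx_word' id='word_2' title='bbox 60 10 70 30'> </span>
<span class='ocrx_word' id='word_3' title='bbox 80 10 120 30'>34</span>
</span>
"""
```

Calling parse_hocr(SAMPLE) returns one entry per line, each with its own list of words, so the empty word no longer shifts which bounding box belongs to which word. An API that returned this structure directly would avoid the parsing step entirely.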
P.S. I apologize for my mistake in claiming there should be no "-" outputs. I
had also trained "-" and forgot to exclude it.
Thanks.
Original comment by CodingPo...@gmail.com
on 16 Nov 2012 at 8:08
Original issue reported on code.google.com by
CodingPo...@gmail.com
on 16 Nov 2012 at 2:39