GoogleCodeExporter closed this issue 9 years ago.
On further thought, it may be sufficient to just have a mode that outputs
"character X Y W H" for each character, where X Y W H is the rectangle that
contains the character.
Original comment by reng...@ix.netcom.com
on 25 Aug 2007 at 8:11
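The proposed output is one whitespace-separated line per character. As a minimal sketch, assuming exactly the "character X Y W H" layout described above with integer pixel values, a consumer could read it as below. The names CharBox and parse_char_boxes are hypothetical (not part of Tesseract), and the proposal does not say where the origin is, so treating Y as the top edge is an assumption.

```python
from typing import List, NamedTuple


class CharBox(NamedTuple):
    """One character and the rectangle that contains it (hypothetical type)."""
    char: str
    x: int  # left edge in pixels
    y: int  # top edge in pixels (assumed origin: top-left)
    w: int  # width in pixels
    h: int  # height in pixels


def parse_char_boxes(text: str) -> List[CharBox]:
    """Parse lines of the proposed form 'character X Y W H'."""
    boxes = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the last four single spaces so that a character which is
        # itself a space (if spaces are ever emitted) survives as field 0.
        char, x, y, w, h = line.rsplit(" ", 4)
        boxes.append(CharBox(char, int(x), int(y), int(w), int(h)))
    return boxes
```

Splitting from the right keeps the character field intact even when it is a space, which matters if the output is ever extended to include word gaps.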
You can already get the information you need through the DLL API if you are
working on Windows. If you are on any OS and don't mind linking statically, you
can get the information by deriving from TessBaseAPI and copying the model of
TesseractToBoxText, or by using the new TesseractExtractResult. Alternatively, if
you prefer a separate process and a command-line API, you can modify
TesseractToBoxText to optionally output spaces, and use this command line:
tesseract image.tif output nobatch makebox

This will create output.txt in a useful format, including the bounding box of
each character, WITHOUT turning off the chopper (which is for training). Most
likely you will need to set up a new variable and a corresponding config file to
control the output of spaces, as we don't want them for training.
Original comment by theraysm...@gmail.com
on 6 Sep 2007 at 1:01
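The makebox output mentioned above is a box file. Assuming the usual box-file layout of one glyph per line, "char left bottom right top" (later Tesseract versions append a page number), with coordinates measured from the bottom-left corner of the image, a short sketch can convert it to the X Y W H form requested at the top of this issue. The function names and the image_height parameter are hypothetical; the image height has to be read from the image itself, and stock makebox output contains no spaces, so plain whitespace splitting is assumed to be safe. Any off-by-one pixel convention is ignored here.

```python
from typing import List, Tuple

XYWH = Tuple[str, int, int, int, int]


def box_line_to_xywh(line: str, image_height: int) -> XYWH:
    """Convert 'char left bottom right top [page]' to (char, x, y, w, h).

    The result uses a top-left origin, so y is the distance from the top
    edge of the image to the top of the glyph.
    """
    parts = line.split()
    char = parts[0]
    left, bottom, right, top = (int(v) for v in parts[1:5])  # page is ignored
    x = left
    y = image_height - top  # flip from bottom-left to top-left origin
    w = right - left
    h = top - bottom
    return char, x, y, w, h


def box_file_to_xywh(text: str, image_height: int) -> List[XYWH]:
    """Convert every non-blank line of a box file."""
    return [box_line_to_xywh(line, image_height)
            for line in text.splitlines() if line.strip()]
```

For example, a glyph boxed as "T 10 80 18 92" in an image 100 pixels tall becomes ("T", 10, 8, 8, 12).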
But where can I find this API?
Original comment by ajay1kum...@gmail.com
on 2 Mar 2008 at 6:39
Issue 53 has been merged into this issue.
Original comment by theraysm...@gmail.com
on 30 Dec 2008 at 9:37
Fixed in 3.00 with hOCR output.
Original comment by theraysm...@gmail.com
on 20 May 2010 at 6:56
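hOCR is HTML in which each recognized element carries its geometry in the title attribute, e.g. title="bbox x0 y0 x1 y1", with the origin at the top-left corner of the page. This thread does not say which element classes Tesseract 3.00 emits, or whether it goes down to individual characters, so the sketch below assumes word-level spans with class ocrx_word and is tested against made-up sample markup. The class and function names are hypothetical; only Python's standard html.parser module is used.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}  # never get an end tag


def parse_bbox(title: str) -> Optional[Box]:
    """Return (x0, y0, x1, y1) from a title like 'bbox 10 5 60 20; x_wconf 90'."""
    for prop in title.split(";"):
        fields = prop.split()
        if len(fields) == 5 and fields[0] == "bbox":
            x0, y0, x1, y1 = (int(v) for v in fields[1:])
            return (x0, y0, x1, y1)
    return None


class HocrBoxExtractor(HTMLParser):
    """Collect [text, bbox] for every element whose class list contains `wanted`."""

    def __init__(self, wanted: str = "ocrx_word") -> None:
        super().__init__()
        self.wanted = wanted
        self.results: List[list] = []
        self._stack: List[Optional[int]] = []  # result index per open element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        bbox = parse_bbox(attr_map.get("title") or "")
        if self.wanted in classes and bbox is not None:
            self.results.append(["", bbox])
            self._stack.append(len(self.results) - 1)
        else:
            self._stack.append(None)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Attribute text to the innermost open element being collected.
        for idx in reversed(self._stack):
            if idx is not None:
                self.results[idx][0] += data
                return


def hocr_boxes(html: str, wanted: str = "ocrx_word") -> List[Tuple[str, Box]]:
    parser = HocrBoxExtractor(wanted)
    parser.feed(html)
    parser.close()
    return [(text.strip(), bbox) for text, bbox in parser.results]
```

Each bbox converts to the X Y W H form from the original request as (x0, y0, x1 - x0, y1 - y0).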
Original issue reported on code.google.com by
reng...@ix.netcom.com
on 25 Aug 2007 at 8:03