Closed GoogleCodeExporter closed 9 years ago
Maybe you can use the generated box file, as it includes each character and its coordinates
(left, bottom, right, top). But you only get individual letters, not whole words.
Original comment by struther...@gmail.com
on 6 Aug 2007 at 11:15
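A minimal Python sketch of reading the box file mentioned above. It assumes each line holds the character followed by the four integer coordinates left, bottom, right, top, separated by whitespace. Later Tesseract releases append a page number as an extra field, and this sketch simply ignores any fields past the fifth. `CharBox` and `parse_box_file` are hypothetical names chosen for illustration.

```python
from typing import List, NamedTuple


class CharBox(NamedTuple):
    char: str
    left: int
    bottom: int
    right: int
    top: int


def parse_box_file(text: str) -> List[CharBox]:
    """Return one CharBox per line of box-file text.

    Each line is assumed to be 'char left bottom right top'; any further
    fields are ignored, and blank or short lines are skipped.
    """
    boxes: List[CharBox] = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # blank or malformed line
        left, bottom, right, top = (int(v) for v in fields[1:5])
        boxes.append(CharBox(fields[0], left, bottom, right, top))
    return boxes
```

Each returned `CharBox` carries a single letter, so word boundaries still have to be worked out separately.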
That looks very promising. Is there any way of getting both the normal text output
and the box file without running tesseract twice? I'm using:
tesseract image.tif text batch.nochop makebox
Original comment by jeffrey....@gmail.com
on 6 Aug 2007 at 11:52
I don't think this is currently supported. I can't code C++, but maybe it is possible
to add that function with a command-line parameter.
I think the boxes must be generated in both cases, whether creating the box file or
creating the text output.
Original comment by struther...@gmail.com
on 6 Aug 2007 at 1:13
OK. So my feature request should really read:
Please add an option to get both the normal text output and the box file without
running tesseract twice.
This will allow me to use tesseract's word breaks (from the normal text output)
without having to guess my own from the box file, and also to position the text
output correctly, one word at a time, in the PDF or DjVu file at approximately the
right font size.
Original comment by jeffrey....@gmail.com
on 6 Aug 2007 at 1:27
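Without tesseract's own word breaks, words have to be guessed from the box file. Below is a rough Python sketch of such a guess, assuming the character boxes arrive in reading order. A new word starts when the horizontal gap to the previous character exceeds a pixel threshold, or when a box starts to the left of the previous one, which is taken to mean a new line. Font size is then estimated from the word's box height and the scan resolution. The threshold, the names, and the height-based font estimate are all assumptions. Glyph height is not the same as the em size, so the result is only approximate.

```python
from typing import List, NamedTuple


class CharBox(NamedTuple):
    char: str
    left: int
    bottom: int
    right: int
    top: int


class WordBox(NamedTuple):
    text: str
    left: int
    bottom: int
    right: int
    top: int


def merge_chars(chars: List[CharBox]) -> WordBox:
    """Join the characters and take the union of their boxes."""
    return WordBox(
        text="".join(c.char for c in chars),
        left=min(c.left for c in chars),
        bottom=min(c.bottom for c in chars),
        right=max(c.right for c in chars),
        top=max(c.top for c in chars),
    )


def group_into_words(boxes: List[CharBox], min_gap: int = 8) -> List[WordBox]:
    """Guess word breaks from character boxes given in reading order.

    A new word starts when the gap to the previous box exceeds min_gap
    pixels (a hypothetical threshold), or when a box starts left of the
    previous one (assumed to be the start of a new line).
    """
    words: List[WordBox] = []
    current: List[CharBox] = []
    for box in boxes:
        if current:
            prev = current[-1]
            if box.left - prev.right > min_gap or box.left < prev.left:
                words.append(merge_chars(current))
                current = []
        current.append(box)
    if current:
        words.append(merge_chars(current))
    return words


def estimate_font_size_pt(word: WordBox, dpi: int) -> float:
    """Approximate font size in points from the box height in pixels.

    One inch is 72 points, so pixels * 72 / dpi gives points.
    """
    return (word.top - word.bottom) * 72.0 / dpi
```

For example, with boxes from a 300 dpi scan, a word whose box is 50 pixels tall comes out at 50 × 72 / 300 = 12 points.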
See also Issue 59. Somebody will get to this soon; it is quite easy. On the other
hand, someone from OCRopus may get round to hOCR output soon too.
Original comment by theraysm...@gmail.com
on 6 Sep 2007 at 1:07
Isn't it simply a matter of doing it all in a batch/shell script? Or are you running
hundreds of commands per day?
Original comment by beaumon...@gmail.com
on 12 Sep 2007 at 3:01
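The batch/shell approach amounts to invoking tesseract once for each output. Here is a minimal Python sketch of that. It assumes the usual `tesseract <image> <outputbase>` usage writes the text output to `<outputbase>.txt`, and that the makebox run quoted earlier writes `<outputbase>.box`. The function names are hypothetical. Each call recognises the page again.

```python
import subprocess
from typing import List


def tesseract_commands(image: str, base: str) -> List[List[str]]:
    """Build the two tesseract invocations: a normal run (text output)
    and the makebox run quoted earlier (box file)."""
    return [
        ["tesseract", image, base],
        ["tesseract", image, base, "batch.nochop", "makebox"],
    ]


def run_both(image: str, base: str) -> None:
    """Run both passes in sequence; each pass processes the image again."""
    for cmd in tesseract_commands(image, base):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Calling `run_both("image.tif", "text")` would produce both files from one call, although tesseract still processes the image twice.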
gscan2pdf is running tesseract on the fly. It seems silly to run tesseract twice when
it should be relatively straightforward to modify tesseract to produce the required
output.
Original comment by jeffrey....@gmail.com
on 12 Sep 2007 at 6:18
Original comment by theraysm...@gmail.com
on 30 Dec 2008 at 9:37
Original issue reported on code.google.com by
jeffrey....@gmail.com
on 6 Aug 2007 at 7:03