Nick seems to have found a setting that might help!
On Fri, Jun 01, 2012 at 10:16:52AM +0100, Nick White wrote:
> On Wed, May 23, 2012 at 05:39:00PM +0100, Nick White wrote:
> > On Tue, May 22, 2012 at 05:21:23AM -0700, Galt wrote:
> > > On May 21, 2:04 am, Nick White <nick.wh...@durham.ac.uk> wrote:
> > > > I've been suffering from a very similar problem with some of the text I'm
> > > > training, which has several diacritics above and below glyphs. It
> > > > is not unusual to find quite a few lines of garbage where the
> > > > diacritics have been detected as a line of their own, which then causes
> > > > the preceding and following lines to lose those diacritics.
>
> I wonder, is there any way of harnessing the Tesseract API or
> configuration options to affect line height and line detection? I
> can't seem to make the above problem go away.
I finally solved this problem for my case! I found the configuration
setting 'textord_min_linesize'. With it I can tell Tesseract that
rows as small as accents should never be treated as text lines, and the
problem goes away entirely. I set the value to 2.5, twice the
default, after some trial and error.
Nick
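As a minimal sketch of how Nick's setting might be applied, the Python below builds a `tesseract` command line that overrides `textord_min_linesize`, and alternatively writes a Tesseract config file (one `variable value` pair per line) whose name can be passed as the last argument to `tesseract`. It assumes a Tesseract build whose command-line tool accepts `-c VAR=VALUE` overrides; older releases may only read variables from a config file. The function names and the `page.png` image are hypothetical, and 2.5 is simply the value Nick reported, not a universal recommendation.

```python
import subprocess
from pathlib import Path

# Value Nick reported after trial and error (twice the default, per his message).
MIN_LINESIZE = 2.5


def build_tesseract_cmd(image, output_base, min_linesize=MIN_LINESIZE, lang=None):
    """Return an argv list for the tesseract CLI overriding textord_min_linesize.

    Assumption: the installed tesseract accepts `-c VAR=VALUE` options.
    """
    cmd = ["tesseract", str(image), str(output_base)]
    if lang:
        cmd += ["-l", lang]
    # Raising the minimum line size should stop accent-sized rows being
    # detected as separate text lines.
    cmd += ["-c", f"textord_min_linesize={min_linesize}"]
    return cmd


def write_config_file(path, min_linesize=MIN_LINESIZE):
    """Write a Tesseract config file containing a single variable setting."""
    path = Path(path)
    path.write_text(f"textord_min_linesize {min_linesize}\n")
    return path


def run_ocr(image, output_base, **kwargs):
    """Run tesseract (must be on PATH); text is written to output_base.txt."""
    subprocess.run(build_tesseract_cmd(image, output_base, **kwargs), check=True)
```

For example, `run_ocr("page.png", "page")` would write `page.txt`. If the diacritic lines persist, trying a few values around 2.5 on sample pages mirrors the trial-and-error approach described above.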
Original comment by g...@folkplanet.com
on 23 Jul 2012 at 5:46
I will add this hint to the FAQ.
Original comment by zde...@gmail.com
on 23 Jul 2012 at 10:07
Original issue reported on code.google.com by
g...@folkplanet.com
on 18 May 2012 at 5:12