Closed GoogleCodeExporter closed 9 years ago
I wonder if this is what I was running into with my OCR product. Would like some details.
Original comment by ScanH...@gmail.com
on 21 Mar 2007 at 6:36
One of the changes in 1.03 "improved" the x-height calculation, which increased the probability that a text line is regarded as allcaps. While this made an overall improvement on documents that include a lot of small caps or all caps, it makes things worse in some cases on small amounts of text.
I am still working on a compromise solution that will restore the previous operation on small amounts of normal text without compromising accuracy on smallcaps or allcaps (both of which tend to show up in small amounts). Unfortunately, it is very difficult to tell the difference between an all-caps (or small-caps) word and a genuine all-x-height word.
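The ambiguity can be made concrete with a toy sketch. This is not Tesseract's x-height code: the function `guess_line_case`, the 1.2 height-variation threshold and the 0.68 x-height-to-cap-height ratio are illustrative assumptions. A line whose glyphs are all the same height could be "ounce" or "OUNCE"; with other text on the page there is an independent x-height to compare against, but on a small amount of text only a prior bias can decide.

```python
from statistics import median
from typing import Optional, Sequence

# Assumed, illustrative ratio of x-height to cap height for a typical Latin font.
XHEIGHT_TO_CAP_RATIO = 0.68


def guess_line_case(glyph_heights: Sequence[float],
                    page_xheight: Optional[float],
                    prefer_allcaps: bool) -> str:
    """Toy classifier returning 'mixed', 'allcaps' or 'lowercase' for one line."""
    if not glyph_heights:
        raise ValueError("line has no glyphs")
    tallest, shortest = max(glyph_heights), min(glyph_heights)
    # Visible height variation (ascenders or capitals next to x-height letters):
    # the line shows both reference heights, so it is the easy case.
    if tallest > shortest * 1.2:
        return "mixed"
    # Every glyph has the same height: the line is either all x-height or all caps.
    height = median(glyph_heights)
    if page_xheight is not None:
        # Enough other text gives an x-height estimate; pick the closer reference.
        cap_height = page_xheight / XHEIGHT_TO_CAP_RATIO
        if abs(height - cap_height) < abs(height - page_xheight):
            return "allcaps"
        return "lowercase"
    # Small amount of text and no reference height: only the prior can decide.
    return "allcaps" if prefer_allcaps else "lowercase"
```

When a page x-height is available the prior is never consulted; when it is not, `prefer_allcaps` alone decides, so a change that shifts that bias mostly shows up on small amounts of text.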
Original comment by theraysm...@gmail.com
on 29 Mar 2007 at 1:37
Since vowels are so common, is it not possible to run a special check for one of them being capitalized if the block is smaller than some low number of pixels? If even one is capitalized, make the line lower-case. Also, is there any chance to make these types of decisions tunable via a command-line or config-file option? That would give the external application a chance to run it *both* ways if the user specified "try harder", or just the default if one wanted a "speedy" result.
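The "try harder" idea can be sketched as follows, assuming a hypothetical recognizer that takes an allcaps/x-height interpretation flag and returns the recognized text with a confidence score. Neither `recognize_with_effort` nor the recognizer signature is a Tesseract API; they only illustrate running the decision both ways and keeping the more confident result.

```python
from typing import Callable, Tuple

# Hypothetical recognizer: takes "assume allcaps?" and returns (text, confidence).
Recognizer = Callable[[bool], Tuple[str, float]]


def recognize_with_effort(recognize: Recognizer, try_harder: bool,
                          default_allcaps: bool = False) -> Tuple[str, float]:
    """Run the default interpretation; if asked to try harder, also run the other one."""
    first = recognize(default_allcaps)
    if not try_harder:
        # "Speedy" mode: a single pass with the default setting.
        return first
    second = recognize(not default_allcaps)
    # Keep whichever pass the recognizer was more confident about (ties keep the default).
    return max(first, second, key=lambda result: result[1])
```

The cost of "try harder" is one extra recognition pass per line, which is why it fits as an opt-in option rather than the default.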
Tess is improving at a deeper level than I even anticipated. Thank you, Ray.
Original comment by fil...@repairfaq.org
on 16 Apr 2007 at 1:38
1.04 has some improvements in this area, but there is still work to do.
Original comment by theraysm...@gmail.com
on 17 May 2007 at 7:16
v2.0 will introduce a BOOL_VAR called textord_ocropus_mode. When it is set to true, the x-height calculation code will run as in 1.03; when it is set to false (the default), it will run in a different mode that is better on average, but worse for the way ocropus uses it (which defeats fix_xheight()).
Original comment by theraysm...@gmail.com
on 13 Jul 2007 at 1:43
Original comment by theraysm...@gmail.com
on 18 Jul 2007 at 10:23
Original issue reported on code.google.com by
tmb...@gmail.com
on 16 Mar 2007 at 12:11