Open cyh1220 opened 4 years ago
I am afraid that none of the parameters in class Tesseract is currently set for the second (and following) languages, so this issue affects not only tessedit_do_invert but all of them (see the full list in tesseractclass.h).
For each language there is a separate Tesseract object. Simply copying the parameter values from the first object to the second object does not work, because parameters are not only set from the command line.
A simple workaround is to write all parameters into a parameter file and pass that file instead of setting the parameters on the command line. Create a file named noinvert with a single line tessedit_do_invert 0 and pass noinvert instead of -c tessedit_do_invert=0 to tesseract. You can also add a second line with some invalid parameter (mytest 0); you will then see one warning for each language, which shows that the parameter file is parsed for each language.
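The workaround can be sketched as a short shell script. The file name noinvert, the image inverted1.png, and the language list come from this thread; the checks for an installed tesseract binary and an existing image are additions for illustration only, so that the script does nothing harmful on a machine without them.

```shell
#!/bin/sh
# Write the parameter file: one "name value" pair per line (no "=" sign).
printf 'tessedit_do_invert 0\n' > noinvert

# Pass the parameter file as the last argument instead of using
# "-c tessedit_do_invert=0". The guards below are illustrative additions.
if command -v tesseract >/dev/null 2>&1 && [ -f inverted1.png ]; then
    tesseract inverted1.png stdout -l eng+chi_tra --oem 1 --psm 1 noinvert \
        || echo "tesseract failed; check that eng and chi_tra data are installed" >&2
else
    echo "skipping OCR run: tesseract or inverted1.png not found" >&2
fi
```

To confirm that the file is read once per language, append a line such as mytest 0 to noinvert and rerun; as described above, one warning per language should appear.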
Hi everyone,
I'm not sure whether this is the default behavior, but I found that tessedit_do_invert=0 only works for the first language when multiple languages are specified.
Environment
tesseract 4.1.1
leptonica-1.78.0
libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 1.2.1) : libpng 1.2.49 : libtiff 3.9.4 : zlib 1.2.3
Found AVX
Found SSE
Linux xian 2.6.32-754.el6.x86_64 #1 SMP Tue Jun 19 21:26:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Current Behavior:
Here is the testing image:
If I run
time tesseract original.png stdout -l eng --oem 1 --psm 1 -c tessedit_do_invert=1
Then I run
time tesseract original.png stdout -l eng --oem 1 --psm 1 -c tessedit_do_invert=0
Tesseract clearly runs faster with tessedit_do_invert=0. Then I tested an inverted image (white text on a dark background).
I run
time tesseract inverted1.png stdout -l eng --oem 1 --psm 1 -c tessedit_do_invert=0
As expected, tesseract does not work on the inverted image with tessedit_do_invert=0.
But when I run with multiple languages like
time tesseract inverted1.png stdout -l chi_tra+eng --oem 1 --psm 1 -c tessedit_do_invert=0
The result is correct!!! Then I tested
time tesseract inverted1.png stdout -l eng+chi_tra --oem 1 --psm 1 -c tessedit_do_invert=0
So it seems that tessedit_do_invert=0 only works for the first language? If so, I can't get the full benefit of tessedit_do_invert for images with multiple languages. For example, if I run
tesseract inverted1.png stdout -l eng+chi_tra --oem 1 --psm 1 -c tessedit_do_invert=0
only the English model skips the check for inverted text.
Expected Behavior:
If I set tessedit_do_invert=0, it means that I'm sure the image has no inverted text, so none of the languages should check for inverted text.