Open tcastelly opened 2 years ago
How did you install tesseract and libtesseract? What version of tesseract do you have?
Thank you for your answer.
I'm on Arch Linux, and I installed:
pacman -S tesseract leptonica tesseract-data-eng
tesseract 4.1.1
leptonica-1.82.0
libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 2.1.1) : libpng 1.6.37 : libtiff 4.3.0 : zlib 1.2.11 : libwebp 1.2.2
Found AVX2
Found AVX
Found FMA
Found SSE
Found libarchive 3.5.2 zlib/1.2.11 liblzma/5.2.5 bz2lib/1.0.8 liblz4/1.9.3 libzstd/1.5.0
My tesseract was installed through Fedora's dnf install tesseract command:
tesseract 4.1.3
leptonica-1.81.1
libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.1.0) : libpng 1.6.37 : libtiff 4.3.0 : zlib 1.2.11 : libwebp 1.2.2
Found AVX512BW
Found AVX512F
Found AVX2
Found AVX
Found FMA
Found SSE
My tesseract command gives the expected Coco Adel output.
Through leptess, I also get rh\n.
Converting the image to a PNG changed leptess's output slightly, to "Nr\n".
I created a new image with the same resolution and similarly sized text, and leptess was able to parse it correctly.
I don't know why the command and API have different behaviour on your image. It may be worth checking to see if the command sets any additional options.
Yeah, most likely the command line uses a different set of default options :(
The default page seg mode for leptess is set to 6, which is single block mode, while the default value for the tesseract command is 3, which is auto.
Setting this variable manually would get the same result:
lt.set_variable(Variable::TesseditPagesegMode, "3").unwrap();
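For reference, here is a minimal sketch of the page seg mode numbers as Tesseract 4.x defines them, with a helper that produces the string a call like lt.set_variable(Variable::TesseditPagesegMode, ...) expects. The PageSegMode enum and as_variable_value are illustrative names made up for this sketch, not types or methods exported by leptess.

```rust
/// Page segmentation modes as numbered by Tesseract 4.x.
/// Illustrative enum for this sketch only; it is not a leptess type.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PageSegMode {
    OsdOnly = 0,
    AutoOsd = 1,
    AutoOnly = 2,
    Auto = 3, // default of the tesseract command
    SingleColumn = 4,
    SingleBlockVertText = 5,
    SingleBlock = 6, // default reported for leptess in this thread
    SingleLine = 7,
    SingleWord = 8,
    CircleWord = 9,
    SingleChar = 10,
    SparseText = 11,
    SparseTextOsd = 12,
    RawLine = 13,
}

impl PageSegMode {
    /// String form of the mode number, suitable for a string-valued
    /// variable setter.
    fn as_variable_value(self) -> String {
        (self as u32).to_string()
    }
}

fn main() {
    let cli_default = PageSegMode::Auto;
    let lib_default = PageSegMode::SingleBlock;
    println!("CLI default: {:?} = {}", cli_default, cli_default.as_variable_value());
    println!("library default: {:?} = {}", lib_default, lib_default.as_variable_value());
}
```

With something like this, the workaround above would pass PageSegMode::Auto.as_variable_value() instead of a bare "3", which makes the intent a bit easier to read.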
So maybe the default page seg mode for leptess should be set to 3, to be consistent with tesseract and to keep people from getting unexpected results.
FYI, the CLI sets the default page seg mode to PSM_AUTO, but the library uses PSM_SINGLE_BLOCK.
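One way to confirm that the page seg mode alone explains the difference is to run the CLI under both modes and compare the outputs. The sketch below assumes a tesseract 4.x binary on PATH (where the flag is spelled --psm) and uses image.jpg as a placeholder path; tesseract_args and run_tesseract are hypothetical helpers written for this example.

```rust
use std::process::Command;

/// Build the argument list for `tesseract <image> stdout --psm <n>`.
/// Hypothetical helper for this sketch.
fn tesseract_args(image: &str, psm: u8) -> Vec<String> {
    vec![
        image.to_string(),
        "stdout".to_string(), // write recognized text to standard output
        "--psm".to_string(),
        psm.to_string(),
    ]
}

/// Run the tesseract CLI and return whatever it wrote to stdout.
/// Returns an error message if the binary could not be started.
fn run_tesseract(image: &str, psm: u8) -> Result<String, String> {
    let output = Command::new("tesseract")
        .args(tesseract_args(image, psm))
        .output()
        .map_err(|e| format!("could not start tesseract: {}", e))?;
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}

fn main() {
    // "image.jpg" is a placeholder; substitute the image being tested.
    // 3 is the CLI default (PSM_AUTO), 6 is PSM_SINGLE_BLOCK.
    for psm in [3u8, 6u8] {
        match run_tesseract("image.jpg", psm) {
            Ok(text) => println!("--psm {}: {:?}", psm, text),
            Err(err) => eprintln!("--psm {}: {}", psm, err),
        }
    }
}
```

If the --psm 6 run reproduces the garbled text that the wrapper returns, the default mode is very likely the only difference between the two.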
Hello,
Thank you for this work!
I'm seeing a curious behavior. When I try to retrieve the text from the image below on the command line:
I get this result:
But when I use the wrapper
I get:
I've tried using the traineddata from this repository, and also none at all, but I get the same result.
Maybe the command line uses default parameters.
Thanks in advance