sanitha-studio closed this issue 4 years ago.
Does #174 answer your question?
Thanks for the response.
But the whitelist and blacklist features are said to work from v4.1 onward, and I got them working with pytesseract (see my comment above).
My problem is with tesserocr. I tried both SetVariable and a config file, and neither works (see my code).
Is this the right way to call ReadConfigFile? How can I test whether the config file itself has any effect with tesserocr? I had also used a config file to switch off the default dictionary, and now I doubt whether that worked either, or whether config files are ignored entirely when using tesserocr (api.ReadConfigFile("letters")).
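One way to test whether a config file is applied at all is to load it and then ask the API for the variable's current value. A minimal sketch, assuming the file uses Tesseract's "name value" line format (like the test config further down this thread) and using tesserocr's PyTessBaseAPI.ReadConfigFile and GetVariableAsString; the helper names write_config, unapplied_params and check_config_file are hypothetical:

```python
from pathlib import Path


def write_config(path, params):
    """Write a Tesseract-style config file: one 'name value' pair per line."""
    lines = [f"{name} {value}" for name, value in params.items()]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def unapplied_params(api, params):
    """Return the names whose current value in `api` does not match `params`.

    `api` is anything with a GetVariableAsString(name) method, such as a
    tesserocr.PyTessBaseAPI. An empty list means every value was applied.
    """
    return [
        name
        for name, value in params.items()
        if api.GetVariableAsString(name) != str(value)
    ]


def check_config_file(path, params):
    """Load `path` into a fresh tesserocr API and report unapplied params."""
    import tesserocr  # imported lazily so the helpers above work without it

    with tesserocr.PyTessBaseAPI() as api:
        # The constructor has already called Init, so the file is read
        # against an initialised engine.
        api.ReadConfigFile(str(path))
        return unapplied_params(api, params)
```

For example, after write_config("letters", {"tessedit_char_blacklist": "abcd"}), calling check_config_file("letters", {"tessedit_char_blacklist": "abcd"}) should return an empty list if the file was read. An empty list only shows the value was stored; whether the engine honours it is a separate question (see the version discussion below).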
I noticed one thing: the code below reports Tesseract v4.0.0:
import tesserocr
print(tesserocr.tesseract_version())
But I have uninstalled Tesseract v4.0, and now I have Tesseract v4.1.0 installed.
Is tesserocr using a built-in Tesseract version? Maybe that is why the blacklist and whitelist are not working. Please help me with a solution. FYI: I am working on Windows.
There's your problem: tesserocr is compiled against Tesseract v4.0.0. You have to reinstall tesserocr so that it is built against the correct Tesseract version.
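To confirm which Tesseract version tesserocr was built against (as opposed to which tesseract binary is installed), the banner returned by tesserocr.tesseract_version() can be parsed and compared against 4.1. A minimal sketch; the function names are hypothetical, and the 4.1.0 threshold only reflects this thread's statement that whitelist and blacklist work from v4.1:

```python
import re

# Matches both "tesseract 4.0.0" and "tesseract v4.1.0-elag2019" styles.
_VERSION_RE = re.compile(r"tesseract\s+v?(\d+)\.(\d+)(?:\.(\d+))?", re.IGNORECASE)


def parse_tesseract_version(text):
    """Extract (major, minor, patch) from a Tesseract version banner.

    Returns None if no Tesseract version number is found in `text`.
    """
    match = _VERSION_RE.search(text)
    if match is None:
        return None
    # A missing patch component is treated as 0.
    major, minor, patch = match.groups(default="0")
    return int(major), int(minor), int(patch)


def supports_char_lists(text, minimum=(4, 1, 0)):
    """True if the parsed version is at least `minimum` (4.1.0 by default)."""
    version = parse_tesseract_version(text)
    return version is not None and version >= minimum
```

For example, supports_char_lists(tesserocr.tesseract_version()) returning False would point to the same compiled-against-4.0 problem diagnosed here.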
I uninstalled tesserocr and tried to reinstall it, but the build fails:
pip uninstall tesserocr
DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
_LOGGER.warn('Failed to extract tesseract version number from: {}'.format(version))
Failed to extract tesseract version number from: tesseract v4.1.0-elag2019
leptonica-1.78.0
.
.
.
tesserocr.cpp(634): fatal error C1083: Cannot open include file: 'leptonica/allheaders.h': No such file or directory
On Windows, the recommended installation method is via Conda. I'm not sure whether the Conda package is already built against Tesseract 4.1, but it may well be.
I installed via Conda: the Tesseract version is now v4.1.1, the latest tesserocr is installed, and everything works fine. Thank you.
The config file has no effect with tesserocr:
I am using Tesseract 4.1.0, and the whitelist (I tried the blacklist too) works for me with pytesseract:
I also tried it with a config file:
print(pytesseract.image_to_string(img, config='letters'))
My test config file is simply:
tessedit_char_blacklist abcd
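For comparison, pytesseract can also pass the same variable inline through Tesseract's -c flag instead of a config file. A minimal sketch; the helper names tesseract_c_flags and ocr_with_params are hypothetical, and to keep it simple values containing whitespace are rejected rather than quoted:

```python
def tesseract_c_flags(params):
    """Build Tesseract CLI '-c name=value' flags from a dict of variables."""
    for name, value in params.items():
        # Reject rather than quote whitespace, to keep the sketch simple.
        if any(ch.isspace() for ch in str(value)):
            raise ValueError(f"value for {name!r} contains whitespace")
    return " ".join(f"-c {name}={value}" for name, value in params.items())


def ocr_with_params(image, params):
    """Run pytesseract with inline variables instead of a config file."""
    import pytesseract  # lazy import; needs the tesseract binary on PATH

    return pytesseract.image_to_string(image, config=tesseract_c_flags(params))
```

With this, ocr_with_params(img, {"tessedit_char_blacklist": "abcd"}) should behave like the config='letters' call, assuming the underlying Tesseract build honours the blacklist.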
But it is NOT working with tesserocr:
The code below does not work either (I tried the blacklist too):
Is there any other way to set a config file with tesserocr? The whitelist itself does not seem to be the issue; I believe the problem is in how tesserocr reads the config file. Please help!
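A minimal sketch of the SetVariable route with tesserocr that at least makes silent failures visible. tesserocr's SetVariable returns False when Tesseract does not accept the variable name, so the helper collects the rejected names instead of ignoring them. The helper names are hypothetical. A True return only means the value was stored: if tesserocr is compiled against 4.0 (as earlier in this thread), the LSTM engine may still ignore the whitelist.

```python
def apply_variables(api, variables):
    """Call api.SetVariable for each pair and return the names it rejected.

    `api` is anything with a SetVariable(name, value) -> bool method, such as
    a tesserocr.PyTessBaseAPI.
    """
    return [
        name
        for name, value in variables.items()
        if not api.SetVariable(name, str(value))
    ]


def ocr_with_variables(image_path, variables):
    """OCR one image file after setting Tesseract variables via tesserocr."""
    import tesserocr  # lazy import so apply_variables can be used on its own

    with tesserocr.PyTessBaseAPI() as api:
        rejected = apply_variables(api, variables)
        if rejected:
            raise ValueError(f"Tesseract rejected these variables: {rejected}")
        api.SetImageFile(image_path)
        return api.GetUTF8Text()
```

As an alternative to ReadConfigFile, recent tesserocr releases also appear to accept configs and variables arguments when constructing PyTessBaseAPI, applying them at Init time; check the signature in your installed version before relying on it.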