beldougie closed this 9 years ago
You can get around this by running it with TESSDATA_PREFIX=./ python EliteOCR.py. Unfortunately, I haven't been able to figure out how to fix it in code yet. But this does get the OCR working, and it's smooth sailing from there.
Normally tesseract should look for big.traineddata in the path of EliteOCR. It might be that the Mac version is hardcoded to the preset locations only. It would be useful to find out where it's looking and why it ignores the path preset by the app (maybe there is some Mac-specific difference where a small correction could solve this).
I was able to track this down to the fact that Windows automatically includes ./ in its search paths, while OSX/Linux don't. That's why manually passing TESSDATA_PREFIX=./ worked. So I just added it to the startup process. It'll continue to work on Windows as before, but now it'll also work on OSX (and possibly Linux as well, though that's still to be tested).
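A minimal sketch of what "adding it to the startup process" could look like. The helper name set_tessdata_prefix and its parameters are hypothetical, not taken from EliteOCR; the trailing separator simply mirrors the ./ form used in the workaround above.

```python
import os

def set_tessdata_prefix(app_dir, environ=None):
    """Point TESSDATA_PREFIX at app_dir and return the value that was set.

    Hypothetical helper: app_dir is assumed to be the directory that
    contains the tessdata/ folder shipped with the application.
    """
    environ = os.environ if environ is None else environ
    # Joining with "" appends a trailing separator, like the "./" workaround.
    prefix = os.path.join(os.path.abspath(app_dir), "")
    environ["TESSDATA_PREFIX"] = prefix
    return prefix

if __name__ == "__main__":
    # Assumed usage: call this before the OCR engine is initialised.
    print(set_tessdata_prefix(os.path.dirname(os.path.abspath(__file__))))
```

Using an absolute path instead of ./ keeps the setting independent of the directory the program is launched from.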
That doesn't sound right yet. In ocrmethods.py there is this line: api.Init(self.path.encode('windows-1252'), "big", tesseract.OEM_DEFAULT)
This is the setup of tesseract. The first argument is a string with the path to where "tessdata/big.traineddata" is. The path comes from settings.py (method: getPathToSelf).
If I were you, I would check which string is available at every step of the chain (ocrmethods.py -> ocr.py -> settings.py). The problem might be as simple as an encoding/decoding issue, or the mentioned method failing to find the proper path.
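One way to run that check is a small diagnostic that can be called at each step of the chain. This is only a sketch: describe_tessdata_path is a hypothetical helper, and the tessdata/big.traineddata layout is taken from the description above.

```python
import os

def describe_tessdata_path(path, lang="big"):
    """Report what a candidate path looks like and whether it holds the data file.

    Hypothetical debugging helper; it expects <path>/tessdata/<lang>.traineddata.
    """
    expected = os.path.join(path, "tessdata", lang + ".traineddata")
    return {
        "path": path,
        "type": type(path).__name__,  # reveals str vs. bytes/unicode mix-ups
        "expected_file": expected,
        "exists": os.path.isfile(expected),
    }

if __name__ == "__main__":
    # Assumed usage: print the result for the path string seen at each step.
    print(describe_tessdata_path(os.getcwd()))
```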
P.S. I really recommend not to set any system variables if you can avoid it.
I did some more research on this, and it's a combination of two factors:
1. TESSDATA_PREFIX is used if it's available, regardless of any other paths that are passed to the module.
2. TESSDATA_PREFIX is set to the parent directory of the module at build time, meaning the only way to change the path at runtime is to manually set TESSDATA_PREFIX.
So even though the API call in ocrmethods.py includes:
api.Init(self.path.encode('windows-1252'), "big", tesseract.OEM_DEFAULT)
if TESSDATA_PREFIX has been set anywhere, the first argument (the path) will be ignored. This is why the only way to fix this currently is to manually set TESSDATA_PREFIX in EliteOCR. The good news is that this does not break Windows compatibility in any way.
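The precedence described here can be modelled in a few lines of plain Python. This is an illustration of the behaviour reported in this thread, not tesseract's or python-tesseract's actual code; effective_tessdata_prefix is a hypothetical name.

```python
import os

def effective_tessdata_prefix(init_path, environ=None):
    """Model which data path wins, as described in this thread.

    If TESSDATA_PREFIX is set (and non-empty), it overrides the path
    passed at init time; otherwise the init-time path is used.
    """
    environ = os.environ if environ is None else environ
    env_prefix = environ.get("TESSDATA_PREFIX")
    if env_prefix:
        return env_prefix
    return init_path

if __name__ == "__main__":
    print(effective_tessdata_prefix("./"))
```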
Once python-tesseract gets updated to tesseract 3.03, it should start respecting the path that's passed in at init time, so the change can be reverted then if necessary.
OK, great to hear. At the end of this week I will try to include all your fixes in the EliteOCR master branch.
Hi @seeebek, do you have a file on your system within <tesseract>/share/tessdata named big.traineddata? I am getting errors that it doesn't exist (which it doesn't, I only have english and osd). Just wondering if it was created by Windows or if I need to obtain it from somewhere? Cheers