mathewthe2 / Game2Text

Complete toolbox for gamifying language learning
https://www.Game2Text.com
Apache License 2.0

Tesseract Using Environment Variable's Path In Windows Instead Of Bundled Path #29

Open ryuga93 opened 3 years ago

ryuga93 commented 3 years ago

Hi, in the latest version, the Tesseract engine uses the path set in the environment variable instead of the bundled path, which causes it to throw an error (or OCR to stop working in the release version).

pytesseract.pytesseract.TesseractError: (1, 'Error opening data file C:\\Users\\PC1\\Downloads\\Tesseract-OCR\\jpn.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language \'jpn\' Error opening data file C:\\Users\\PC1\\Downloads\\Tesseract-OCR\\eng.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language \'eng\' Tesseract couldn\'t load any languages! Could not set option: use_new_state_cost=F Could not set option: segment_segcost_rating=F Could not set option: enable_new_segsearch=0 Could not initialize tesseract.')
2021-07-18T20:55:47Z <Greenlet at 0x1b188676048: _process_message({'call': 14.888897478777801, 'name': 'recognize_im, <geventwebsocket.websocket.WebSocket object at 0x0)> failed with TesseractError

The possible bug is in https://github.com/mathewthe2/Game2Text/blob/50cf52cfdbc911daa52a71bd136b919e32a9e718/tools.py#L54

where the Windows branch does not return a proper tessdata-dir path value. Adding a return seems to fix this problem for me.
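The error above says Tesseract looked for jpn.traineddata and eng.traineddata in the directory taken from the environment. A minimal sketch for checking whether a given directory actually contains those files is shown here; the helper name `missing_traineddata` is hypothetical and not part of Game2Text or pytesseract.

```python
import os
from pathlib import Path


def missing_traineddata(tessdata_dir, languages=("jpn", "eng")):
    """Return the languages whose .traineddata file is absent from tessdata_dir."""
    tessdata_dir = Path(tessdata_dir)
    return [
        lang
        for lang in languages
        if not (tessdata_dir / f"{lang}.traineddata").is_file()
    ]


# Inspect the directory that the environment variable points to, if it is set.
prefix = os.environ.get("TESSDATA_PREFIX")
if prefix:
    print("TESSDATA_PREFIX =", prefix, "missing:", missing_traineddata(prefix))
else:
    print("TESSDATA_PREFIX is not set")
```

Running the same check against the bundled tessdata folder would show whether the bundle itself is complete.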

mathewthe2 commented 3 years ago

Which Tesseract version are you referring to?

And what do you mean by proper tessdata-dir path? Did you export the TESSDATA_PREFIX environment variable manually?

ryuga93 commented 3 years ago

I have Tesseract 5.0 installed on my machine, and my environment variable points to that installation folder, which I use for my own projects. By "proper tessdata-dir path" I mean the bundled path, i.e. the Tesseract bundled together with the executable.

In the Darwin branch there is a return statement for it, https://github.com/mathewthe2/Game2Text/blob/50cf52cfdbc911daa52a71bd136b919e32a9e718/tools.py#L44

so I figured that Windows needs its own return statement too, and added return '--tessdata-dir {}'.format('%r'%str(Path(WIN_TESSERACT_DIR, "tessdata")))

after the line https://github.com/mathewthe2/Game2Text/blob/50cf52cfdbc911daa52a71bd136b919e32a9e718/tools.py#L53

With that change, my compilation worked.
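A minimal sketch of the change described above, not the project's actual tools.py: only `WIN_TESSERACT_DIR` and the return expression come from this comment. The function name `tessdata_config`, the placeholder path, and the empty-string fallback for other platforms are assumptions made for illustration.

```python
import platform
from pathlib import Path

# Hypothetical placeholder; the real value is defined in the project.
WIN_TESSERACT_DIR = Path("path", "to", "bundled", "Tesseract-OCR")


def tessdata_config(system=None, bundled_dir=WIN_TESSERACT_DIR):
    """Build a config string that points Tesseract at the bundled tessdata."""
    system = system or platform.system()
    if system == "Windows":
        # Return expression quoted from the comment above.
        return '--tessdata-dir {}'.format('%r' % str(Path(bundled_dir, "tessdata")))
    # Assumption: other platforms are handled elsewhere, so return no extra option.
    return ''


print(tessdata_config("Windows"))
```

The resulting string is intended to be passed as the `config` argument of the pytesseract call, so that the bundled data is used regardless of what the environment variable points to.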