madmaze / pytesseract

A Python wrapper for Google Tesseract
Apache License 2.0
5.77k stars, 714 forks

Tesseract.exe stopped working when trying to run OCR #488

Closed: Futi7 closed this issue 1 year ago

Futi7 commented 1 year ago

I have a Python project that uses pytesseract to run OCR on an image and extract its text. I compiled the project with PyInstaller, and it works fine on my local machine, in a Windows Sandbox environment, and on a Windows Server 2012 machine. When I deployed it to the production server, which runs the same OS (Windows Server 2012 R2), I got an error during the OCR process. First, a window popped up saying that tesseract.exe has stopped working. Then I checked the event logs and found one Information entry followed by two Error entries related to this issue.

-Information:

Fault bucket , type 0
Event Name: APPCRASH
Response: Not available
Cab Id: 0

Problem signature:
P1: tesseract.exe
P2: 0.0.0.0
P3: 639a0c83
P4: libtesseract-5.dll
P5: 0.0.0.0
P6: 639a0c7f
P7: c000001d
P8: 001b639f
P9:
P10:

Attached files:

These files may be available here: C:\Users\usr\AppData\Local\Microsoft\Windows\WER\ReportArchive\AppCrash_tesseract.exe_45e0684ae65db46f6f2bc3f433eb0f313d8116_60142e40_19e538bc

Analysis symbol:
Rechecking for solution: 0
Report Id: 29280992-e2a6-11ed-812b-005056a87ce5
Report Status: 2048
Hashed bucket:

-Error 1:

Faulting application name: tesseract.exe, version: 0.0.0.0, time stamp: 0x639a0c83

Faulting module name: libtesseract-5.dll, version: 0.0.0.0, time stamp: 0x639a0c7f

Exception code: 0xc000001d

Fault offset: 0x001b639f

Faulting process id: 0x17f8

Faulting application start time: 0x01d976b3f22cf105

Faulting application path: C:\Program Files (x86)\Tesseract-OCR\tesseract.exe

Faulting module path: C:\Program Files (x86)\Tesseract-OCR\libtesseract-5.dll

Report Id: 3012b6a3-e2a7-11ed-812b-005056a87ce5

Faulting package full name:

Faulting package-relative application ID:

-Error 2:

Windows cannot access the file for one of the following reasons: there is a problem with the network connection, the disk that the file is stored on, or the storage drivers installed on this computer; or the disk is missing. Windows closed the program tesseract.exe because of this error.

Program: tesseract.exe
File:

The error value is listed in the Additional Data section.

User Action

Open the file again. This situation might be a temporary problem that corrects itself when the program runs again.

If the file still cannot be accessed and
- It is on the network, your network administrator should verify that there is not a problem with the network and that the server can be contacted.
- It is on a removable disk, for example, a floppy disk or CD-ROM, verify that the disk is fully inserted into the computer.

Check and repair the file system by running CHKDSK. To run CHKDSK, click Start, click Run, type CMD, and then click OK. At the command prompt, type CHKDSK /F, and then press ENTER.

If the problem persists, restore the file from a backup copy.

Determine whether other files on the same disk can be opened. If not, the disk might be damaged. If it is a hard disk, contact your administrator or computer hardware vendor for further assistance.

Additional Data
Error value: 00000000
Disk type: 0

There is no disk fault. I suspected the problem might be related to permissions, so I ran my OCR project as administrator, but the issue persisted.

The production environment has no internet connection, but I had already tested the project in another environment without internet access to make sure the packages don't require one.

I also set tesseract.exe to require running as administrator and then ran my command as administrator, but the issue persisted.

I also checked the bitness: my compiled Python executable is 64-bit and it uses the 32-bit Tesseract. I suspected this mismatch might be the cause, but my earlier tests succeeded with the same combination.
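One way to confirm the bitness of each binary without extra tools is to read the Machine field of its PE header. The following is a minimal sketch, not part of pytesseract; the function name `pe_machine` is hypothetical, and only three common machine values are mapped.

```python
import struct

# Machine values from the PE/COFF file header (only a few common ones listed).
MACHINE_TYPES = {
    0x014C: "x86 (32-bit)",
    0x8664: "x64 (64-bit)",
    0xAA64: "ARM64",
}


def pe_machine(path):
    """Return a readable architecture name for a Windows .exe or .dll."""
    with open(path, "rb") as f:
        dos_header = f.read(0x40)
        if len(dos_header) < 0x40 or dos_header[:2] != b"MZ":
            raise ValueError("not a PE file (missing MZ header)")
        # Offset 0x3C of the DOS header holds the file offset of the PE signature.
        (pe_offset,) = struct.unpack_from("<I", dos_header, 0x3C)
        f.seek(pe_offset)
        pe_start = f.read(6)
    if len(pe_start) < 6 or pe_start[:4] != b"PE\0\0":
        raise ValueError("not a PE file (missing PE signature)")
    # The 2-byte Machine field follows the 4-byte signature.
    (machine,) = struct.unpack_from("<H", pe_start, 4)
    return MACHINE_TYPES.get(machine, hex(machine))
```

For example, `pe_machine(r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe")` could be compared with the result for the compiled application. A 64-bit program can normally launch a 32-bit executable as a separate process on 64-bit Windows; a bitness mismatch mainly matters when a DLL is loaded into the same process.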

I also checked the antivirus and firewall logs to make sure neither of them blocked the DLL, but there were no entries regarding this.

I deployed an update that changed the directory of Tesseract-OCR and tried again, but it still fails. I ran "tesseract images/eurotext.png - -l eng" in D://Tesserract-OCR/ and it actually worked, but when I run the app, I assume the following line fails: "pytesseract.image_to_string(thresh, lang=self.lang_tesseract)"

stefan6419846 commented 1 year ago

From the above logs, I suspect that this is not really a pytesseract issue, so we cannot do much about it here. pytesseract basically calls the Tesseract binary in a subprocess and parses the corresponding output. Your failure is in the Tesseract library files themselves.

I suspect that the issue is rather with Tesseract itself (it might be related to the PyInstaller transformations as well, but the error message appears to indicate otherwise).

Have you tried running the faulty image with plain Tesseract?
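To compare plain Tesseract with a subprocess call made from Python, a minimal sketch along the following lines may help. It is a simplification and not pytesseract's actual implementation. The helper names (`build_command`, `describe_exit_code`, `run_tesseract`) are hypothetical, and it assumes the `tesseract` binary is reachable at the path given. The status table lists only a few well-known Windows NTSTATUS values, including the exception code `0xc000001d` from the logs above.

```python
import subprocess

# A few well-known Windows NTSTATUS values (not an exhaustive list).
KNOWN_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
    0xC0000135: "STATUS_DLL_NOT_FOUND",
}


def build_command(image_path, lang="eng", tesseract_cmd="tesseract"):
    # Same shape as the manual test in this thread: "-" sends the text to stdout.
    return [tesseract_cmd, str(image_path), "-", "-l", lang]


def describe_exit_code(code):
    # Mask to 32 bits so signed and unsigned exit codes print the same way.
    unsigned = code & 0xFFFFFFFF
    name = KNOWN_STATUS.get(unsigned, "unknown")
    return f"0x{unsigned:08X} ({name})"


def run_tesseract(image_path, lang="eng", tesseract_cmd="tesseract"):
    """Run the Tesseract binary directly and surface its exit code on failure."""
    proc = subprocess.run(
        build_command(image_path, lang, tesseract_cmd),
        capture_output=True,
        text=True,
        encoding="utf-8",
        errors="replace",
    )
    if proc.returncode != 0:
        raise RuntimeError(
            f"tesseract exited with {describe_exit_code(proc.returncode)}: "
            f"{proc.stderr.strip()}"
        )
    return proc.stdout
```

Calling `run_tesseract` from inside the compiled application, on the same preprocessed image saved to disk, would show whether the crash depends on the image or on the way the binary is launched. If the exit code matches the logged exception code, `describe_exit_code` reports it as `0xC000001D (STATUS_ILLEGAL_INSTRUCTION)`.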

Futi7 commented 1 year ago

Yes, I tried with plain tesseract.exe and it works fine. I have checked all the possibilities, even disabling the antivirus programs, but I still get the same issue.