ocrmypdf / OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
http://ocrmypdf.readthedocs.io/
Mozilla Public License 2.0

liblept-5.dll load fails on Windows 10 (OSError 0x7F) #631

Closed suyashb95 closed 4 years ago

suyashb95 commented 4 years ago

Describe the bug: Running ocrmypdf throws an error saying

The procedure entry point inflateValidate could not be located in the dynamic link library libpng16-16.dll

When I close that dialogue, I see the following trace in the command prompt

Traceback (most recent call last):
  File "...\ocrmypdf-script.py", line 11, in <module>
    load_entry_point('ocrmypdf==11.0.2', 'console_scripts', 'ocrmypdf')()
  File "..\Python37\lib\site-packages\pkg_resources\__init__.py", line 490, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "..\Python37\lib\site-packages\pkg_resources\__init__.py", line 2862, in load_entry_point
    return ep.load()
  File "..\Python37\lib\site-packages\pkg_resources\__init__.py", line 2462, in load
    return self.resolve()
  File "..\Python37\lib\site-packages\pkg_resources\__init__.py", line 2468, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "..\Python37\lib\site-packages\ocrmypdf-11.0.2-py3.7.egg\ocrmypdf\__init__.py", line 10, in <module>
    from ocrmypdf import helpers, hocrtransform, leptonica, pdfa, pdfinfo
  File "..\Python37\lib\site-packages\ocrmypdf-11.0.2-py3.7.egg\ocrmypdf\leptonica.py", line 62, in <module>
    lept = ffi.dlopen(_libpath)
OSError: cannot load library 'C:\Program Files\Tesseract-OCR\liblept-5.dll': error 0x7f

Additional context: I followed the setup instructions in the docs to install Tesseract, Ghostscript and pngquant. I also tried running python setup.py install from source, but that doesn't fix it. I see people have faced similar issues in https://github.com/jbarlow83/OCRmyPDF/issues/455, but I couldn't find a solution there.

jbarlow83 commented 4 years ago

A previous instance of this error came from a user trying to use 32-bit Python with a 64-bit version of Tesseract, or vice versa. You might have 32-bit Python.
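A quick way to confirm which kind of interpreter is running is to check its pointer size. The following is a minimal sketch using only the standard library; `python_bits` is a name chosen here for illustration and is not part of OCRmyPDF.

```python
import struct
import sys


def python_bits() -> int:
    """Return the pointer width of the running interpreter in bits (32 or 64)."""
    # "P" is the struct format code for a native pointer (void *).
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    # sys.executable shows which Python installation is actually being used.
    print(f"Python is {python_bits()}-bit: {sys.executable}")
```

Running this with the same interpreter that launches ocrmypdf matters, because several Python installations can coexist on one machine.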

suyashb95 commented 4 years ago

My bad, I should've mentioned that in the issue. I'm using 64-bit Python. If I use 32-bit Tesseract with it, I get a different error (OSError 0xC1).
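The hexadecimal numbers in these messages are Win32 system error codes, so the two failures in this thread can be told apart by decoding them. The sketch below assumes the message format shown in the tracebacks here (`error 0x...`); the table and the helper `explain_load_error` are illustrative names, not part of cffi or OCRmyPDF.

```python
import re
from typing import Optional

# Win32 system error codes that commonly appear when a DLL fails to load.
WIN32_LOAD_ERRORS = {
    0x7E: "ERROR_MOD_NOT_FOUND: the DLL or one of its dependencies was not found",
    0x7F: "ERROR_PROC_NOT_FOUND: a required function is missing from a loaded DLL",
    0xC1: "ERROR_BAD_EXE_FORMAT: not a valid Win32 application "
          "(often a 32-bit/64-bit mismatch)",
}


def explain_load_error(message: str) -> Optional[str]:
    """Look up the Win32 error code embedded in a 'cannot load library' message."""
    match = re.search(r"error (0x[0-9a-fA-F]+)", message)
    if match is None:
        return None
    return WIN32_LOAD_ERRORS.get(int(match.group(1), 16))


if __name__ == "__main__":
    print(explain_load_error("cannot load library 'liblept-5.dll': error 0x7f"))
```

Read this way, 0x7F matches the missing inflateValidate entry point reported in the dialog, while 0xC1 is consistent with mixing 32-bit and 64-bit binaries.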

suyashb95 commented 4 years ago

Found a similar issue here. It looks like libpng uses the inflateValidate() function from zlib1.dll if it was built against a more recent version of zlib. Loading zlib1.dll before loading liblept-5.dll in leptonica.py fixed it. We have to make sure the zlib version being loaded is the same one that libpng was built with; otherwise it might throw the same error again (as per this StackOverflow answer).

Snippet modified in leptonica.py

try:
    # Load zlib1.dll first so that libpng16-16.dll can find inflateValidate
    # when liblept-5.dll is loaded afterwards.
    _libpath_zlib = find_library('zlib1')
    zlib = ffi.dlopen(_libpath_zlib)
    lept = ffi.dlopen(_libpath)
    lept.setMsgSeverity(lept.L_SEVERITY_WARNING)
except ffi.error as e:
    raise MissingDependencyError(
        f"Leptonica library found at {_libpath}, but we could not access it"
    ) from e

I guess this is going to be an issue with any recent version of Tesseract. Should I submit a PR for this?

jbarlow83 commented 4 years ago

Yes please, a PR would be great. We should only attempt to load zlib like this if os.name == 'nt', i.e. if Windows.
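The Windows-only guard suggested above could look roughly like the following. This is a sketch under stated assumptions, not the code that was merged: it uses `ctypes` from the standard library in place of the cffi `ffi.dlopen` call in the snippet, and `preload_names` and `preload_dependencies` are hypothetical helper names.

```python
import ctypes
import os
from ctypes.util import find_library
from typing import List


def preload_names(os_name: str = os.name) -> List[str]:
    """Libraries to load before Leptonica; only Windows needs zlib1 here."""
    return ["zlib1"] if os_name == "nt" else []


def preload_dependencies(os_name: str = os.name) -> list:
    """Load each preload library that can be found and return the handles."""
    handles = []
    for name in preload_names(os_name):
        path = find_library(name)
        if path is None:
            # Not on the search path; let the later Leptonica load report it.
            continue
        handles.append(ctypes.CDLL(path))
    return handles


if __name__ == "__main__":
    # Keep the returned handles referenced for as long as Leptonica is in use.
    loaded = preload_dependencies()
    print(f"Preloaded {len(loaded)} libraries on os.name={os.name!r}")
```

On non-Windows systems the list of names is empty, so the function does nothing and the existing load path is unchanged.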

jaan143 commented 4 years ago

@Suyash458 I am getting this error too. Can you explain a little how I can solve it?

    from ocrmypdf import helpers, hocrtransform, leptonica, pdfa, pdfinfo
  File "C:\Users\vicky\AppData\Local\Programs\Python\Python36-32\lib\site-packages\ocrmypdf\leptonica.py", line 71, in <module>
    zlib = ffi.dlopen(_zlib_path)
OSError: cannot load library 'C:\Users\vicky\AppData\Local\Tesseract-OCR\zlib1.dll': error 0xc1
PS C:\Users\vicky>

suyashb95 commented 4 years ago

@jaan143 maybe check if zlib1.dll is present in the location C:\Users\vicky\AppData\Local\Tesseract-OCR? If it's present, then it's possible the program doesn't have access to it and needs to be run as administrator.

jaan143 commented 4 years ago

@Suyash458 zlib1.dll is present in C:\Users\vicky\AppData\Local\Tesseract-OCR. I ran it as administrator but I'm still getting the same error.

suyashb95 commented 4 years ago

@jaan143 maybe you're using 32-bit Python with 64-bit Tesseract or vice versa?
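To check the other side of a suspected mismatch, the bitness of a DLL can be read from its PE header without loading it. The sketch below is a minimal reader assuming a well-formed PE file; `pe_machine` and `MACHINE_TYPES` are names introduced here for illustration.

```python
import struct
from typing import BinaryIO

# Machine field values from the PE/COFF file header.
MACHINE_TYPES = {
    0x014C: "32-bit (x86)",
    0x8664: "64-bit (x64)",
    0xAA64: "64-bit (ARM64)",
}


def pe_machine(stream: BinaryIO) -> str:
    """Report the target architecture recorded in a PE file (DLL or EXE)."""
    header = stream.read(64)
    if len(header) < 64 or header[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ header")
    # The DOS header stores the offset of the PE signature at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", header, 0x3C)
    stream.seek(pe_offset)
    chunk = stream.read(6)
    if len(chunk) < 6 or chunk[:4] != b"PE\0\0":
        raise ValueError("not a PE file: missing PE signature")
    # The 2-byte Machine field follows the 4-byte signature.
    (machine,) = struct.unpack_from("<H", chunk, 4)
    return MACHINE_TYPES.get(machine, f"unknown machine type 0x{machine:04x}")
```

For example, opening the zlib1.dll from the Tesseract-OCR folder with `open(path, "rb")` and passing the file object to `pe_machine` shows whether it matches the interpreter's bitness.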

h4rvey-g commented 3 years ago

I'm receiving this error too. I'm sure Tesseract and python are both 64 bit and C:\Program Files\Tesseract-OCR\zlib1.dll exists.

Traceback (most recent call last):
  File "c:\users\babao\appdata\local\programs\python\python37-32\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\babao\appdata\local\programs\python\python37-32\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\babao\AppData\Local\Programs\Python\Python37-32\Scripts\ocrmypdf.exe\__main__.py", line 4, in <module>
  File "c:\users\babao\appdata\local\programs\python\python37-32\lib\site-packages\ocrmypdf\__init__.py", line 10, in <module>
    from ocrmypdf import helpers, hocrtransform, leptonica, pdfa, pdfinfo
  File "c:\users\babao\appdata\local\programs\python\python37-32\lib\site-packages\ocrmypdf\leptonica.py", line 72, in <module>
    zlib = ffi.dlopen(_zlib_path)
OSError: cannot load library 'C:\Program Files\Tesseract-OCR\zlib1.dll': error 0xc1

Any help is appreciated.

jbarlow83 commented 3 years ago

The path "Python37-32" in the stack trace indicates that 32-bit Python is running.

h4rvey-g commented 3 years ago

> The path "Python37-32" in the stack trace indicates that 32-bit Python is running.

Thanks! I forgot to uninstall the 32-bit Python previously installed via the Windows Store. Now it's solved.

jbarlow83 commented 3 years ago

Great to hear.

For anyone who finds this thread, please download Python from python.org and avoid the Windows Store version, which causes many issues with other packages as well.

Ynjxsjmh commented 1 year ago

I used the tesseract-ocr-w64-setup-5.3.1.20230401.exe (64-bit) installer, and it doesn't contain liblept-5.dll. Instead, it contains libleptonica-6.dll. I needed to replace libname = 'liblept-5' with libname = 'libleptonica-6' in leptonica.py.
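Rather than hard-coding one DLL name, a loader could try several candidates in order. The following is only an illustrative sketch: the candidate list combines the two names mentioned in this thread with `lept`, which is an assumption about the non-Windows library name, and `first_available` is a hypothetical helper.

```python
from ctypes.util import find_library
from typing import Callable, Iterable, Optional

# Newest name first; "lept" is an assumed name for non-Windows systems.
CANDIDATE_NAMES = ("libleptonica-6", "liblept-5", "lept")


def first_available(
    names: Iterable[str] = CANDIDATE_NAMES,
    finder: Callable[[str], Optional[str]] = find_library,
) -> Optional[str]:
    """Return the path of the first candidate library the finder can locate."""
    for name in names:
        path = finder(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    print(first_available() or "no Leptonica library found")
```

The `finder` parameter exists so the lookup can be exercised without the real libraries installed.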

jbarlow83 commented 1 year ago

OCRmyPDF has removed its leptonica module because of the difficulty of keeping ABI linkage stable (and not wanting to give OCRmyPDF a compiled module).