ocrmypdf / OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
http://ocrmypdf.readthedocs.io/
Mozilla Public License 2.0
14.15k stars, 1.02k forks

Issue packaging with pyinstaller #1024

Open kiyros opened 2 years ago

kiyros commented 2 years ago

Below is my fileToOcr function:

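import os
from pathlib import Path
from timeit import default_timer as timer

import ocrmypdf

# Tesseract looks for its language data under TESSDATA_PREFIX
os.environ["TESSDATA_PREFIX"] = "Location of Tessdata Folder"
if __name__ == '__main__':
    # pdf, outputPath and pdfOriginalName are set elsewhere in the script
    start = timer()
    # Path(...).stem drops only the extension; str.strip(".pdf") would also
    # eat leading/trailing '.', 'p', 'd' and 'f' characters of the name itself
    ocrmypdf.ocr(pdf, output_file=f'{outputPath}\\{Path(pdfOriginalName).stem}.PDF',
                 redo_ocr=True, output_type="pdf")
    end = timer()

    print(end - start, " Seconds")

    # Remove checked pdf from queue/check Folder
    os.remove(pdf)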

This code runs fine inside Visual Studio, but the problem appears as soon as I package it with PyInstaller:

pyinstaller --onefile --collect-data pikepdf --hidden-import pikepdf --copy-metadata pikepdf --collect-data ocrmypdf --hidden-import ocrmypdf --copy-metadata ocrmypdf -w 'dailyEDI_PDF_checker.py'

When I run the resulting .exe, I get this error:

  File "ocrmypdf\api.py", line 339, in ocr
  File "ocrmypdf\_validation.py", line 246, in check_options
  File "ocrmypdf\_validation.py", line 240, in _check_plugin_options
AttributeError: 'NoneType' object has no attribute 'languages'

I am not sure why the object is None ('NoneType') at that point.

OS: Windows
Installed using: pip
Expected behavior: the ocr function runs smoothly

jbarlow83 commented 2 years ago

I believe you need to --hidden-import all of ocrmypdf's built-in plugins, because PyInstaller doesn't understand the plugin manager.
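To illustrate how a missing plugin could lead to the 'NoneType' error in the traceback, here is a toy sketch in plain Python. It is not OCRmyPDF's actual plugin manager; every class and function name in it (ToyPluginManager, ToyTesseractPlugin, ToyEngine, check_languages) is hypothetical. It only assumes that an OCR engine is looked up through registered plugins and that the lookup yields None when no plugin was loaded.

```python
class ToyEngine:
    """Stand-in for an OCR engine object (hypothetical)."""

    def languages(self):
        return {"eng"}


class ToyTesseractPlugin:
    """Stand-in for a built-in plugin that provides the OCR engine (hypothetical)."""

    def get_ocr_engine(self):
        return ToyEngine()


class ToyPluginManager:
    """Very small plugin registry; NOT the real OCRmyPDF plugin manager."""

    def __init__(self):
        self._plugins = []

    def register(self, plugin):
        self._plugins.append(plugin)

    def get_ocr_engine(self):
        # Ask each registered plugin in turn; the first non-None answer wins.
        for plugin in self._plugins:
            engine = plugin.get_ocr_engine()
            if engine is not None:
                return engine
        # No plugin was registered (e.g. its module was left out of the bundle).
        return None


def check_languages(manager):
    engine = manager.get_ocr_engine()
    # If engine is None, this attribute access raises AttributeError.
    return engine.languages()
```

With an empty ToyPluginManager, check_languages fails on the attribute access with an AttributeError about 'NoneType' and 'languages'; after registering ToyTesseractPlugin it returns the language set. If the frozen executable is missing the plugin modules, something similar may be happening inside the real library.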

I did try creating a PyInstaller edition of ocrmypdf before but got worried about the time to maintain another build artifact.

kiyros commented 2 years ago

By ocrmypdf's "Hidden imports" do you mean ghostscript, pngquant etc?

jbarlow83 commented 2 years ago

No, I believe you need to specify --hidden-import ocrmypdf.builtin_plugins. Something along those lines.
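A hidden import of a package does not necessarily pull in its submodules, so one option is to list them explicitly. The sketch below uses the standard library's pkgutil to build one --hidden-import argument per module in a package. The helper name hidden_import_flags is hypothetical, and treating ocrmypdf.builtin_plugins as a package with one module per plugin is an assumption, not something confirmed in this thread.

```python
import importlib
import pkgutil


def hidden_import_flags(package_name: str) -> list[str]:
    """Build PyInstaller --hidden-import arguments for a package and its submodules.

    Hypothetical helper: it only enumerates modules, it does not run PyInstaller.
    """
    package = importlib.import_module(package_name)
    names = [package_name]

    # Only packages have __path__; a plain module has no submodules to list.
    search_path = getattr(package, "__path__", None)
    if search_path is not None:
        prefix = package_name + "."
        for info in pkgutil.walk_packages(search_path, prefix=prefix):
            names.append(info.name)

    flags = []
    for name in names:
        flags += ["--hidden-import", name]
    return flags
```

Run in an environment where OCRmyPDF is installed, hidden_import_flags("ocrmypdf.builtin_plugins") should give arguments that can be appended to the pyinstaller command line. PyInstaller also has a --collect-submodules option that serves a similar purpose; whether either approach resolves this particular error is not confirmed here.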

I don't know of any way to use PyInstaller to bundle third party dependencies like Tesseract and Ghostscript, which have their own installers and license agreements.
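Since Tesseract and Ghostscript would have to be installed separately on the target machine, a packaged script could check for them at startup and report what is missing. This is a minimal sketch using shutil.which; the helper missing_tools and the executable names in EXTERNAL_TOOLS are assumptions and may need adjusting for a given installation.

```python
import shutil


def missing_tools(candidates: dict[str, list[str]]) -> list[str]:
    """Return the labels of tools for which no candidate executable is on PATH."""
    missing = []
    for label, executables in candidates.items():
        # shutil.which returns None when the executable cannot be found.
        if not any(shutil.which(exe) for exe in executables):
            missing.append(label)
    return missing


# Executable names are assumptions; Ghostscript in particular is named
# differently depending on platform and build.
EXTERNAL_TOOLS = {
    "Tesseract": ["tesseract"],
    "Ghostscript": ["gs", "gswin64c", "gswin32c"],
}
```

Calling missing_tools(EXTERNAL_TOOLS) before ocrmypdf.ocr lets the program show a clear message instead of failing somewhere inside the OCR pipeline.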

yang-521 commented 1 year ago

I encountered the same problem. Using --hidden-import ocrmypdf.builtin_plugins and --copy-metadata did not solve it; the same error still occurs.

mathieugruson commented 11 months ago

For anyone finding this topic through Google, the problem has been solved here (at least for me): https://github.com/ocrmypdf/OCRmyPDF/issues/659#issuecomment-1517712600