ocrmypdf / OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
http://ocrmypdf.readthedocs.io/
Mozilla Public License 2.0

Fails when stdout not connected #142

Closed spwhitton closed 7 years ago

spwhitton commented 7 years ago

OCRmyPDF fails when stdout is not connected. Is this necessary?

Background: I am calling ocrmypdf from a program which attaches a logfile handle to stderr, and nothing to stdout. I can work around this issue by telling OCRmyPDF to write its output to stdout, and rearranging my program accordingly.

 Traceback (most recent call last):
  File "/usr/bin/ocrmypdf", line 11, in <module>
    load_entry_point('ocrmypdf==4.3.5', 'console_scripts', 'ocrmypdf')()
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 561, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2631, in load_entry_point
    return ep.load()
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2291, in load
    return self.resolve()
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2297, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/usr/lib/python3/dist-packages/ocrmypdf/__main__.py", line 387, in <module>
    logging_factory, __name__, [None, options.verbose])
  File "/usr/lib/python3/dist-packages/ruffus/proxy_logger.py", line 342, in make_shared_logger_and_proxy
    manager.start()
  File "/usr/lib/python3.5/multiprocessing/managers.py", line 479, in start
    self._process.start()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.5/multiprocessing/context.py", line 267, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 17, in __init__
    sys.stdout.flush()
AttributeError: 'NoneType' object has no attribute 'flush'

jbarlow83 commented 7 years ago

I think this may be related to https://bugs.python.org/issue28326; that is, stream objects lacking a .flush() do not play nicely with multiprocessing, and I may not be able to do much about that.

Does it work to attach an open /dev/null to stdout?
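For a caller written in Python, attaching /dev/null to stdout might look like the sketch below. The helper name `run_with_devnull_stdout` and the log path are hypothetical; only the standard `subprocess` module is used, and the child still gets its stderr routed to the caller's log file.

```python
import subprocess


def run_with_devnull_stdout(cmd, log_path):
    """Run cmd with stdout attached to /dev/null and stderr appended to a log file.

    Hypothetical helper for illustration; it is not part of OCRmyPDF.
    """
    with open(log_path, "ab") as log:
        completed = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,   # give the child a valid (empty) stdin
            stdout=subprocess.DEVNULL,  # valid stdout, so sys.stdout is not None in the child
            stderr=log,                 # the log file handle receives stderr
        )
    return completed.returncode
```

Assuming placeholder file names, a call such as `run_with_devnull_stdout(["ocrmypdf", "input.pdf", "output.pdf"], "ocr.log")` would then start OCRmyPDF with a connected stdout.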

spwhitton commented 7 years ago

On Sun, Mar 12, 2017 at 05:27:09PM -0700, jbarlow83 wrote:

> Does it work to attach an open /dev/null to stdout?

Yes, it does.

-- Sean Whitton

jbarlow83 commented 7 years ago

Workaround added in commit f035cb1
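For readers hitting the same traceback in their own multiprocessing programs, one possible in-process guard is sketched below. This is an assumption about the general shape of such a workaround, not a description of what commit f035cb1 actually does; the function name `ensure_std_streams` is hypothetical. It relies on the fact that Python 3 sets a standard stream to None when its file descriptor is not open at startup.

```python
import os
import sys


def ensure_std_streams():
    """Replace any standard stream that is None with a handle on os.devnull.

    Hypothetical helper: call it before starting multiprocessing workers,
    so that code calling sys.stdout.flush() finds a real stream object.
    """
    for name, mode in (("stdin", "r"), ("stdout", "w"), ("stderr", "w")):
        if getattr(sys, name) is None:
            # Output written to this stream is discarded.
            setattr(sys, name, open(os.devnull, mode))
```

The guard only touches streams that are missing, so a program started with normally connected streams behaves as before.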