ocrmypdf / OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
http://ocrmypdf.readthedocs.io/
Mozilla Public License 2.0

Error when trying to OCR JPEG or PNG #42

Closed shaunc869 closed 8 years ago

shaunc869 commented 8 years ago

When I try to run:

sudo ocrmypdf --verbose 3 eiffel.jpg eiffel.pdf

I get:

Original exception:
Exception #1
  'builtins.TypeError(Can't convert 'list' object to str implicitly)' raised in ...
   Task = def ocrmypdf.main.split_pages(...):
   Job  = [[] -> .../com.github.ocrmypdf.45n_qza7/*.page.pdf, <ocrmypdf.main.WrappedLogger>, [], <_thread.lock>]

Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/ruffus/task.py", line 751, in run_pooled_job_without_exceptions
    register_cleanup, touch_files_only)
  File "/usr/local/lib/python3.4/dist-packages/ruffus/task.py", line 567, in job_wrapper_io_files
    ret_val = user_defined_work_func(*params)
  File "/usr/local/lib/python3.4/dist-packages/ocrmypdf/main.py", line 415, in split_pages
    npages = qpdf.get_npages(input_file)
  File "/usr/local/lib/python3.4/dist-packages/ocrmypdf/qpdf.py", line 68, in get_npages
    universal_newlines=True, close_fds=True)
  File "/usr/lib/python3.4/subprocess.py", line 607, in check_output
    with Popen(*popenargs, stdout=PIPE, **kwargs) as process:
  File "/usr/lib/python3.4/subprocess.py", line 859, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.4/subprocess.py", line 1395, in _execute_child
    restore_signals, start_new_session, preexec_fn)
TypeError: Can't convert 'list' object to str implicitly

If I try the same thing on a PDF file, it works fine. This is with version 3.1.1. Thanks!

I can reproduce the bug on both Mac OS X El Capitan and Debian 8, and also in versions 3.1 and 3.0.

The file in question is here (yes, I know there isn't any text in it; I was just using it for testing):

eiffel

jbarlow83 commented 8 years ago

Only PDF input is supported for now.

img2pdf does a good job of converting most JPEGs and PNGs to PDF. And yes, that's a really crappy error message, which I should fix.
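The workaround can be sketched as a two-step pipeline: convert the image to PDF with img2pdf, then run ocrmypdf on the result. The helper name `build_commands` and the intermediate file name are hypothetical, chosen only for illustration; the sketch assumes the `img2pdf` command-line tool (with its `-o` output option) and `ocrmypdf` are both on the PATH. It only builds and prints the commands, it does not run them.

```python
import shlex
from pathlib import Path


def build_commands(image_path, output_pdf):
    """Return two commands: image -> PDF, then PDF -> PDF with OCR text layer.

    Hypothetical helper for illustration; not part of OCRmyPDF or img2pdf.
    """
    image = Path(image_path)
    # Assumed name for the intermediate PDF, placed next to the input image.
    intermediate = image.with_name(image.stem + "_img2pdf.pdf")
    convert_cmd = ["img2pdf", str(image), "-o", str(intermediate)]
    ocr_cmd = ["ocrmypdf", str(intermediate), str(output_pdf)]
    return [convert_cmd, ocr_cmd]


# Print the commands in a form that can be pasted into a shell.
for cmd in build_commands("eiffel.jpg", "eiffel.pdf"):
    print(" ".join(shlex.quote(part) for part in cmd))
```

To actually run the steps, each list could be passed to `subprocess.check_call`, as a normal user rather than with sudo.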

On Tue, 12 Jan 2016 at 15:06 Shaun wrote:

When I try to run:

sudo ocrmypdf --verbose 3 eiffel.jpg eiffel.pdf

I get:

TypeError: Can't convert 'list' object to str implicitly

If I try the same thing on a PDF file it works fine. Thanks!

jbarlow83 commented 8 years ago

Next release fixes the error message.
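For illustration, one way a tool could report non-PDF input clearly is to check the file's leading bytes before doing any work. This is only a sketch of the general idea, not the actual change made in OCRmyPDF: `check_is_pdf` and `UnsupportedInputError` are hypothetical names, and the check assumes a PDF starts with the bytes `%PDF-` at offset zero.

```python
import tempfile
from pathlib import Path

# Assumption: a PDF file begins with this marker at the very start.
PDF_MAGIC = b"%PDF-"


class UnsupportedInputError(Exception):
    """Hypothetical exception type; not part of OCRmyPDF."""


def check_is_pdf(path):
    """Raise a readable error if the file does not look like a PDF."""
    with open(str(path), "rb") as f:
        header = f.read(len(PDF_MAGIC))
    if header != PDF_MAGIC:
        raise UnsupportedInputError(
            "{}: not a PDF file. Only PDF input is supported; "
            "convert images to PDF first, for example with img2pdf.".format(path)
        )


# Demonstration with a stand-in file that starts with JPEG-style bytes.
with tempfile.TemporaryDirectory() as tmp:
    fake_jpeg = Path(tmp) / "eiffel.jpg"
    fake_jpeg.write_bytes(b"\xff\xd8\xff\xe0 placeholder, not a real image")
    try:
        check_is_pdf(fake_jpeg)
    except UnsupportedInputError as exc:
        print(exc)
```

Failing early like this gives the user a message about the input file instead of a TypeError from deep inside subprocess.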

By the way, there should be no need to sudo ocrmypdf. You don't have to trust me with root access to your system.