ocrmypdf / OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
http://ocrmypdf.readthedocs.io/
Mozilla Public License 2.0

dependency problem reportlab - although installed... #13

Closed OCRmyPDF-issuebot closed 8 years ago

OCRmyPDF-issuebot commented 9 years ago

Issue by andreasotto Tue Nov 4 10:44:25 2014 Originally opened as https://github.com/fritz-hh/OCRmyPDF/issues/99


# ./OCRmyPDF.sh /home/ao/Leerungstermine189973.PDF /home/ao/test.pdf
Please install the python library reportlab. Exiting...

# apt-get install python-reportlab
python-reportlab is already the newest version.

.. already installed.

Debian 6 squeeze

OCRmyPDF-issuebot commented 9 years ago

Comment by andreasotto Tue Nov 4 10:51:46 2014


# python -c 'import reportlab' && echo "installed"
installed

OCRmyPDF-issuebot commented 9 years ago

Comment by andreasotto Tue Nov 4 10:54:20 2014


Ah, I see that OCRmyPDF wants reportlab version >= 3.0. Under Debian 6 squeeze the version is 2.4-4. Is >= 3.0 really required?

OCRmyPDF-issuebot commented 9 years ago

Comment by jbarlow83 Tue Jul 28 12:15:31 2015


OCRmyPDF v3.0-rc2 needs reportlab >= 3.0 (although there is a workaround to avoid reportlab: --pdf-renderer tesseract if you have Tesseract 3.03). In both v2.0 and v3.0 of OCRmyPDF, it's a 'firm' dependency because older reportlabs had a serious bug in image handling that really bloated the sizes of PDFs.

OCRmyPDF-issuebot commented 9 years ago

Comment by Wikinaut Tue Jul 28 17:47:13 2015


Actual problem with

Having the same problem and reading this issue, I checked the Tumbleweed reportlab version (2.7-3.3).

Suggestion: change the OCRmyPDF dependency-checker message to "Please install the python library reportlab version >= 3.0", i.e. notify the user that they must install a correct version.
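
The suggestion above could be sketched roughly as follows. This is only an illustration, not OCRmyPDF's actual checker: `parse_version` and `check_minimum` are hypothetical helper names, and the parsing simply ignores distribution suffixes such as `-4`.

```python
import re


def parse_version(text):
    """Return the leading dotted-numeric part of a version string as a tuple of ints.

    '2.4-4' gives (2, 4); the distribution suffix '-4' is ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return tuple(int(part) for part in match.group(0).split("."))


def check_minimum(name, installed, minimum):
    """Return None if `installed` satisfies `minimum`, else a message naming both versions."""
    have, need = parse_version(installed), parse_version(minimum)
    # Pad with zeros so that '3' and '3.0' compare as equal.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    if have >= need:
        return None
    return ("Please install the python library {0} version >= {1} "
            "(found {2}).".format(name, minimum, installed))


# Versions reported in this thread: Debian 6 squeeze has 2.4-4, Tumbleweed has 2.7-3.3.
for found in ("2.4-4", "2.7-3.3", "3.0"):
    print(found, "->", check_minimum("reportlab", found, "3.0"))
```

Naming both the required and the found version in the message tells the user that the library is present but too old, which is the case that caused the confusion here.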

OCRmyPDF-issuebot commented 9 years ago

Comment by Wikinaut Tue Jul 28 17:50:34 2015


@jbarlow83 Please can you explain where the option --pdf-renderer tesseract is to be added? It does not work on the OCRmyPDF command line, and the help does not mention it or say where such an option could be added.

OCRmyPDF-issuebot commented 9 years ago

Comment by jbarlow83 Tue Jul 28 19:02:34 2015


Only the new version (a pre-release) supports it: https://github.com/fritz-hh/OCRmyPDF/releases version v3.0-rc2.

Or download the latest source from the "master" branch.

OCRmyPDF-issuebot commented 9 years ago

Comment by Wikinaut Tue Jul 28 19:27:40 2015


Hmm, one problem is solved: when checking out here, the default branch is v2.x, and I have now changed this to master.

but now I get

# sh ./OCRmyPDF.sh -h
Traceback (most recent call last):
  File "/usr/lib64/python3.4/runpy.py", line 170, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.4/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/src/OCRmyPDF/ocrmypdf/main.py", line 16, in <module>
    import PyPDF2 as pypdf
ImportError: No module named 'PyPDF2'

When I run

./OCRmyPDF.sh -h
bash: ./OCRmyPDF.sh: Permission denied

All files and subdirectories belong to the current user.

OCRmyPDF-issuebot commented 9 years ago

Comment by jbarlow83 Tue Jul 28 19:33:49 2015


It's a Python 3 package now. Run the installer in the current directory: pip3 install -e .
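
The `ImportError` for `PyPDF2` quoted earlier in this thread is what one would expect when the code is run before its Python dependencies are installed. As a minimal sketch, a check like the following reports which modules the current interpreter cannot find; `missing_modules` is a hypothetical helper, and the module list is illustrative rather than OCRmyPDF's actual requirement list.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names in `names` that the interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Modules mentioned in this thread; run this with the same python3 that runs OCRmyPDF.
print(missing_modules(["PyPDF2", "reportlab"]))
```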

OCRmyPDF-issuebot commented 9 years ago

Comment by Wikinaut Tue Jul 28 19:34:53 2015


oops, while you wrote your answer, I read the readme and did the pip3, but:

  FileNotFoundError: [Errno 2] No such file or directory: 'mutool'

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /usr/local/src/OCRmyPDF

So I have to install this, too. (Should it be added to the dependency checks?)
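
A missing executable can be detected before it is invoked, which avoids a bare `FileNotFoundError`. The following is a minimal sketch using the standard library's `shutil.which`; `missing_programs` is a hypothetical helper and not part of OCRmyPDF's `setup.py`.

```python
import shutil


def missing_programs(programs):
    """Return the executables in `programs` that are not found on PATH."""
    return [name for name in programs if shutil.which(name) is None]


# Executable names taken from the checks and error messages quoted in this thread.
print(missing_programs(["tesseract", "gs", "unpaper", "pdfseparate", "java", "mutool"]))
```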

OCRmyPDF-issuebot commented 9 years ago

Comment by Wikinaut Tue Jul 28 19:35:51 2015


Uh, `mutool` is not in Tumbleweed. I have to look for it. (Even Tesseract is easier to install.)

OCRmyPDF-issuebot commented 9 years ago

Comment by jbarlow83 Tue Jul 28 19:36:44 2015


It's mupdf-tools. If it's a pain to get, how is qpdf? Both do the same thing.

You'll need tesseract, ghostscript, unpaper, poppler, and java too.

OCRmyPDF-issuebot commented 9 years ago

Comment by Wikinaut Tue Jul 28 19:38:16 2015


It is part of mupdf in openSUSE.

OCRmyPDF-issuebot commented 9 years ago

Comment by Wikinaut Tue Jul 28 19:39:05 2015


Now this is gone, but I have

pip3 install -e .
Obtaining file:///usr/local/src/OCRmyPDF
    Complete output from command python setup.py egg_info:
    Checking for tesseract >= 3.02.02...
    Found tesseract 3.04
    Checking for gs >= 9.14...
    Found gs 9.16
    Checking for unpaper >= 6.1...
    Found unpaper 6.2
    Checking for pdfseparate >= 0.29.0...
    Found pdfseparate 0.33.0
    Checking for java >= 1.5.0...
    Found java 1.8.0
    Checking for mutool >= 1.7a...
    Traceback (most recent call last):
      File "/usr/local/src/OCRmyPDF/setup.py", line 117, in check_external_program
        version = version_scrape_regex.search(result).group(1)
    AttributeError: 'NoneType' object has no attribute 'group'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "<string>", line 20, in <module>
      File "/usr/local/src/OCRmyPDF/setup.py", line 167, in <module>
        package='mupdf-tools'
      File "/usr/local/src/OCRmyPDF/setup.py", line 119, in check_external_program
        error_unknown_version(program, package, optional, minimum_version)
    TypeError: error_unknown_version() takes 3 positional arguments but 4 were given

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /usr/local/src/OCRmyPDF

(updated with the complete output)
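
The first exception in the output above is the usual result of calling `.group()` on the `None` that a regex search returns when nothing matches; the second shows `error_unknown_version` being called with four arguments while it accepts three. A minimal sketch of a defensive scrape is shown below; `scrape_version`, the sample tool output, and the pattern are all hypothetical and not taken from `setup.py`.

```python
import re


def scrape_version(output, pattern):
    """Return the first captured group of `pattern` in `output`, or None if nothing matches."""
    match = re.search(pattern, output)
    # re.search returns None when there is no match; calling .group() on that
    # None raises the AttributeError seen in the traceback.
    return match.group(1) if match else None


# Hypothetical tool output and pattern, for illustration only.
print(scrape_version("sometool version 1.7", r"version (\d+(?:\.\d+)*)"))  # prints 1.7
print(scrape_version("usage: sometool [options]", r"version (\d+(?:\.\d+)*)"))  # prints None
```

Returning `None` lets the caller decide how to report an unknown version instead of failing inside the scrape.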

OCRmyPDF-issuebot commented 9 years ago

Comment by Wikinaut Tue Jul 28 19:41:40 2015


(post above updated with the complete output)

OCRmyPDF-issuebot commented 9 years ago

Comment by jbarlow83 Tue Jul 28 19:41:55 2015


Added a possible fix - do a git pull.

OCRmyPDF-issuebot commented 9 years ago

Comment by Wikinaut Tue Jul 28 19:44:59 2015


I was (I am) already on 6e6f918630bba7077ba9a50d75a138767422bce7 . This gives the above error.

OCRmyPDF-issuebot commented 9 years ago

Comment by jbarlow83 Tue Jul 28 19:48:49 2015


Apologies, I pushed it to the wrong repo. Commit 6901550 should now be available on the main repo.

OCRmyPDF-issuebot commented 9 years ago

Comment by Wikinaut Tue Jul 28 19:51:23 2015


Different error output:

pip3 install -e .
Obtaining file:///usr/local/src/OCRmyPDF
    Complete output from command python setup.py egg_info:
    Checking for tesseract >= 3.02.02...
    Found tesseract 3.04
    Checking for gs >= 9.14...
    Found gs 9.16
    Checking for unpaper >= 6.1...
    Found unpaper 6.2
    Checking for pdfseparate >= 0.29.0...
    Found pdfseparate 0.33.0
    Checking for java >= 1.5.0...
    Found java 1.8.0
    Checking for mutool >= 1.7a...
    Traceback (most recent call last):
      File "/usr/local/src/OCRmyPDF/setup.py", line 117, in check_external_program
        version = version_scrape_regex.search(result).group(1)
    AttributeError: 'NoneType' object has no attribute 'group'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "<string>", line 20, in <module>
      File "/usr/local/src/OCRmyPDF/setup.py", line 167, in <module>
        package='mupdf-tools'
      File "/usr/local/src/OCRmyPDF/setup.py", line 119, in check_external_program
        error_unknown_version(program, package, optional)
      File "/usr/local/src/OCRmyPDF/setup.py", line 83, in error_unknown_version
        print(unknown_version.format(**locals()), file=sys.stderr)
    KeyError: 'need_version'

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /usr/local/src/OCRmyPDF

OCRmyPDF-issuebot commented 9 years ago

Comment by jbarlow83 Tue Jul 28 20:08:52 2015


Thanks for your patience. Please pull again and give it another shot.
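
The `KeyError: 'need_version'` in the traceback above is what `str.format(**locals())` raises when the template names a field that is not a local variable of the calling function. A minimal sketch that reproduces the pattern follows; the template text and the function names `broken` and `fixed` are hypothetical, not the code in `setup.py`.

```python
TEMPLATE = "{program} requires version {need_version} or higher."  # hypothetical message


def broken(program, package):
    # locals() holds only 'program' and 'package', so the 'need_version'
    # field in the template cannot be filled and format() raises KeyError.
    return TEMPLATE.format(**locals())


def fixed(program, package, need_version):
    # Passing each field explicitly ties the template to the function signature.
    return TEMPLATE.format(program=program, need_version=need_version)


try:
    broken("mutool", "mupdf-tools")
except KeyError as exc:
    print("KeyError:", exc)

print(fixed("mutool", "mupdf-tools", "1.7a"))
```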

OCRmyPDF-issuebot commented 9 years ago

Comment by Wikinaut Tue Jul 28 20:29:31 2015


uh, now I have

pip3 install -e .
Obtaining file:///usr/local/src/OCRmyPDF
    Complete output from command python setup.py egg_info:
    Checking for tesseract >= 3.02.02...
    Found tesseract 3.04
    Checking for gs >= 9.14...
    Found gs 9.16
    Checking for unpaper >= 6.1...
    Found unpaper 6.2
    Checking for pdfseparate >= 0.29.0...
    Found pdfseparate 0.33.0
    Checking for java >= 1.5.0...
    Found java 1.8.0
    Checking for mutool >= 1.7a...

    OCRmyPDF requires 'mutool' 1.7a or higher.  Your system has
    'mutool' but we cannot tell what version is installed.  Contact the
    package maintainer.

    This program is REQUIRED for OCRmyPDF to work.  Installation will abort.

    On systems with the aptitude package manager (Debian, Ubuntu), try these
    commands:
        sudo apt-get update
        sudo apt-get install mupdf-tools

    On RPM-based systems (Red Hat, Fedora), search for instructions on
    installing the RPM for mupdf-tools.

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /usr/local/src/OCRmyPDF

OCRmyPDF-issuebot commented 9 years ago

Comment by Wikinaut Tue Jul 28 20:30:18 2015


mupdf 1.7-1.3 on openSUSE Tumbleweed

OCRmyPDF-issuebot commented 9 years ago

Comment by jbarlow83 Tue Jul 28 20:37:56 2015


I don't know what to think when Linux distributions make up arbitrary version numbers that don't follow the package's own conventions, as in this case.

I dropped the version requirement to mupdf 1.7.
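
To illustrate why a requirement of 1.7a would exclude a build that reports 1.7, here is a small sketch. It assumes that a letter suffix sorts after the bare number, which is an assumption about the ordering and not something stated in this thread; `version_key` is a hypothetical helper.

```python
import re


def version_key(text):
    """Split '1.7a' into ((1, 7), 'a'); a trailing distribution suffix like '-1.3' is ignored."""
    match = re.match(r"(\d+(?:\.\d+)*)([a-z]?)", text.strip())
    if match is None:
        raise ValueError("no version number found in %r" % text)
    numbers = tuple(int(part) for part in match.group(1).split("."))
    return numbers, match.group(2)


# An empty suffix sorts before any letter, so 1.7 compares lower than 1.7a.
print(version_key("1.7") < version_key("1.7a"))  # True
print(version_key("1.7-1.3"))
```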

OCRmyPDF-issuebot commented 9 years ago

Comment by Wikinaut Tue Jul 28 20:47:37 2015


better, but still buggy:

   writing manifest file 'ruffus.egg-info/SOURCES.txt'
    running install_lib
    creating /usr/lib/python3.4/site-packages/ruffus
    error: could not create '/usr/lib/python3.4/site-packages/ruffus': Permission denied

    ----------------------------------------
Command "/usr/bin/python3 -c "import setuptools, tokenize;__file__='/tmp/pip-build-1ekvbx92/ruffus/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-8nnv8idr-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-1ekvbx92/ruffus

OCRmyPDF-issuebot commented 9 years ago

Comment by jbarlow83 Tue Jul 28 23:24:17 2015


You need to install with sudo, or create a virtual environment with pyvenv and install into that environment. As the error message says, it doesn't have permission to write to /usr/lib/python3.4/site-packages.

OCRmyPDF-issuebot commented 9 years ago

Comment by jbarlow83 Wed Jul 29 00:49:10 2015


With a virtual environment:

pyvenv venv
source venv/bin/activate
pip3 install -e .
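
The `pyvenv` script was deprecated in Python 3.6 and removed in 3.8; the standard library's `venv` module does the same job. A minimal sketch of creating the environment from Python follows, where `make_env` is a hypothetical helper name.

```python
import os
import venv


def make_env(path, with_pip=True):
    """Create a virtual environment at `path` and return its executables directory."""
    venv.create(path, with_pip=with_pip)
    # Executables live in 'Scripts' on Windows and in 'bin' elsewhere.
    return os.path.join(path, "Scripts" if os.name == "nt" else "bin")
```

After `make_env("venv")`, running the `pip` found in the returned directory with `install -e .` installs into the environment without needing write access to the system site-packages. On the command line, `python3 -m venv venv` is the equivalent of `pyvenv venv`.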