kanzure / pdfparanoia

pdf watermark removal library for academic papers
https://pypi.python.org/pypi/pdfparanoia

Dependency hell, pdfminer. #37

Closed (fmap closed this issue 10 years ago)

fmap commented 10 years ago

Installation woes:

% pip --version
pip 1.4.1 from /usr/lib/python3.3/site-packages (python 3.3)
% sudo pip install pdfparanoia
[sudo] password for vi: 
Downloading/unpacking pdfparanoia
  Downloading pdfparanoia-0.0.15.tar.gz
  Running setup.py egg_info for package pdfparanoia
    Traceback (most recent call last):
      File "<string>", line 16, in <module>
      File "/tmp/pip_build_root/pdfparanoia/setup.py", line 6, in <module>
        import pdfparanoia
      File "./pdfparanoia/__init__.py", line 27, in <module>
        from .core import scrub
      File "./pdfparanoia/core.py", line 13, in <module>
        from .parser import (
      File "./pdfparanoia/parser.py", line 18, in <module>
        import pdfminer.pdfparser
    ImportError: No module named 'pdfminer'
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 16, in <module>
      File "/tmp/pip_build_root/pdfparanoia/setup.py", line 6, in <module>
        import pdfparanoia
      File "./pdfparanoia/__init__.py", line 27, in <module>
        from .core import scrub
      File "./pdfparanoia/core.py", line 13, in <module>
        from .parser import (
      File "./pdfparanoia/parser.py", line 18, in <module>
        import pdfminer.pdfparser
    ImportError: No module named 'pdfminer'

----------------------------------------
Cleaning up...
Command python setup.py egg_info failed with error code 1 in /tmp/pip_build_root/pdfparanoia
Storing complete log in /root/.pip/pip.log

This indicates that pdfminer is missing at the point where setup.py imports pdfparanoia, so I installed pdfminer3k by hand and retried:

% sudo pip install pdfminer3k
Downloading/unpacking pdfminer3k
  Downloading pdfminer3k-1.3.0.tar.gz (9.7MB): 9.7MB downloaded
  Running setup.py egg_info for package pdfminer3k

    Not SVN Repository
Requirement already satisfied (use --upgrade to upgrade): pytest>=2.0 in /usr/lib/python3.3/site-packages (from pdfminer3k)
Requirement already satisfied (use --upgrade to upgrade): ply>=3.4 in /usr/lib/python3.3/site-packages (from pdfminer3k)
Requirement already satisfied (use --upgrade to upgrade): py>=1.4.17 in /usr/lib/python3.3/site-packages (from pytest>=2.0->pdfminer3k)
Installing collected packages: pdfminer3k
  Running setup.py install for pdfminer3k
    changing mode of build/scripts-3.3/pdf2txt.py from 600 to 755
    changing mode of build/scripts-3.3/dumppdf.py from 600 to 755
    changing mode of build/scripts-3.3/latin2ascii.py from 600 to 755

    Not SVN Repository
    changing mode of /usr/bin/pdf2txt.py to 755
    changing mode of /usr/bin/dumppdf.py to 755
    changing mode of /usr/bin/latin2ascii.py to 755
Successfully installed pdfminer3k
Cleaning up...
% sudo pip install pdfparanoia
Downloading/unpacking pdfparanoia
  Downloading pdfparanoia-0.0.15.tar.gz
  Running setup.py egg_info for package pdfparanoia

    Not SVN Repository
Requirement already satisfied (use --upgrade to upgrade): pdfminer3k>=1.3.0 in /usr/lib/python3.3/site-packages (from pdfparanoia)
Requirement already satisfied (use --upgrade to upgrade): pytest>=2.0 in /usr/lib/python3.3/site-packages (from pdfminer3k>=1.3.0->pdfparanoia)
Requirement already satisfied (use --upgrade to upgrade): ply>=3.4 in /usr/lib/python3.3/site-packages (from pdfminer3k>=1.3.0->pdfparanoia)
Requirement already satisfied (use --upgrade to upgrade): py>=1.4.17 in /usr/lib/python3.3/site-packages (from pytest>=2.0->pdfminer3k>=1.3.0->pdfparanoia)
Installing collected packages: pdfparanoia
  Running setup.py install for pdfparanoia
    changing mode of build/scripts-3.3/pdfparanoia from 600 to 755

    Not SVN Repository
    changing mode of /usr/bin/pdfparanoia to 755
Successfully installed pdfparanoia
Cleaning up...
% 

Fantastic, the install succeeds, but now the import fails:

% python
Python 3.3.2 (default, Sep  6 2013, 09:30:10) 
[GCC 4.8.1 20130725 (prerelease)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pdfparanoia
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "./pdfparanoia/__init__.py", line 27, in <module>
    from .core import scrub
  File "./pdfparanoia/core.py", line 13, in <module>
    from .parser import (
  File "./pdfparanoia/parser.py", line 18, in <module>
    import pdfminer.pdfparser
ImportError: No module named 'pdfminer.pdfparser'
>>> 

What is the latest version of pdfminer that still provides the interface this library uses?
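One way to narrow this down is to check which submodules the installed `pdfminer` package actually exposes. The following is a minimal sketch, not part of pdfparanoia: the helper name `module_available` is hypothetical, and it assumes Python 3.4 or newer, where `importlib.util.find_spec` exists (newer than the 3.3 interpreter in the log above).

```python
import importlib.util


def module_available(dotted_name):
    """Return True if `dotted_name` can be located on the import path.

    Note: for a dotted name, find_spec imports the parent package first,
    so a missing parent shows up as an ImportError rather than None.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        # ImportError: a parent package is missing or is not a package.
        # ValueError: the module is loaded but has no usable __spec__.
        return False


if __name__ == "__main__":
    # Report on the package itself and on the submodule named in the traceback.
    for name in ("pdfminer", "pdfminer.pdfparser"):
        state = "available" if module_available(name) else "missing"
        print(name, state)
```

If `pdfminer` is reported as available while `pdfminer.pdfparser` is missing, the installed distribution provides the package under a different module layout than the one `parser.py` imports.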

kanzure commented 10 years ago

I don't think I've tested pdfminer with Python 3. pdfminer might not work with py3k at all, in which case we'd have to write our own PDF parser/generator.

fmap commented 10 years ago

Thanks! It's working a little better with Python 2, but it looks like pdfminer changed its interface recently, breaking all sorts of things. You're about to receive a pull request, including the commits above.

kanzure commented 10 years ago

Ah, interesting. I think one possible solution would be to set the pdfminer dependency to an exact version. I am happy to hear that pdfminer is receiving updates.
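An exact pin of the kind suggested here could look like the sketch below. It is an illustration rather than pdfparanoia's actual setup.py: the helper `pin_exact` is hypothetical, and `pdfminer3k` 1.3.0 is used only because that is the name and version visible in the pip log above. The right package and version for Python 2 would need to be checked separately.

```python
def pin_exact(name, version):
    """Build a requirement string that accepts exactly one version."""
    return "{0}=={1}".format(name, version)


# Keyword arguments that a setup.py could pass to setuptools.setup().
# The pinned name and version are assumptions taken from the log, not
# a verified known-good combination.
SETUP_KWARGS = {
    "name": "pdfparanoia",
    "install_requires": [pin_exact("pdfminer3k", "1.3.0")],
}

if __name__ == "__main__":
    print(SETUP_KWARGS["install_requires"])
```

With `==`, pip accepts no other release, so an upstream interface change cannot reach users until the pin is deliberately moved. The trade-off is that bug fixes in newer releases are held back as well.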