Open Schabernack opened 10 years ago
I'm hitting the same problem:
Python 2.7.5 (default, Mar 9 2014, 22:15:05)
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import PDFDocument
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named PDFDocument
>>> import slate
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "build/bdist.macosx-10.9-intel/egg/slate/__init__.py", line 48, in <module>
File "build/bdist.macosx-10.9-intel/egg/slate/slate.py", line 3, in <module>
ImportError: cannot import name PDFDocument
Hrm... ok, I'll look into this. PDFMiner has changed its API, so we'll need to do some version checking.
I was able to sudo pip install --upgrade --ignore-installed slate==0.3 pdfminer==20110515, which are compatible versions.
Also affects opensyllabus
Thanks for the workaround.
Okay, I've added a try/except block, which should allow people to use whichever version of pdfminer they want.
This hasn't been uploaded to PyPI yet.
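A minimal sketch of the try/except idea, not slate's actual code. The helper name import_first is hypothetical. It is demonstrated with the StringIO move between Python 2 and Python 3 so that it runs without pdfminer installed; the same pattern would apply to PDFDocument moving from pdfminer.pdfparser to pdfminer.pdfdocument.

```python
import importlib


def import_first(candidates):
    """Return the first attribute importable from a list of (module, name) pairs."""
    errors = []
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr)
        except (ImportError, AttributeError) as exc:
            # Remember why this candidate failed, then try the next one.
            errors.append("%s.%s: %s" % (module_name, attr, exc))
    raise ImportError("no candidate could be imported: " + "; ".join(errors))


# Python 2 location first, then the Python 3 location.
StringIO = import_first([("StringIO", "StringIO"), ("io", "StringIO")])

# For pdfminer the candidate list would be, in the same spirit:
# [("pdfminer.pdfparser", "PDFDocument"), ("pdfminer.pdfdocument", "PDFDocument")]
```

Collecting the per-candidate errors keeps the final ImportError informative when none of the known locations exist.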
Ideally, Slate would grab the correct version of PDFMiner when the user installs Slate from PyPI. See http://www.scotttorborg.com/python-packaging/dependencies.html.
Thanks @morninj. Specifying an exact version in setup.py seems like a good idea.
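For illustration only, a pin could look roughly like this. This is a hypothetical excerpt, not slate's real setup.py; the pinned version is the one reported as compatible with slate 0.3 earlier in this thread.

```python
# Hypothetical keyword arguments for setuptools.setup() in setup.py.
SETUP_KWARGS = dict(
    name="slate",
    version="0.3",
    # Exact pin, so pip installs a pdfminer release known to work with this slate.
    install_requires=["pdfminer==20110515"],
)

# In a real setup.py this would be followed by:
#     from setuptools import setup
#     setup(**SETUP_KWARGS)
```

The trade-off is that an exact pin prevents users from picking up newer pdfminer releases.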
Closing as I've actually decided that slate should try to support as much functionality as possible, rather than requiring a specific version. Its aim is to simplify things for the end user and not add bureaucracy. That makes slate more complex, but it's quite a small package so no big deal really.
Please reopen if this ImportError is still happening.
Received an email about this issue. Perhaps we should force a specific version?
This issue still exists today. I used the workaround and it works, but with this workaround we will never be able to use newer versions of pdfminer with slate. Slate is awesome work, so it would be great to let it keep up with newer versions of pdfminer.
What do you think?
Will reopen issue for a few months and will wait for feedback.
The issue still persists, even after upgrading slate and pdfminer. Any updates?
+1
@rajat4493 @imichaeldotorg Could you two try pip install -U slate now and see if there has been any improvement in the release I pushed to PyPI today?
If not, could you please report the version number of your PDFMiner package using the following:
>>> import pdfminer
>>> pdfminer.__version__
'20140328'
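The same check can be scripted so that it also reports a missing package instead of raising. This is a small sketch; the helper name module_version is hypothetical.

```python
import importlib


def module_version(name):
    """Return a module's __version__ string, or None if it cannot be imported or has none."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


print("pdfminer:", module_version("pdfminer"))
```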
Still does not work for me.
I created a fresh Ubuntu 14.04 environment, upgraded pip, and ran pip install -U slate.
root@955e8d0c41fe:/# pip install -U slate
Downloading/unpacking slate
Downloading slate-0.3.zip
Running setup.py (path:/tmp/pip_build_root/slate/setup.py) egg_info for package slate
Downloading/unpacking distribute (from slate)
Downloading distribute-0.7.3.zip (145kB): 145kB downloaded
Running setup.py (path:/tmp/pip_build_root/distribute/setup.py) egg_info for package distribute
Downloading/unpacking setuptools>=0.7 from https://pypi.python.org/packages/3.5/s/setuptools/setuptools-20.1-py2.py3-none-any.whl#md5=3802cc2b2cfd7bad320f5e4368dfa341 (from distribute->slate)
Downloading setuptools-20.1-py2.py3-none-any.whl (472kB): 472kB downloaded
Installing collected packages: slate, setuptools, distribute
Running setup.py install for slate
Found existing installation: setuptools 3.3
Not uninstalling setuptools at /usr/lib/python2.7/dist-packages, owned by OS
Running setup.py install for distribute
Successfully installed slate setuptools distribute
Cleaning up...
root@955e8d0c41fe:/# ipython
bash: ipython: command not found
root@955e8d0c41fe:/# python
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import slate
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/slate/__init__.py", line 48, in <module>
from slate import PDF
File "/usr/local/lib/python2.7/dist-packages/slate/slate.py", line 3, in <module>
from pdfminer.pdfparser import PDFParser, PDFDocument
ImportError: No module named pdfminer.pdfparser
pdfminer wasn't installed as part of the pip process, so I installed it.
root@955e8d0c41fe:/# pip install pdfminer
Downloading/unpacking pdfminer
Downloading pdfminer-20140328.tar.gz (4.1MB): 4.1MB downloaded
Running setup.py (path:/tmp/pip_build_root/pdfminer/setup.py) egg_info for package pdfminer
Installing collected packages: pdfminer
Running setup.py install for pdfminer
changing mode of build/scripts-2.7/pdf2txt.py from 644 to 755
changing mode of build/scripts-2.7/dumppdf.py from 644 to 755
changing mode of build/scripts-2.7/latin2ascii.py from 644 to 755
changing mode of /usr/local/bin/latin2ascii.py to 755
changing mode of /usr/local/bin/pdf2txt.py to 755
changing mode of /usr/local/bin/dumppdf.py to 755
Could not find .egg-info directory in install record for pdfminer
Successfully installed pdfminer
Cleaning up...
After installing pdfminer, I tried importing slate again:
root@955e8d0c41fe:/# python
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import slate
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/slate/__init__.py", line 48, in <module>
from slate import PDF
File "/usr/local/lib/python2.7/dist-packages/slate/slate.py", line 3, in <module>
from pdfminer.pdfparser import PDFParser, PDFDocument
ImportError: cannot import name PDFDocument
I also verified pdfminer is the version you listed above:
root@955e8d0c41fe:/# python
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pdfminer
>>> pdfminer.__version__
'20140328'
Commenting to say I am also having the ImportError with a completely fresh pip, Python 2.6, PDFMiner, and slate install. Are there any current known workarounds, such as the one previously posted?
same here
Tried the workaround above but it's not working for me either on python 2.7.11 (OSX 10.11. El Capitan)
Python 2.7.11 (default, Dec 5 2015, 14:44:53)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.1.76)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import slate
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/site-packages/slate/__init__.py", line 48, in <module>
from slate import PDF
File "/usr/local/lib/python2.7/site-packages/slate/slate.py", line 3, in <module>
from pdfminer.pdfparser import PDFParser, PDFDocument
File "/usr/local/lib/python2.7/site-packages/pdfminer/pdfparser.py", line 7, in <module>
from .psparser import PSStackParser, PSSyntaxError, PSEOF, literal_name, LIT, KWD, handle_error
File "/usr/local/lib/python2.7/site-packages/pdfminer/psparser.py", line 4, in <module>
from .utils import choplist
File "/usr/local/lib/python2.7/site-packages/pdfminer/utils.py", line 212, in <module>
0x00f8, 0x00f9, 0x00fa, 0x00fb, 0x00fc, 0x00fd, 0x00fe, 0x00ff,
File "/usr/local/lib/python2.7/site-packages/pdfminer/utils.py", line 180, in <genexpr>
PDFDocEncoding = ''.join( chr(x) for x in (
ValueError: chr() arg not in range(256)
...or on 2.7.6 (OSX 10.10.3 Yosemite):
Python 2.7.6 (default, Sep 9 2014, 15:04:36)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import slate
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Python/2.7/site-packages/slate/__init__.py", line 48, in <module>
from slate import PDF
File "/Library/Python/2.7/site-packages/slate/slate.py", line 3, in <module>
from pdfminer.pdfparser import PDFParser, PDFDocument
ImportError: cannot import name PDFDocument
Those are different machines, but I had this working on the El Capitan machine before on newest releases of pdfminer and slate. Suspect that the upgrade to El Capitan was what broke the setup (does the python version have a role in this...?)
I'm also getting this. My set up is Windows 7, with a Conda install of Python 2.7.
pdfminer.__version__
'20140328'
If I try import slate I get:
In [1]: import slate
ImportError Traceback (most recent call last)
in ()
----> 1 import slate
c:\users\nstoker\appdata\local\continuum\anaconda3\envs\py27\lib\site-packages\slate\__init__.py in ()
46 #along with slate. If not, see http://www.gnu.org/licenses/.
47
---> 48 from slate import PDF
c:\users\nstoker\appdata\local\continuum\anaconda3\envs\py27\lib\site-packages\slate\slate.py in ()
1 from StringIO import StringIO
2
----> 3 from pdfminer.pdfparser import PDFParser, PDFDocument
4 from pdfminer.pdfinterp import PDFResourceManager
5 from pdfminer.pdfinterp import PDFPageInterpreter as PI
ImportError: cannot import name PDFDocument
I got the same error and eliminated it by going back to pdfminer==20100104. But then it gave me another error when I tried to use slate.PDF, so I switched to pdfminer==20100424 and was able to get text from a pdf.
I don't know if there is a newer pdfminer that works, but this was my workaround.
Same here, also on El Capitan, and with fresh pip installs of pdfminer (pdfminer-20140328) and slate (slate-0.3)
Jupyter QtConsole 4.1.1
Python 2.7.11 |Continuum Analytics, Inc.| (default, Dec 6 2015, 18:57:58)
Type "copyright", "credits" or "license" for more information.
IPython 4.0.3 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
%guiref -> A brief reference about the graphical user interface.
import slate
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-1-6c5d1974407e> in <module>()
----> 1 import slate
/Applications/anaconda/envs/python2/lib/python2.7/site-packages/slate/__init__.py in <module>()
46 #along with slate. If not, see <http://www.gnu.org/licenses/>.
47
---> 48 from slate import PDF
/Applications/anaconda/envs/python2/lib/python2.7/site-packages/slate/slate.py in <module>()
1 from StringIO import StringIO
2
----> 3 from pdfminer.pdfparser import PDFParser, PDFDocument
4 from pdfminer.pdfinterp import PDFResourceManager
5 from pdfminer.pdfinterp import PDFPageInterpreter as PI
ImportError: cannot import name PDFDocument
import pdfminer
pdfminer.__version__
Out[3]: '20140328'
Same here. El Capitan, pdfminer (pdfminer-20140328) and freshly installed slate.
ImportError Traceback (most recent call last)
This error still seems to exist, at least if you use pip to install. I think it has to do with an out-of-date setup.py file on PyPI, as pip installs version 0.3 even though the latest version is 0.5.2. Version 0.3 doesn't have the changes to the import statements that rectify things.
Some of the verbose output from pip:
C:\Users\Uber2>pip install slate --no-cache-dir -v
Collecting slate
1 location(s) to search for versions of slate:
* https://pypi.python.org/simple/slate/
Getting page https://pypi.python.org/simple/slate/
Starting new HTTPS connection (1): pypi.python.org
"GET /simple/slate/ HTTP/1.1" 200 311
Analyzing links from page https://pypi.python.org/simple/slate/
Found link https://pypi.python.org/packages/40/cc/ee9faa3ca14cfc1c5c76305a62b8da84d3ae5abf6cf8c89045a1d48f86ce/sl
-0.3.zip#md5=b86e93edd573572aea33ba4a45348940 (from https://pypi.python.org/simple/slate/), version: 0.3
Found link https://pypi.python.org/packages/b9/50/37ffcdb4f4fb4c41f49f2e01c3b1bc3b9f4d5fff0f47b890f86c95b5af5b/sl
-0.2.3.zip#md5=f50b363bf83ed0e171139468076cec7b (from https://pypi.python.org/simple/slate/), version: 0.2.3
Using version 0.3 (newest of versions: 0.2.3, 0.3)
I tried installing from pip, with the same result as described above. I then pulled down slate-master.zip and ran setup.py. That installed version 0.5.2 and I was able to import slate successfully. This is on Windows 8.1, Python 3.5.1.
--Finished processing dependencies for slate==0.5.2
--Best match: pdfminer3k 1.3.0
Hi all,
Not sure if editing the slate.py is an option for people's environment, but if you change
line 3
from pdfminer.pdfparser import PDFParser, PDFDocument
to
from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfpage import PDFPage
line 38
self.doc = PDFDocument()
to
self.doc = PDFDocument(self.parser)
comment out lines 40 & 41
line 49
for page in self.doc.get_pages():
    self.append(self.interpreter.process_page(page))
to
for page in PDFPage.create_pages(self.doc):
    self.append(self.interpreter.process_page(page))
it works.
Here are the versions of libraries I am using
cssselect==0.9.1
lxml==3.6.0
pdfminer==20140328
pyquery==1.2.13
slate==0.3
wheel==0.24.0
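Put together, the edits above amount to the following call sequence against the newer pdfminer layout. This is a sketch that assumes pdfminer 20140328 (or a fork with the same module layout) is installed; the function name count_pages is hypothetical and it only iterates over pages rather than extracting text.

```python
def count_pages(path, password=""):
    """Count the pages of a PDF using the newer pdfminer module layout."""
    # Imported here so that merely defining the function does not require pdfminer.
    from pdfminer.pdfparser import PDFParser
    from pdfminer.pdfdocument import PDFDocument
    from pdfminer.pdfpage import PDFPage

    with open(path, "rb") as fp:
        parser = PDFParser(fp)
        # Newer API: the parser is handed to the document constructor.
        doc = PDFDocument(parser, password)
        # Newer API: pages come from PDFPage.create_pages, not doc.get_pages().
        return sum(1 for _ in PDFPage.create_pages(doc))
```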
Install slate using the setup.py file from the repository instead of PyPI. Even though PyPI claims to have version 0.5.2, it installs 0.3 instead. You can verify this by checking the version of slate installed using the following command
pip freeze
Everything works fine with the latest version of slate.
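As an alternative to reading through pip freeze, the installed version can be queried from Python. This sketch assumes Python 3.8 or newer (for importlib.metadata); the helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Per the reports in this thread, "0.3" would mean the old PyPI release was installed.
print("slate:", installed_version("slate"))
```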
@rishabh-joshi The slate installed by pip was 0.3. I resolved it with the following:
pip uninstall slate
pip install git+https://github.com/timClicks/slate.git
That might help someone else also.
On Windows I don't have git installed, so I downloaded the zip, changed into the master directory and ran "python setup.py install".
UPDATE: slate only supports Python 2.
Python 3.5.1 slate 0.3 pdfminer3k 1.3.0
In [4]: import slate
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-4-6c5d1974407e> in <module>()
----> 1 import slate
/usr/local/lib/python3.5/site-packages/slate/__init__.py in <module>()
46 #along with slate. If not, see <http://www.gnu.org/licenses/>.
47
---> 48 from slate import PDF
ImportError: cannot import name 'PDF'
"Resolution": Python 3.5.1 slate 0.5.2 pdfminer3k 1.3.0
pip3 uninstall slate
pip install git+https://github.com/timClicks/slate.git
In [1]: import slate
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-1-6c5d1974407e> in <module>()
----> 1 import slate
/usr/local/lib/python3.5/site-packages/slate/__init__.py in <module>()
64 #along with slate. If not, see <http://www.gnu.org/licenses/>.
65
---> 66 from .classes import PDF
/usr/local/lib/python3.5/site-packages/slate/classes.py in <module>()
23 except ImportError:
24 from pdfminer.pdfpage import PDFPage
---> 25 import utils
26
27 __all__ = ['PDF']
ImportError: No module named 'utils'
It doesn't work for me :/
If I do this:
pip install git+https://github.com/timClicks/slate.git
Collecting git+https://github.com/timClicks/slate.git
Cloning https://github.com/timClicks/slate.git to /tmp/pip-7yxixD-build
Collecting distribute (from slate==0.5.2)
Using cached distribute-0.7.3.zip
Complete output from command python setup.py egg_info:
running egg_info
creating pip-egg-info/distribute.egg-info
writing requirements to pip-egg-info/distribute.egg-info/requires.txt
writing pip-egg-info/distribute.egg-info/PKG-INFO
writing top-level names to pip-egg-info/distribute.egg-info/top_level.txt
writing dependency_links to pip-egg-info/distribute.egg-info/dependency_links.txt
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-build-CRm8aO/distribute/setup.py", line 58, in <module>
setuptools.setup(**setup_params)
File "/usr/lib/python2.7/distutils/core.py", line 151, in setup
dist.run_commands()
File "/usr/lib/python2.7/distutils/dist.py", line 953, in run_commands
self.run_command(cmd)
File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
cmd_obj.run()
File "setuptools/command/egg_info.py", line 177, in run
writer = ep.load(installer=installer)
File "pkg_resources.py", line 2241, in load
if require: self.require(env, installer)
File "pkg_resources.py", line 2254, in require
working_set.resolve(self.dist.requires(self.extras),env,installer)))
File "pkg_resources.py", line 2471, in requires
dm = self._dep_map
File "pkg_resources.py", line 2682, in _dep_map
self.__dep_map = self._compute_dependencies()
File "pkg_resources.py", line 2699, in _compute_dependencies
from _markerlib import compile as compile_marker
ImportError: No module named _markerlib
@benjaminweb In classes.py, change import utils to from . import utils and it will work, no problem. I think someone has already submitted a pull request.
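To see why the explicit relative import matters on Python 3, here is a self-contained sketch that builds two throwaway packages in a temporary directory. All package and module names (demo_implicit, demo_explicit, demo_utils) are hypothetical stand-ins for slate, classes.py and utils.py.

```python
import importlib
import os
import sys
import tempfile


def make_package(root, package, import_line):
    """Write a tiny package whose classes.py starts with the given import line."""
    pkg_dir = os.path.join(root, package)
    os.makedirs(pkg_dir)
    sources = {
        "__init__.py": "from .classes import VALUE\n",
        "classes.py": import_line + "\nVALUE = demo_utils.ANSWER\n",
        "demo_utils.py": "ANSWER = 42\n",
    }
    for filename, text in sources.items():
        with open(os.path.join(pkg_dir, filename), "w") as handle:
            handle.write(text)


def try_import(package):
    """Return None if the package imports cleanly, else the ImportError message."""
    try:
        importlib.import_module(package)
        return None
    except ImportError as exc:
        return str(exc)


root = tempfile.mkdtemp()
sys.path.insert(0, root)
make_package(root, "demo_implicit", "import demo_utils")         # implicit, Python 2 style
make_package(root, "demo_explicit", "from . import demo_utils")  # explicit relative import
importlib.invalidate_caches()

implicit_error = try_import("demo_implicit")
explicit_error = try_import("demo_explicit")
print("implicit:", implicit_error)
print("explicit:", explicit_error)
```

On Python 3 the implicit form looks for a top-level module and fails, while the explicit form resolves the sibling module inside the package.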
Edit: I'm on Windows 10 Professional using Python 3.5.2
@benjaminweb A small addition to @gileadslostson's answer: try changing import utils to import slate.utils; that works for me on Mac with Python 3.5.2.
Also, the sample code may raise a UnicodeDecodeError when executing with open(filepath) as f: slate.PDF(f). Try with open(filepath, 'rb') as f instead, which reads the PDF file in binary mode.
Hope it helps : )
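The text-mode failure can be reproduced without slate. In this sketch the sample bytes merely resemble the start of a PDF, and the explicit encoding="utf-8" is an assumption made so the result does not depend on the platform's default encoding.

```python
import os
import tempfile

# A PDF-like header followed by a comment line of non-ASCII bytes.
sample = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"

fd, path = tempfile.mkstemp(suffix=".pdf")
with os.fdopen(fd, "wb") as handle:
    handle.write(sample)

try:
    with open(path, encoding="utf-8") as handle:  # text mode: bytes get decoded
        handle.read()
    text_mode_error = None
except UnicodeDecodeError as exc:
    text_mode_error = exc

with open(path, "rb") as handle:  # binary mode: bytes are returned untouched
    data = handle.read()

os.remove(path)
print("text mode failed:", text_mode_error is not None)
print("binary mode read", len(data), "bytes")
```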
For both python2 and python3 I receive an import error
In [1]: import slate
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-1-6c5d1974407e> in <module>()
----> 1 import slate
/home/sayth/.pyenv/versions/3.5.2/lib/python3.5/site-packages/slate/__init__.py in <module>()
46 #along with slate. If not, see <http://www.gnu.org/licenses/>.
47
---> 48 from slate import PDF
ImportError: cannot import name 'PDF'
It doesn't matter which PDFMiner I use, either. I started with https://github.com/euske/pdfminer and got the error, so I installed https://github.com/goulu/pdfminer and still got the error.
Note that for the second one I used python3 setup.py install.
@gileadslostson @xgeric @benjaminweb
I changed classes.py line 25 to from . import utils and then reinstalled slate by running python setup.py install from the slate-master directory.
So now when I import slate, I no longer get the error
ImportError: No module named 'utils'
BUT, now there is a new error:
ZipImportError Traceback (most recent call last)
in ()
----> 1 from slate import utils
ZipImportError: bad local file header: 'C:\\Users\\RobinG\\AppData\\Local\\Continuum\\Miniconda3\\lib\\site-packages\\slate-0.5.2-py3.5.egg'
This is on Windows 7, Python 3.5.2, slate 0.5.2. Any suggestions?
@dogfloss do you know if you uninstalled correctly?
As a workaround, if you try python setup.py develop, it will use a symlink. If you can verify that the first error goes away, I will get the code changed.
@rdpickard :1st_place_medal: It works for me, thanks
Windows 10 Enterprise 64-bit
pip list
appdirs (1.4.3)
astroid (1.4.9)
chardet (2.3.0)
colorama (0.3.7)
distribute (0.7.3)
elementtree (1.2.7-20070827-preview)
isort (4.2.5)
lazy-object-proxy (1.2.2)
mccabe (0.6.1)
num2words (0.5.4)
packaging (16.8)
pdfminer.six (20160614)
pip (9.0.1)
pylint (1.6.5)
pyodbc (4.0.14)
pyparsing (2.2.0)
setuptools (34.3.1)
six (1.10.0)
virtualenv (15.1.0)
virtualenvwrapper-win (1.2.1)
wrapt (1.10.8)
pip install slate
Collecting slate
Using cached slate-0.3.zip
Requirement already satisfied: distribute in l:\python\lib\site-packages\distribute-0.7.3-py3.6.egg (from slate)
Requirement already satisfied: setuptools>=0.7 in l:\python\lib\site-packages (from distribute->slate)
Requirement already satisfied: six>=1.6.0 in l:\python\lib\site-packages (from setuptools>=0.7->distribute->slate)
Requirement already satisfied: appdirs>=1.4.0 in l:\python\lib\site-packages (from setuptools>=0.7->distribute->slate)
Requirement already satisfied: packaging>=16.8 in l:\python\lib\site-packages (from setuptools>=0.7->distribute->slate)
Requirement already satisfied: pyparsing in l:\python\lib\site-packages (from packaging>=16.8->setuptools>=0.7->distribute->slate)
Installing collected packages: slate
Running setup.py install for slate ... done
Successfully installed slate-0.3
python
Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 07:18:10) [MSC v.1900 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import slate
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "L:\python\lib\site-packages\slate\__init__.py", line 48, in <module>
from slate import PDF
ImportError: cannot import name 'PDF'
@dogfloss @timClicks I am also getting the same error
>>> import slate
>>> pdf='C:\Users\Shyam\Dropbox\Python\testdoc.pdf'
>>> with open(pdf,'rb') as f:
...     doc=slate.PDF(f)
...
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "C:\Users\Shyam\Anaconda3\lib\site-packages\slate-0.5.2-py3.5.egg\slate\classes.py", line 56, in __init__
TypeError: __init__() missing 1 required positional argument: 'parser'
This is on Windows 10, Python 3.5., slate 0.5.2. Any suggestions? Please help me out. I am new to Python programming.
@wongomao bump^
@shyamiitk I had the same issue and I solved it by editing classes.py. Ironically, functions that are supposed to be called in Python 3 crash in Python 3. You just remove them, and use the default case.
In classes.py, you edit lines 55-61:
if PYTHON_3:
    self.doc = PDFDocument()
    self.parser.set_document(self.doc)
    self.doc.set_parser(self.parser)
    self.doc.initialize(password)
else:
    self.doc = PDFDocument(self.parser, password)
to
self.doc = PDFDocument(self.parser, password)
and lines 69-72:
if PYTHON_3:
    page_generator = self.doc.get_pages()
else:
    page_generator = PDFPage.create_pages(self.doc)
to
page_generator = PDFPage.create_pages(self.doc)
I use Windows 10, Python 3.6.2 and Slate 0.5.2, and it seems to work fine now.
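Rather than branching on the Python version, the choice could follow which constructor style the installed PDFDocument accepts. The sketch below is one possible approach, not slate's code: open_document is a hypothetical helper, and the three small classes are stand-ins used only so the example runs without pdfminer.

```python
def open_document(document_cls, parser, password=""):
    """Build a document with whichever constructor style document_cls supports."""
    try:
        # Newer style: the parser is passed to the constructor.
        return document_cls(parser, password)
    except TypeError:
        # Older style: create an empty document, then wire it to the parser.
        doc = document_cls()
        parser.set_document(doc)
        doc.set_parser(parser)
        doc.initialize(password)
        return doc


class NewStyleDocument(object):
    """Stand-in for a PDFDocument that takes the parser in its constructor."""
    def __init__(self, parser, password=""):
        self.parser = parser
        self.password = password


class OldStyleDocument(object):
    """Stand-in for a PDFDocument that is wired up after construction."""
    def __init__(self):
        self.parser = None
        self.password = None

    def set_parser(self, parser):
        self.parser = parser

    def initialize(self, password=""):
        self.password = password


class FakeParser(object):
    """Stand-in for PDFParser; it only records the attached document."""
    def set_document(self, doc):
        self.doc = doc


new_doc = open_document(NewStyleDocument, FakeParser(), "secret")
old_doc = open_document(OldStyleDocument, FakeParser(), "secret")
```

One caveat: a TypeError raised for an unrelated reason inside a newer-style constructor would also trigger the fallback, so a stricter version might inspect the constructor signature instead.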
@canguezelhan I initially thought your code did the trick, but after running into another error later (to do with reading unicode) I found that the original code works provided I open the file as binary to begin with, using with open(path,'rb') as f:
Just to clarify -
(1) This library has been unusable for 3+ years due to unresolvable dependency issues.
(2) Even when it works, this library only supports Python 2.7.
I have the same problem. My friend can use this code without any errors. He told me he didn't install anything. When I installed the packages I got more errors, and I don't know why.
@sfsdfd There exist working forks of this repo. A script using slate is spitting out text from a PDF as I type. Check https://github.com/alkivi-sas/slate/tree/python3
@canguezelhan I have the same problem. Silly question, but how can I find classes.py to try out your solution? Thanks
@darrencl https://github.com/timClicks/slate/blob/master/src/slate/classes.py You're welcome! :)
@canguezelhan Hi, thank you for your response. I just installed slate (before, I had only installed pdfminer.six), but could not locate the file. I installed slate using easy_install and, as far as I can tell, it just installs a file named slate-0.3-py3.6.egg in C:\Users\Darren\Anaconda3\Lib\site-packages. Any idea where it is in my directory?
@BadrulAlom Hi, I was doing the same thing (i.e. opening the file as binary using the 'rb' option), but I get the error __init__() missing 1 required positional argument: 'parser'. Any solution?
@shyamiitk I had the same issue and I solved it by editing classes.py. Ironically, functions that are supposed to be called in Python 3 crash in Python 3. You just remove them, and use the default case.
In classes.py, you edit lines 55-61:
if PYTHON_3:
    self.doc = PDFDocument()
    self.parser.set_document(self.doc)
    self.doc.set_parser(self.parser)
    self.doc.initialize(password)
else:
    self.doc = PDFDocument(self.parser, password)
to
self.doc = PDFDocument(self.parser, password)
and lines 69-72:
if PYTHON_3:
    page_generator = self.doc.get_pages()
else:
    page_generator = PDFPage.create_pages(self.doc)
to
page_generator = PDFPage.create_pages(self.doc)
I use Windows 10, Python 3.6.2 and Slate 0.5.2, and it seems to work fine now.
Worked for me.
Python 3.11: still getting this error: cannot import name 'PDF' from partially initialized module 'slate'. This is with slate==0.3 and conda 4.10.3 on Ubuntu. pip install does not offer any version newer than 0.3.
When trying to import slate, the following error message occurs. pdfminer and slate have been installed via pip.
Using Windows XP and Python 2.7
Looking at the pdfminer website, i found the following command that works:
Maybe pdfminer changed their API?