xpmethod / opensyllabus

Text Extraction: pdf --> txt #14

Open grahamsack opened 10 years ago

grahamsack commented 10 years ago

There are a few pre-existing Python packages for this...

jonahsmith commented 10 years ago

FYI, I can't get Slate to work either. I might be missing something, but here is the error I'm getting:

  File "slateTest.py", line 1, in <module>
    import slate
  File "/Library/Python/2.7/site-packages/slate/__init__.py", line 48, in <module>
    from slate import PDF
  File "/Library/Python/2.7/site-packages/slate/slate.py", line 3, in <module>
    from pdfminer.pdfparser import PDFParser, PDFDocument
ImportError: cannot import name PDFDocument

Looks like there's a problem calling something in pdfminer? Graham, is this the issue you were having yesterday?

grahamsack commented 10 years ago

Yes. Same issue. I'm using pdfminer from the command line now.

aburkh commented 10 years ago

The problem is that slate tries to import PDFDocument from pdfminer.pdfparser. The correct module is pdfminer.pdfdocument.
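To illustrate the diagnosis above, here is a minimal sketch of a fallback lookup that tries the newer module path first and the older one second. The helper name `locate_name` is hypothetical and is not part of slate or pdfminer; the only module paths used are the two named in this thread. It is a workaround sketch, not the fix the slate maintainers shipped.

```python
import importlib


def locate_name(name, candidate_modules):
    """Return (module_path, object) for the first candidate module exposing `name`.

    Returns (None, None) if no candidate can be imported or none has the name.
    """
    for module_path in candidate_modules:
        try:
            module = importlib.import_module(module_path)
        except ImportError:
            # Module is missing in this environment; try the next candidate.
            continue
        if hasattr(module, name):
            return module_path, getattr(module, name)
    return None, None


# Newer location first (per the comment above), then the location slate imports from.
PDFDOCUMENT_MODULES = ["pdfminer.pdfdocument", "pdfminer.pdfparser"]

found_in, PDFDocument = locate_name("PDFDocument", PDFDOCUMENT_MODULES)
if PDFDocument is None:
    print("PDFDocument not found; is pdfminer installed?")
else:
    print("PDFDocument imported from", found_in)
```

Running this in the failing environment should at least show which module, if any, actually provides PDFDocument for the installed pdfminer version.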

daryltucker commented 10 years ago

I still see this issue.

I was able to work around it by running sudo pip install --upgrade --ignore-installed slate==0.3 pdfminer==20110515, which pins two compatible versions.

The slate devs are aware of the problem.

KurtOstergaard commented 8 years ago

I tried the slate==0.3 and pdfminer==20110515 line and I still get an error. Any other workarounds?

tobiasmcnulty commented 8 years ago

Works with slate==0.3 and pdfminer==20110515 for me.

tobiasmcnulty commented 8 years ago

If you're inside a virtualenv, make sure not to use sudo.
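A rough sketch of that advice: build the pinned install command from this thread and leave sudo off when the interpreter is running inside a virtualenv. The function names `in_virtualenv` and `build_install_command` are made up for illustration, and the script only prints the command rather than running it. Whether sudo is appropriate outside a virtualenv depends on your setup; that part is an assumption carried over from the earlier comment.

```python
import sys

# Versions reported as compatible in this thread.
PINNED = ["slate==0.3", "pdfminer==20110515"]


def in_virtualenv():
    """Best-effort check for an active virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv make sys.base_prefix differ from sys.prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


def build_install_command(use_sudo):
    """Return the pip command as a list of arguments, without running it."""
    cmd = [sys.executable, "-m", "pip", "install",
           "--upgrade", "--ignore-installed"] + PINNED
    return ["sudo"] + cmd if use_sudo else cmd


# Inside a virtualenv, skip sudo so the packages land in the environment.
command = build_install_command(use_sudo=not in_virtualenv())
print(" ".join(command))
```

Using `sys.executable -m pip` rather than a bare `pip` keeps the install tied to the interpreter that will later import slate.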

arderyp commented 8 years ago

@tobiasmcnulty's suggestion works for me too. Thanks!