jackiekazil / data-wrangling

Code repository for Data Wrangling with Python (O'Reilly)

PDFSyntaxError('No /Root object! - Is this really a PDF?') #10

Open ghost opened 7 years ago

ghost commented 7 years ago

Code like this:

import slate
with open('xxx.pdf') as f:
    doc = slate.PDF(f)

raises this error:

Traceback (most recent call last):
  File "", line 2, in
  File "C:\Python27\lib\site-packages\slate\slate.py", line 38, in __init__
    self.doc.set_parser(self.parser)
  File "C:\Python27\lib\site-packages\pdfminer\pdfparser.py", line 333, in set_parser
    raise PDFSyntaxError('No /Root object! - Is this really a PDF?')
pdfminer.pdfparser.PDFSyntaxError: No /Root object! - Is this really a PDF?
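One quick way to narrow down this error is to check whether the file on disk carries a PDF header at all. The sketch below assumes that a valid PDF has the `%PDF-` marker within roughly the first 1024 bytes. `looks_like_pdf` and `write_temp` are hypothetical helpers written for illustration, not part of slate or pdfminer, and the sample contents are made up. It is written for Python 3.

```python
import os
import tempfile


def looks_like_pdf(path, window=1024):
    """Hypothetical helper: True if the '%PDF-' marker is in the first `window` bytes."""
    # Binary mode so the bytes are read exactly as stored on disk.
    with open(path, "rb") as f:
        head = f.read(window)
    return b"%PDF-" in head


def write_temp(data):
    """Write bytes to a temporary file and return its path (demo only)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


# Made-up sample contents, not files from the repository.
pdf_path = write_temp(b"%PDF-1.4\n%%EOF\n")
html_path = write_temp(b"<!DOCTYPE html><html></html>")

pdf_ok = looks_like_pdf(pdf_path)
html_ok = looks_like_pdf(html_path)
print(pdf_ok, html_ok)

os.remove(pdf_path)
os.remove(html_path)
```

If the check fails for 'xxx.pdf', the file is probably not a PDF at all (for instance, a web page saved in its place). If it passes, the problem is more likely in how the file is opened or parsed.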

kjam commented 6 years ago

Hi @Canacedq, can you share which PDF you are using? Is it one included in the repository?

conango commented 6 years ago

Hi, I'm running into this issue as well, using the exact same PDF file downloaded from this GitHub repository. My pdfminer version is 20110515, and it can't run with the latest version of pdfminer either. Please have a look at this issue, thanks!

kjam commented 6 years ago

Hi @conango,

I'm worried this might be a Windows encoding error, as it works fine on my Linux laptop. Can you try opening the file in binary mode?

with open('xxx.pdf', 'rb') as f:
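To illustrate why the mode can matter, here is a minimal sketch, assuming Python 3 and a made-up byte string standing in for PDF content. It shows that a text-mode read rewrites line endings while a binary-mode read returns the bytes unchanged. On Python 2 under Windows the details differ (text mode there translates `\r\n`, and as far as I know also treats Ctrl-Z as end of file), but the effect is similar: the parser no longer sees the bytes as stored.

```python
import os
import tempfile

# Made-up bytes standing in for a PDF; CRLF sequences are common in binary files.
payload = b"%PDF-1.4\r\nstream\r\n\x00\x01\x02\r\nendstream\r\n"

fd, path = tempfile.mkstemp(suffix=".pdf")
with os.fdopen(fd, "wb") as f:
    f.write(payload)

# Binary mode: bytes come back exactly as stored.
with open(path, "rb") as f:
    raw = f.read()

# Text mode (Python 3 default newline handling): "\r\n" is rewritten as "\n".
# latin-1 is used only so that every byte decodes without error.
with open(path, "r", encoding="latin-1") as f:
    text = f.read()

os.remove(path)

binary_intact = (raw == payload)
text_intact = (text.encode("latin-1") == payload)
print(binary_intact, text_intact)
```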

Let me know if that works, thanks!

-katharine

conango commented 6 years ago

Hi @kjam, here is a screenshot of the result. I hope it helps. [image: qq 20180108092153] Thanks

kjam commented 6 years ago

Hi,

Did you try passing that to slate? That is, as the next step, run:

doc = slate.PDF(mypdf)
