jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
MIT License

pdfplumber hangs when opening a damaged PDF #681

Closed Gadil-1987 closed 2 years ago

Gadil-1987 commented 2 years ago

Describe the bug

pdfplumber hangs when it opens a damaged PDF.

Code to reproduce the problem

import pdfplumber
pdf = pdfplumber.open("C:/Users/admin/Desktop/AN202103291477905709.pdf")
print(len(pdf.pages))

PDF file

Please attach any PDFs necessary to reproduce the problem.

If you need to redact text in a sensitive PDF, you can run it through JoshData/pdf-redactor.

Environment

Gadil-1987 commented 2 years ago

Sorry, I don't know how to upload my test PDF file to GitHub. If anyone knows, please tell me.

jsvine commented 2 years ago

Hi @Gadil-1987, in the comment box on GitHub, you should see something that says: "Attach files by dragging & dropping, selecting or pasting them."

Does that work for you?

Gadil-1987 commented 2 years ago

I see, thanks.

Gadil-1987 commented 2 years ago

AN202002141375118541.pdf AN202103291477905709.pdf

These two PDF files cause pdfplumber to hang. Maybe I need to try other ways to check that a PDF file is OK before opening it.
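One way to contain a hang like this is to do the parse in a separate process with a time limit. The sketch below is not part of pdfplumber: `run_with_timeout` and `count_pages_safely` are hypothetical helper names, and the 30-second limit is an arbitrary assumption. It uses only `pdfplumber.open` and `pdf.pages` from the report above, plus the standard-library `subprocess` module.

```python
import subprocess
import sys

# Script executed in the child process: opens the PDF passed as argv[1]
# and prints its page count. If pdfplumber hangs, only the child is stuck.
PAGE_COUNT_SCRIPT = """
import sys
import pdfplumber
with pdfplumber.open(sys.argv[1]) as pdf:
    print(len(pdf.pages))
"""


def run_with_timeout(script, arg, timeout_s=30.0):
    """Run `script` in a fresh Python process (hypothetical helper).

    Returns the child's stripped stdout, or None if the child timed out
    or exited with a non-zero status.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", script, arg],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()


def count_pages_safely(path, timeout_s=30.0):
    """Return the page count, or None if parsing hangs or fails."""
    out = run_with_timeout(PAGE_COUNT_SCRIPT, path, timeout_s)
    return int(out) if out is not None and out.isdigit() else None
```

`count_pages_safely(path)` returns `None` for a file that hangs or fails to parse, so the caller can skip it or fall back to another tool.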

Gadil-1987 commented 2 years ago

To work around this issue, I had to use PyPDF2.
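A cheaper pre-check that needs no second parser is to look for the `%PDF-` header and the `%%EOF` end-of-file marker that a complete PDF file carries. This is only a sketch: `looks_like_complete_pdf` is a hypothetical name, the 1024-byte search window is an assumption, and passing the check does not guarantee that pdfplumber will parse the file without hanging. It only catches truncated files and files that are not PDFs at all, and it is not known whether the two files attached here would fail it.

```python
import os


def looks_like_complete_pdf(path, window=1024):
    """Cheap structural sanity check (hypothetical helper).

    Returns True if a '%PDF-' header appears in the first `window` bytes
    and a '%%EOF' marker appears in the last `window` bytes.
    """
    with open(path, "rb") as f:
        head = f.read(window)
        f.seek(0, os.SEEK_END)
        size = f.tell()
        # Re-read the tail; for small files this is the whole file.
        f.seek(max(0, size - window))
        tail = f.read()
    return b"%PDF-" in head and b"%%EOF" in tail
```

Files that fail this check can be rejected immediately; files that pass still need a real parser to confirm they are readable.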

jsvine commented 2 years ago

Hi @Gadil-1987, and thanks for your interest in this library. Given that these PDFs are pretty badly corrupted (neither CPDF nor Ghostscript was able to repair them), and because the root issue seems to come from pdfminer.six, this project's main dependency, I'm going to close this issue. Still, it's good to know about and I appreciate you pointing it out.

Gadil-1987 commented 2 years ago

Thanks. This library helps me a lot.