jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
MIT License

Nothing found in a PDF: no pages, no chars, nothing. #1135

Closed: AppleRabbitDENG closed this issue 4 months ago

AppleRabbitDENG commented 5 months ago

1.pdf

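import pdfplumber

# Open the PDF with a context manager so the file handle is closed afterwards
with pdfplumber.open(r'1.pdf') as pdf:
    for page in pdf.pages:
        text = page.extract_text()  # extract the page's text
        print(text)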
mkl-public commented 5 months ago

Your PDF file is broken: it's cut off at the end. Some PDF processors can repair such a file and some cannot, but the problem is with the file, not the processors.

AppleRabbitDENG commented 5 months ago

Thank you. Could you kindly tell me how you found that? How can I detect it?

jsvine commented 4 months ago

There are probably several ways to determine this, but one is to open the PDF in a plain-text editor. At the bottom, you'll see something like this abrupt ending:

[Screenshot: the abruptly cut-off end of the file, viewed in a text editor]

(And thanks, @mkl-public, for responding early on.)
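The plain-text-editor check can also be roughly automated. A complete PDF normally ends with a `%%EOF` marker (files saved with incremental updates may contain several, but the last one should sit at or near the very end), so a file with no `%%EOF` in its final bytes was very likely cut off. Below is a minimal standard-library sketch of that heuristic. The function name `looks_truncated` and the 1024-byte window are hypothetical choices for illustration, not part of pdfplumber. It cannot detect damage in the middle of a file, and a file with unusual trailing bytes after the marker could be misreported.

```python
import os
import tempfile


def looks_truncated(path: str, tail_size: int = 1024) -> bool:
    """Heuristic: True if no %%EOF marker appears in the last `tail_size` bytes.

    Assumption (hypothetical helper, not a pdfplumber API): a complete PDF
    ends with %%EOF, possibly followed by a little whitespace.
    """
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        # Jump to the start of the tail window (or the file start if it is small)
        f.seek(max(0, size - tail_size))
        tail = f.read()
    return b"%%EOF" not in tail


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        complete = os.path.join(tmp, "complete.pdf")
        cut_off = os.path.join(tmp, "cut_off.pdf")
        # Toy byte strings, not real PDFs: just enough to exercise the check
        with open(complete, "wb") as f:
            f.write(b"%PDF-1.7\n...\nstartxref\n123\n%%EOF\n")
        with open(cut_off, "wb") as f:
            f.write(b"%PDF-1.7\n1 0 obj\n<< /Type /Page")
        print(looks_truncated(complete))  # False
        print(looks_truncated(cut_off))   # True
```

A positive result only suggests the file was cut off; opening it in a text editor, as described above, remains the quickest way to confirm.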