Status: Closed (kalelsun closed this issue 1 year ago)
Hi @kalelsun, and thanks for your interest in this library. Have you tried repairing the PDF? Does that change the results? When I've seen issues like this in the past, they were often caused by malformed documents.
Yes, you are absolutely right! I have achieved the desired result by fixing the PDF.
Describe the bug
When using pdfplumber to read a PDF, I encountered an issue where the top and bottom attributes of chars on certain pages exceed the bbox (bounding box) of the page.
Code to reproduce the problem
PDF file
the_page.pdf
Expected behavior
The top and bottom attributes of every char should fall within the bbox of the page.
Actual behavior
On certain pages, the top and bottom attributes of chars exceed the bbox of the page.
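
For anyone who wants to check for the same symptom, here is a minimal sketch, not the code from the original report. It flags chars whose top or bottom lies outside the vertical extent of the page bbox. The helper names `chars_outside_bbox` and `count_out_of_bounds` are hypothetical. The `repair=True` option is an assumption about the installed setup: it needs a pdfplumber release that supports it, and Ghostscript must be available.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Same order as pdfplumber's page.bbox: (x0, top, x1, bottom)
BBox = Tuple[float, float, float, float]


def chars_outside_bbox(
    chars: Iterable[Dict[str, Any]],
    bbox: BBox,
    tolerance: float = 0.0,
) -> List[Dict[str, Any]]:
    """Return the chars whose top/bottom fall outside the bbox vertically."""
    _, page_top, _, page_bottom = bbox
    return [
        c
        for c in chars
        if c["top"] < page_top - tolerance or c["bottom"] > page_bottom + tolerance
    ]


def count_out_of_bounds(path: str, repair: bool = False) -> Dict[int, int]:
    """Map each page number to its count of out-of-bounds chars (hypothetical helper)."""
    import pdfplumber  # imported lazily so the pure helper above has no dependencies

    # Assumption: repair=True requires a pdfplumber version that supports it,
    # and Ghostscript installed on the machine.
    kwargs = {"repair": True} if repair else {}
    with pdfplumber.open(path, **kwargs) as pdf:
        return {
            page.page_number: len(chars_outside_bbox(page.chars, page.bbox))
            for page in pdf.pages
        }
```

Comparing `count_out_of_bounds("the_page.pdf")` with `count_out_of_bounds("the_page.pdf", repair=True)` should show whether repairing the document changes the results, which is what was suggested in the reply above.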
Screenshots
Only the areas enclosed by the blue boxes contain characters; the rest of the page consists of images.
Environment
Additional context
Running in Google Colab.