Doesn't work for rotated page

Tobeabellwether commented 1 year ago

Describe the bug

When I use page.extract_text() to extract text from a page rotated 90 degrees, the result is just garbled words.

jsvine commented 1 year ago

Thanks for flagging this @Tobeabellwether. That makes sense, given the approach pdfplumber takes to extracting text. I think adding support for rotated pages would be a good addition to the library.

OrianeN commented 1 year ago

I have a similar issue where some parts of the text are rotated 90 degrees (on a portrait page):

Copy-pasting the text manually works fine, but the .extract_text() method returns it in reversed order and badly segmented:

OHW
A door-to-door polio vaccination
©
campaign in Yemen :otohP

I'll find a workaround, but I agree this would be a great new feature for this library!
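For illustration only, here is a minimal sketch of one possible workaround, not something pdfplumber provides. It assumes the rotated text runs bottom to top (90 degrees counter-clockwise), which would be consistent with "OHW" and ":otohP" coming out reversed above. The function name `read_rotated_chars` and the `x_tolerance` parameter are hypothetical; the input is a list of dicts shaped like the entries of `page.chars`, using only the "text", "x0", "top" and "upright" keys.

```python
def read_rotated_chars(chars, x_tolerance=3):
    """Rebuild lines from characters rotated 90 degrees counter-clockwise.

    Hypothetical helper: `chars` is a list of dicts shaped like
    pdfplumber's `page.chars` entries. Only "text", "x0", "top" and
    "upright" are read.
    """
    # Keep only the non-upright (rotated) characters.
    rotated = [c for c in chars if not c.get("upright", True)]
    # Sort left to right so each vertical run of text is contiguous.
    rotated.sort(key=lambda c: c["x0"])

    # Group characters whose x0 lies within x_tolerance of the first
    # character in the current group: one group per rotated line.
    columns, current = [], []
    for c in rotated:
        if current and c["x0"] - current[0]["x0"] > x_tolerance:
            columns.append(current)
            current = []
        current.append(c)
    if current:
        columns.append(current)

    # Assumption: text reads bottom to top, so the first character has
    # the largest "top" value. Sort each group by descending "top".
    return [
        "".join(c["text"] for c in sorted(col, key=lambda c: -c["top"]))
        for col in columns
    ]


# Synthetic example data (not taken from the PDF in this issue).
sample = [
    {"text": "O", "x0": 10, "top": 10, "upright": False},
    {"text": "H", "x0": 10, "top": 20, "upright": False},
    {"text": "W", "x0": 10, "top": 30, "upright": False},
    {"text": "A", "x0": 50, "top": 10, "upright": True},
]
print(read_rotated_chars(sample))
```

With a real page, this would be called as `read_rotated_chars(page.chars)`, and the upright text could still be extracted separately. Text rotated clockwise would need the opposite sort order, so the direction should be checked against the actual PDF.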

jsvine / pdfplumber