Open Tobeabellwether opened 1 year ago
Thanks for flagging this @Tobeabellwether. That makes sense, given the approach pdfplumber
takes to extracting text. I think adding support for rotated pages would be a good addition to the library.
I have a similar issue where some parts of the text are rotated 90 degrees (in a portrait page):
Copy-pasting the text manually works fine, but the .extract_text()
method returns it in reversed order and badly segmented:
OHW
A door-to-door polio vaccination
©
campaign in Yemen :otohP
I'll find a workaround, but I agree this would be a great new feature for this library!
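For anyone needing a stopgap, here is a rough sketch of one possible workaround, not something pdfplumber provides. It assumes the rotated text reads bottom-to-top (as the reversed "OHW" / ":otohP" output above suggests) and that each character dict carries the "text", "x0", "top" and "upright" keys found in page.chars. The helper name read_rotated_chars and the x_tolerance bucketing are hypothetical choices for illustration only.

```python
from collections import defaultdict


def read_rotated_chars(chars, x_tolerance=3):
    """Rebuild lines from non-upright characters.

    Assumption: the rotated text reads bottom-to-top, so characters
    in one vertical line share roughly the same x0.
    """
    # Keep only characters flagged as not upright
    rotated = [c for c in chars if not c.get("upright", True)]

    # Bucket characters into vertical lines by their x0 coordinate
    columns = defaultdict(list)
    for c in rotated:
        columns[round(c["x0"] / x_tolerance)].append(c)

    lines = []
    for key in sorted(columns):
        # "top" grows downward, so descending order reads bottom-to-top
        ordered = sorted(columns[key], key=lambda c: c["top"], reverse=True)
        lines.append("".join(c["text"] for c in ordered))
    return lines


# Synthetic characters standing in for page.chars (not real PDF data)
sample_chars = [
    {"text": "O", "x0": 10.0, "top": 10.0, "upright": False},
    {"text": "H", "x0": 10.0, "top": 20.0, "upright": False},
    {"text": "W", "x0": 10.0, "top": 30.0, "upright": False},
    {"text": "A", "x0": 50.0, "top": 10.0, "upright": True},
]
rotated_lines = read_rotated_chars(sample_chars)
```

With a real file you would pass page.chars from an opened page instead of sample_chars. If the text on your page reads top-to-bottom instead, drop reverse=True. Bucketing by x0 is crude and can split a line whose characters straddle a bucket boundary, so treat this as a starting point.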
Describe the bug
A clear and concise description of what the bug is. When I use
page.extract_text()
to extract text from a page rotated 90 degrees, the result is just some garbled words.
Code to reproduce the problem
Paste it here, or attach a Python file.
PDF file
Please attach any PDFs necessary to reproduce the problem.
If you need to redact text in a sensitive PDF, you can run it through JoshData/pdf-redactor.
Expected behavior
What did you expect the result should have been?
Actual behavior
What actually happened, instead?
Screenshots
If applicable, add screenshots to help explain your problem.
Environment
Additional context
Add any other context/notes about the problem here.