jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
MIT License
5.99k stars 618 forks

Add `autodetect_direction` option to text-extraction methods #1109

Open jsvine opened 3 months ago

jsvine commented 3 months ago

Inspired by https://github.com/jsvine/pdfplumber/issues/1102#issuecomment-1983611599

Particularly with rotated (non-horizontal) text, it can be difficult for a user to predict whether the text will run top-to-bottom or bottom-to-top. I could envision an opt-in, heuristic-based `autodetect_direction=True` flag that examines the original character order to determine the correct direction for a given sub-chunk of text. (In theory, this could also apply to non-rotated text.)
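
To make the idea concrete, here is a rough sketch of what such a heuristic might look like. It is not an existing pdfplumber function: `guess_vertical_direction` is a hypothetical helper, and the `"ttb"` / `"btt"` return values are just illustrative labels. It assumes each character is a dict with a `top` key (distance from the top of the page, as in pdfplumber's char objects) and that the list preserves the original character order from the PDF.

```python
from typing import Dict, List


def guess_vertical_direction(chars: List[Dict[str, float]]) -> str:
    """Hypothetical helper: guess whether a chunk of rotated text reads
    top-to-bottom ("ttb") or bottom-to-top ("btt").

    Assumes `chars` is in the original (content-stream) order and that
    each dict has a "top" key measured from the top of the page.
    """
    down = 0  # steps where the next char sits lower on the page
    up = 0    # steps where the next char sits higher on the page
    for prev, cur in zip(chars, chars[1:]):
        delta = cur["top"] - prev["top"]
        if delta > 0:
            down += 1
        elif delta < 0:
            up += 1
    # Majority vote; ties (including empty input) default to "ttb".
    return "btt" if up > down else "ttb"


# Illustrative input: each successive char is closer to the top of the page.
sample = [{"top": 300.0}, {"top": 290.0}, {"top": 280.0}]
direction = guess_vertical_direction(sample)
```

A majority vote over consecutive pairs, rather than comparing only the first and last character, keeps a single out-of-order character from flipping the result. A real implementation would presumably also need to decide how to split the page into sub-chunks before applying this.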