useblocks / libpdf

Extract structured data from PDFs
MIT License

Position of words and characters #2

Closed (ubmarco closed this issue 3 years ago)

ubmarco commented 3 years ago

The UML model should include the positions of lines, words and characters. The LTTextBox instances of pdfminer already contain LTTextLine and LTChar objects, so the information should be available.

The feature should be optional because it will clutter the output files. The main use case is the API.
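A rough sketch of how this could look, not a proposal for the final model. It assumes the line items behave like pdfminer layout objects: iterating an LTTextLine yields items with `get_text()`, and glyphs (LTChar) carry a `bbox` of `(x0, y0, x1, y1)`, while inserted whitespace (LTAnno) has none. pdfminer has no word object, so words are derived here by splitting on whitespace. The names `Char`, `Word`, `extract_words`, `with_positions` and `StubGlyph` are made up for illustration and are not part of libpdf or pdfminer.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


@dataclass
class Char:
    text: str
    bbox: Optional[BBox] = None


@dataclass
class Word:
    text: str
    bbox: Optional[BBox] = None  # stays None unless positions are requested
    chars: List[Char] = field(default_factory=list)


def union_bbox(boxes: Iterable[BBox]) -> BBox:
    """Smallest box that encloses all given boxes."""
    x0s, y0s, x1s, y1s = zip(*boxes)
    return (min(x0s), min(y0s), max(x1s), max(y1s))


def extract_words(lt_line, with_positions: bool = False) -> List[Word]:
    """Split one text line into words; attach positions only on request."""
    words: List[Word] = []
    pending = []  # glyphs of the word currently being built

    def flush() -> None:
        if not pending:
            return
        text = "".join(item.get_text() for item in pending)
        if with_positions:
            chars = [Char(item.get_text(), tuple(item.bbox)) for item in pending]
            words.append(Word(text, union_bbox(c.bbox for c in chars), chars))
        else:
            words.append(Word(text))
        pending.clear()

    for item in lt_line:
        # Items without a bbox and whitespace glyphs both end the current word.
        if getattr(item, "bbox", None) is None or item.get_text().isspace():
            flush()
        else:
            pending.append(item)
    flush()
    return words


class StubGlyph:
    """Hypothetical stand-in for a layout item so the sketch runs without pdfminer."""

    def __init__(self, text: str, bbox: Optional[BBox] = None):
        self._text = text
        self.bbox = bbox

    def get_text(self) -> str:
        return self._text


# Stand-in for one text line: "Hi !" with a position-less space in between.
line = [
    StubGlyph("H", (10, 700, 16, 712)),
    StubGlyph("i", (16, 700, 19, 712)),
    StubGlyph(" "),
    StubGlyph("!", (24, 700, 27, 712)),
]
words = extract_words(line, with_positions=True)
```

With `with_positions=False` (the default in this sketch) the words carry only their text, which would keep the output files as small as they are today; API users could opt in to the bounding boxes.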