lebedov / python-pdfbox

Python interface to Apache PDFBox command-line tools.

Bounding box text coordinates #25

Open victor-ab opened 3 years ago

victor-ab commented 3 years ago

Any ideas on how to extract the text along with its corresponding bounding boxes? I saw some people extending the PDFTextStripper class, but JPype can't handle that.

lebedov commented 3 years ago

Do you mean something like this?

victor-ab commented 3 years ago

Yep!
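
For readers landing on this thread: the thread itself does not show a solution, so the following is only a minimal sketch of the geometric step, written under assumptions. It assumes per-character positions have already been obtained somehow (in PDFBox, a PDFTextStripper subclass receives them as TextPosition objects) and have been copied into plain Python records. The `Glyph` class and both function names are hypothetical and are not part of python-pdfbox or PDFBox.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


@dataclass(frozen=True)
class Glyph:
    """Hypothetical per-character record: the character, its top-left corner, and its size."""
    char: str
    x: float
    y: float
    width: float
    height: float


def union_box(glyphs: List[Glyph]) -> Box:
    """Return the smallest axis-aligned box that contains every glyph in the list."""
    x0 = min(g.x for g in glyphs)
    y0 = min(g.y for g in glyphs)
    x1 = max(g.x + g.width for g in glyphs)
    y1 = max(g.y + g.height for g in glyphs)
    return (x0, y0, x1, y1)


def words_with_boxes(glyphs: Iterable[Glyph]) -> List[Tuple[str, Box]]:
    """Split a glyph sequence on whitespace and pair each word with its bounding box."""
    words: List[Tuple[str, Box]] = []
    current: List[Glyph] = []
    for g in glyphs:
        if g.char.isspace():
            # Whitespace closes the word collected so far, if any.
            if current:
                words.append(("".join(c.char for c in current), union_box(current)))
                current = []
        else:
            current.append(g)
    # Flush the last word when the sequence does not end in whitespace.
    if current:
        words.append(("".join(c.char for c in current), union_box(current)))
    return words
```

Splitting on whitespace glyphs is a simplification; PDFs that omit explicit space characters would need a gap-based heuristic instead.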