lebedov / python-pdfbox

Python interface to Apache PDFBox command-line tools.

Not an issue: Future support of python-pdfbox #17

Closed maddy2021 closed 4 years ago

maddy2021 commented 4 years ago

I have used pdfbox extensively in Java and liked it. Now I need to read PDF files in Python, and you have done a superb job with this library. Are you going to add support for all of pdfbox's functions for extracting data from PDFs? Can I extract headings from a PDF using this library?

lebedov commented 4 years ago

I'm not planning to add support for pdfbox's full API at present, mainly because I have limited time and am not using python-pdfbox much these days. Note that python-pdfbox wraps pdfbox's command-line interface rather than the full API. To extract headings, you would probably need pdfbox's PDFTextStripperByArea class; take a look at this gist for an example of how to access pdfbox's full API from Python.
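
To illustrate what "wrapping the command-line interface" means, here is a minimal sketch that shells out to pdfbox's ExtractText tool. It is not python-pdfbox's actual implementation. The function names `build_extract_text_cmd` and `extract_text` are hypothetical, the jar path is a placeholder you would supply yourself, and the sketch assumes the PDFBox 2.x command syntax (`java -jar <app jar> ExtractText <input> <output>`) with UTF-8 output.

```python
import subprocess
from pathlib import Path


def build_extract_text_cmd(jar_path, pdf_path, txt_path=None):
    """Build the argument list for the ExtractText tool (PDFBox 2.x syntax assumed).

    jar_path is a placeholder for wherever the pdfbox-app jar lives locally.
    """
    pdf_path = Path(pdf_path)
    if txt_path is None:
        # Default: write the text next to the PDF, with a .txt extension.
        txt_path = pdf_path.with_suffix(".txt")
    return ["java", "-jar", str(jar_path), "ExtractText", str(pdf_path), str(txt_path)]


def extract_text(jar_path, pdf_path, txt_path=None):
    """Run ExtractText and return the extracted text (requires Java and the jar)."""
    cmd = build_extract_text_cmd(jar_path, pdf_path, txt_path)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if the tool fails
    return Path(cmd[-1]).read_text(encoding="utf-8")
```

A call such as `extract_text("pdfbox-app.jar", "report.pdf")` would return the whole document as plain text. Because the command-line tool only emits text, structural information such as headings is not available this way, which is why the full Java API is needed for that.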