Detect headlines in PDFs without outline

useblocks / libpdf

Extract structured data from PDFs

MIT License

Open ubmarco opened 2 years ago

ubmarco commented 2 years ago

Look at https://github.com/ChrizH/pdfstructure. It implements a pdfminer-based solution that checks the font style of each line and looks for prepended chapter numbers. Here is an article about the solution: https://medium.com/@_chriz_/development-of-a-structure-aware-pdf-parser-7285f3fe41a9
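To make the idea concrete, here is a rough sketch of that kind of heuristic. It is not the pdfstructure implementation and not libpdf code. `Line`, `body_font_size` and `detect_headlines` are hypothetical names, and the thresholds are assumptions. It expects lines that already carry a font size and a bold flag (with pdfminer these could be derived from the `size` and `fontname` of the `LTChar` objects in a line), and treats a line as a headline when it is larger than the dominant body font, or when it is bold and starts with a chapter number.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Line:
    """Hypothetical container for one extracted text line."""
    text: str
    font_size: float
    bold: bool = False


# Matches prepended chapter numbers such as "1 ", "2.3 " or "4.1.2. "
NUMBERING = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+\S")


def body_font_size(lines):
    """Return the font size that covers the most characters (assumed body text)."""
    sizes = Counter()
    for line in lines:
        sizes[line.font_size] += len(line.text)
    return sizes.most_common(1)[0][0]


def detect_headlines(lines):
    """Return (level, text) tuples for lines that look like headlines."""
    if not lines:
        return []
    body = body_font_size(lines)
    headlines = []
    for line in lines:
        text = line.text.strip()
        if not text:
            continue
        match = NUMBERING.match(text)
        # Assumption: larger than body text, or bold with a chapter number
        if line.font_size > body or (line.bold and match):
            # Nesting level from the numbering depth, e.g. "1.1" -> level 2
            level = match.group(1).count(".") + 1 if match else 1
            headlines.append((level, text))
    return headlines


if __name__ == "__main__":
    sample = [
        Line("1 Introduction", 16, bold=True),
        Line("This document describes the parser in detail.", 10),
        Line("1.1 Scope", 12, bold=True),
        Line("3 apples were counted in the first experiment run.", 10),
    ]
    for level, text in detect_headlines(sample):
        print("  " * (level - 1) + text)
```

A body line that merely starts with a digit ("3 apples ...") is not picked up here, because the numbering only counts in combination with a style difference. Unnumbered headlines all end up on level 1 in this sketch; ranking them by font size would be an obvious extension.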

ubmarco commented 2 years ago