Open ubmarco opened 2 years ago
Look at https://github.com/ChrizH/pdfstructure - it implements a pdfminer-based solution that checks the font style of each line and looks for prepended chapter numbers. Here is an article about the solution: https://medium.com/@_chriz_/development-of-a-structure-aware-pdf-parser-7285f3fe41a9
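To make the idea concrete, here is a rough sketch of that kind of heuristic. It is not pdfstructure's actual code: the function names (`heading_level`, `find_headings`), the 1.1 size factor, and the `(text, font_size, is_bold)` input format are all assumptions for illustration. In practice the font size and font name of each line would have to be collected from the PDF first, for example from pdfminer's layout objects.

```python
import re
from statistics import median

# Matches prefixes such as "1", "2.3" or "4.1.2." followed by a title.
NUMBERED_PREFIX = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+\S")


def heading_level(text, font_size, body_size, is_bold=False):
    """Guess a heading level (1 = top level), or None for body text.

    Hypothetical heuristic: a line is a heading candidate only if its
    font stands out from the body text. A chapter-number prefix, if
    present, decides the nesting depth.
    """
    text = text.strip()
    if not text:
        return None
    # Assumed threshold: 10% larger than body text, or bold.
    stands_out = font_size > body_size * 1.1 or is_bold
    if not stands_out:
        return None
    match = NUMBERED_PREFIX.match(text)
    if match:
        # "1" -> level 1, "1.1" -> level 2, "1.1.1" -> level 3
        return match.group(1).count(".") + 1
    return 1


def find_headings(lines):
    """lines: list of (text, font_size, is_bold) tuples, one per line."""
    # Assume the most common text is body text, so the median size is its size.
    body_size = median(size for _, size, _ in lines)
    headings = []
    for text, size, is_bold in lines:
        level = heading_level(text, size, body_size, is_bold)
        if level is not None:
            headings.append((level, text.strip()))
    return headings


sample = [
    ("1 Introduction", 16.0, True),
    ("This is body text.", 10.0, False),
    ("1.1 Background", 13.0, True),
    ("3 apples were eaten.", 10.0, False),
]
print(find_headings(sample))
```

Requiring the font to stand out before trusting the number prefix keeps ordinary sentences that happen to start with a number from being picked up as chapters.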
This also looks worth testing: https://github.com/kermitt2/grobid