metebalci / pdftitle

a utility to extract the title from a PDF file
GNU General Public License v3.0
131 stars, 21 forks

Hangs with this pdf #19

Closed: vprelovac closed this issue 3 years ago

vprelovac commented 3 years ago

Processing gets stuck on this PDF: http://www.ntia.doc.gov/files/ntia/publications/2003-allochrt.pdf

metebalci commented 3 years ago

This PDF is a poster, which is outside the intended scope of this tool; it is aimed at extracting titles from regular (mainly scientific) articles. That said, the problem comes from the PDF interpreter in the pdfminer library that the tool uses. The interpreter is used to fix spacing problems in some articles, but on this PDF it either takes too much time or memory, or there is a bug in the library. I am not sure whether I can or should debug a third-party library, so I am marking this issue as "won't fix", at least for the moment.
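Since the hang happens inside the interpreter, one possible workaround on the caller's side is to run the extraction in a child process with a time limit and give up on files that exceed it. The sketch below is only an illustration and not part of pdftitle: `run_with_timeout` is a hypothetical helper name, and the demo command simulates a stuck process rather than invoking the tool.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_with_timeout(cmd: Sequence[str], timeout_s: float) -> Optional[str]:
    """Run cmd in a child process (hypothetical helper, not pdftitle API).

    Returns the stripped stdout, or None if the command exceeds
    timeout_s seconds or exits with a non-zero status.
    """
    try:
        result = subprocess.run(
            list(cmd),
            capture_output=True,  # collect stdout/stderr instead of printing
            text=True,            # decode output as str
            timeout=timeout_s,    # the child is killed when this elapses
            check=True,           # non-zero exit raises CalledProcessError
        )
    except subprocess.TimeoutExpired:
        return None
    except subprocess.CalledProcessError:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    # A command that finishes quickly.
    print(run_with_timeout([sys.executable, "-c", "print('ok')"], 10))
    # A command that stands in for a hanging extraction.
    print(run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], 0.5))
```

For the poster above, the call might look like `run_with_timeout(["pdftitle", "-p", "2003-allochrt.pdf"], 30)`, assuming the command-line tool is installed and takes the file path via `-p`; the 30-second limit is an arbitrary choice. This only bounds the running time and does not address the underlying behavior in pdfminer.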