logan-jones closed this issue 11 years ago
Hi Logan. I'm no longer maintaining pyPdf, but the project has been forked as PyPDF2 and is being maintained under that new name (https://github.com/knowah/PyPDF2/). Perhaps the new maintainer can help you out.
I don't know if anyone else has run into this, but extractText() seems to loop forever on certain files, and even then only on certain pages of those files. Left running over a 3-day weekend, it was still stuck. I've attached a short sample script illustrating a workaround for whoever comes after me in search of a solution. It runs the extraction in a multiprocessing Process and passes a timeout to join(), so a page that hangs can be abandoned.
I don't like having to re-open the handle for every page, but I really don't see another option at present.
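For anyone reading without the attachment, here is a minimal sketch of that kind of workaround, not the original script. It assumes a pre-3.0 PyPDF2 (or pyPdf, which exposes the same PdfFileReader, getPage, getNumPages and extractText names). The helpers run_with_timeout, extract_page_text and extract_all_pages are hypothetical names made up for this example. It also swaps in a timed read from a multiprocessing Queue as the time limit, so that a page with a lot of text is not mistaken for a hang while the child is still sending its result.

```python
import multiprocessing as mp
import queue


def _call_and_report(result_queue, func, args):
    """Child-process entry point: run func and send back its result or error."""
    try:
        result_queue.put((True, func(*args)))
    except Exception as exc:  # report failures instead of dying silently
        result_queue.put((False, repr(exc)))


def run_with_timeout(func, args=(), timeout=30.0):
    """Run func(*args) in a child process; return None if no result arrives in time."""
    result_queue = mp.Queue()
    proc = mp.Process(target=_call_and_report, args=(result_queue, func, args))
    proc.start()
    try:
        # Wait on the queue, not on join(): the child cannot exit until
        # its result has been read from the queue.
        ok, value = result_queue.get(timeout=timeout)
    except queue.Empty:
        # Nothing arrived in time (hang or hard crash): kill the child.
        proc.terminate()
        proc.join()
        return None
    proc.join()
    if not ok:
        raise RuntimeError(value)
    return value


def extract_page_text(pdf_path, page_number):
    """Re-open the PDF and extract one page (hypothetical helper)."""
    # Assumption: legacy PyPDF2 (< 3.0); imported lazily so the rest of
    # this sketch works without it installed.
    from PyPDF2 import PdfFileReader
    with open(pdf_path, "rb") as handle:
        return PdfFileReader(handle).getPage(page_number).extractText()


def extract_all_pages(pdf_path, timeout=30.0):
    """Return one entry per page; None marks a page that timed out."""
    from PyPDF2 import PdfFileReader  # same legacy-API assumption as above
    with open(pdf_path, "rb") as handle:
        page_count = PdfFileReader(handle).getNumPages()
    return [
        run_with_timeout(extract_page_text, (pdf_path, number), timeout)
        for number in range(page_count)
    ]
```

Each page gets its own process and its own file handle, which is the cost complained about above, but it means a killed child cannot leave a shared reader in a bad state. On platforms where multiprocessing starts children with spawn, call extract_all_pages from under an `if __name__ == "__main__":` guard.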