jackyetz opened this issue 5 years ago
I'm getting the same problem. I'll let you know if I fix it.
Could you share your solution, please? I have the same problem.
When extracting text from this PDF (https://www.aanda.org/articles/aa/pdf/2006/02/aa3061-05.pdf), I get a lot of warnings and the extraction fails.
My code:

```python
import os
import sys
import importlib
importlib.reload(sys)
from pdfminer.pdfparser import PDFParser, PDFDocument
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import PDFPageAggregator
from pdfminer.layout import LTTextBoxHorizontal, LAParams
from pdfminer.pdfinterp import PDFTextExtractionNotAllowed

def parse(path, target):
    if os.path.exists(target):
        os.remove(target)
    fp = open(path, 'rb')
    praser = PDFParser(fp)
    doc = PDFDocument()
    praser.set_document(doc)
    doc.set_parser(praser)

if __name__ == '__main__':
    path = r'./pdf/aa3061-05.pdf'
    parse(path, path.replace('.pdf', '.txt'))
```
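For anyone on a current install: the `PDFDocument()` / `set_parser()` calls above belong to the older pdfminer API. In pdfminer.six, the maintained fork, `PDFDocument` lives in `pdfminer.pdfdocument` and takes the parser in its constructor, and `pdfminer.high_level.extract_pages()` wraps the parser, resource manager, aggregator and interpreter setup in one call. Below is a minimal sketch of the same `parse(path, target)` against pdfminer.six, assuming the rest of the original function collected `LTTextBoxHorizontal` text (its imports suggest this, but the post does not show it). The pdfminer imports sit inside the function only so the snippet loads without pdfminer.six installed. This does not by itself make the undefined glyphs readable.

```python
def parse(path, target):
    """Write the text of every horizontal text box in `path` to `target`.

    Sketch for pdfminer.six (pip install pdfminer.six).
    """
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LAParams, LTTextBoxHorizontal

    # Opening with "w" truncates an existing target, so no os.remove() needed.
    with open(target, "w", encoding="utf-8") as out:
        # extract_pages() yields one LTPage per page, already run through
        # layout analysis with the given LAParams.
        for page_layout in extract_pages(path, laparams=LAParams()):
            for element in page_layout:
                if isinstance(element, LTTextBoxHorizontal):
                    out.write(element.get_text())
```

It is called the same way as in the post: `parse(path, path.replace('.pdf', '.txt'))`.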
The warnings:

```
......
WARNING:pdfminer.converter:undefined:, 5
WARNING:pdfminer.converter:undefined: , 5
WARNING:pdfminer.converter:undefined: , 4
WARNING:pdfminer.converter:undefined: , 5
WARNING:pdfminer.converter:undefined: , 5
WARNING:pdfminer.converter:undefined: , 5
WARNING:pdfminer.converter:undefined: , 5
......
```
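Each `undefined: <font>, <cid>` line means pdfminer hit a character code (CID) in a font that it could not map to Unicode, typically because the font lacks a usable `ToUnicode` map. In that case pdfminer usually writes a `(cid:N)` placeholder instead of the character. Silencing the log does not recover those characters, but it helps to measure how much text was lost. The stdlib-only sketch below attaches a `logging.Filter` to the `pdfminer.converter` logger that counts and drops these records, plus a helper that counts `(cid:N)` placeholders in the extracted text. `UndefinedCharFilter` and `count_cid_placeholders` are hypothetical names, and matching on the `undefined:` prefix is an assumption taken from the log lines above. The exact wording and log level may differ between pdfminer versions.

```python
import logging
import re

# Matches the "(cid:N)" placeholders pdfminer writes for unmapped characters.
CID_PLACEHOLDER = re.compile(r"\(cid:(\d+)\)")


class UndefinedCharFilter(logging.Filter):
    """Count and suppress pdfminer's 'undefined: <font>, <cid>' log records."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def filter(self, record):
        # Assumption: the message starts with "undefined:" as in the issue log.
        if record.getMessage().startswith("undefined:"):
            self.count += 1
            return False  # drop the record so it is never printed
        return True  # keep every other message


def count_cid_placeholders(text):
    """Return the number of '(cid:N)' placeholders in extracted text."""
    return len(CID_PLACEHOLDER.findall(text))
```

Attach it with `logging.getLogger("pdfminer.converter").addFilter(undefined_filter)` before calling `parse()`, then read `undefined_filter.count`. A filter added to a logger only sees records logged on that exact logger, which is why it goes on `pdfminer.converter` and not on the parent `pdfminer` logger.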