WARNING:pdfminer.converter:undefined: <PDFType1Font: basefont=

jackyetz commented 5 years ago

When extracting text from a PDF (https://www.aanda.org/articles/aa/pdf/2006/02/aa3061-05.pdf), I got a lot of warnings and the extraction failed.

My code is:

import os
import sys
import importlib
importlib.reload(sys)

from pdfminer.pdfparser import PDFParser, PDFDocument
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import PDFPageAggregator
from pdfminer.layout import LTTextBoxHorizontal, LAParams
from pdfminer.pdfinterp import PDFTextExtractionNotAllowed

def parse(path, target):
    # Start from an empty output file, since text is appended box by box below.
    if os.path.exists(target):
        os.remove(target)
    fp = open(path, 'rb')
    parser = PDFParser(fp)
    doc = PDFDocument()
    parser.set_document(doc)
    doc.set_parser(parser)

    doc.initialize()

    if not doc.is_extractable:
        raise PDFTextExtractionNotAllowed
    else:
        rsrcmgr = PDFResourceManager()
        laparams = LAParams(all_texts=True)
        device = PDFPageAggregator(rsrcmgr, laparams=laparams)
        interpreter = PDFPageInterpreter(rsrcmgr, device)

        for page in doc.get_pages():  # doc.get_pages() returns the list of pages
            interpreter.process_page(page)
            layout = device.get_result()
            for x in layout:
                if isinstance(x, LTTextBoxHorizontal):
                    with open(target, 'a', encoding='utf-8') as f:
                        results = x.get_text()
                        # print(results)
                        f.write(results + '\n')

if __name__ == '__main__':
    path = r'./pdf/aa3061-05.pdf'
    parse(path, path.replace('.pdf', '.txt'))

The warnings:

......
WARNING:pdfminer.converter:undefined: , 5
WARNING:pdfminer.converter:undefined: , 5
WARNING:pdfminer.converter:undefined: , 4
WARNING:pdfminer.converter:undefined: , 5
WARNING:pdfminer.converter:undefined: , 5
WARNING:pdfminer.converter:undefined: , 5
WARNING:pdfminer.converter:undefined: , 5
......
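
For anyone who only wants to cut down the noise: the `WARNING:pdfminer.converter:` prefix is the default output format of Python's standard `logging` module, so the messages appear to come from a logger named `pdfminer.converter`. The sketch below raises that logger's threshold so WARNING records are dropped. The helper name `quiet_logger` is made up for illustration, and the logger name is only read off the log prefix above. This hides the messages; it does not explain the undefined characters or fix the failed extraction.

```python
import logging


def quiet_logger(name="pdfminer.converter", level=logging.ERROR):
    """Raise the threshold of one named logger (hypothetical helper).

    The default name is an assumption taken from the log prefix
    'WARNING:pdfminer.converter:...' shown in the report.
    """
    logger = logging.getLogger(name)
    logger.setLevel(level)
    return logger


# Call this once before running parse(); records below ERROR are then ignored.
logger = quiet_logger()

# A WARNING sent to this logger is now dropped instead of printed.
logger.warning("undefined: %r, %r", "<font>", 5)
```

ERROR and CRITICAL records from the same logger still get through, so real failures are not hidden along with the warnings.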

paulfwb commented 4 years ago

I'm getting the same problem. I'll let you know if I fix it.

rocket2016 commented 3 years ago

Could you share your solution, please! I have the same problem.

jaepil / pdfminer3k

WARNING:pdfminer.converter:undefined: <PDFType1Font: basefont= #12