Open Theblackcat98 opened 7 months ago
@Theblackcat98 The error you're encountering with pdfminer is due to changes in the pdfminer.six library, where the codec argument is no longer accepted by the TextConverter class.
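Before changing any code, it may help to confirm which release is actually installed. Here is a minimal standard-library sketch; the helper name installed_version is my own, and pdfminer.six is the distribution name as published on PyPI.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None
```

Calling installed_version("pdfminer.six") returns None when the package is missing, which would point to an install problem rather than an API change.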
Try these steps and let me know if it works:
1. Install pdfminer.six through pip install pdfminer.six
2. Update the convert_pdf_to_txt function to remove the codec argument from the TextConverter initialization.
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.pdfpage import PDFPage
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from io import StringIO

def convert_pdf_to_txt(path):
    rsrcmgr = PDFResourceManager()
    retstr = StringIO()
    laparams = LAParams()
    # No codec argument is passed to TextConverter here.
    device = TextConverter(rsrcmgr, retstr, laparams=laparams)
    fp = open(path, 'rb')
    interpreter = PDFPageInterpreter(rsrcmgr, device)
    for page in PDFPage.get_pages(fp):
        interpreter.process_page(page)
    text = retstr.getvalue()
    fp.close()
    device.close()
    retstr.close()
    return text
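If the same script has to run in environments where TextConverter may or may not accept codec, one option is to try the call with it and fall back without it. This is only a sketch: make_converter is a hypothetical helper, not part of pdfminer, and it assumes the rejection surfaces as a TypeError raised by the constructor.

```python
def make_converter(converter_cls, rsrcmgr, outfp, laparams=None, codec="utf-8"):
    """Build a converter, dropping the codec keyword if the class rejects it.

    Hypothetical helper: converter_cls is expected to be a class such as
    pdfminer.converter.TextConverter, passed in by the caller.
    """
    try:
        # First attempt: pass codec, for versions that accept it.
        return converter_cls(rsrcmgr, outfp, codec=codec, laparams=laparams)
    except TypeError:
        # Assumption: the TypeError means the keyword was not accepted.
        # A TypeError raised for another reason would also land here and
        # trigger the retry, so treat this as a workaround, not a fix.
        return converter_cls(rsrcmgr, outfp, laparams=laparams)
```

In convert_pdf_to_txt this would replace the direct constructor call, for example device = make_converter(TextConverter, rsrcmgr, retstr, laparams=laparams).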
Let me know if it works.
Thanks
Add pdfminer to requirements.
When running the PDF-to-Text option I get
Full Error: