tscrosb closed this issue 3 years ago.
Appears to be a duplicate of #332. If the text isn't recognised when you run pdfminer's pdf2txt as described in https://pdfminersix.readthedocs.io/en/latest/tutorial/commandline.html#pdf2txt-py, I would recommend raising an issue at https://github.com/pdfminer/pdfminer.six/issues. If it is recognised, please reopen this issue and share the PDF as well.
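A rough Python counterpart to the pdf2txt check, offered as a sketch that assumes pdfminer.six is installed: `pdfminer.high_level.extract_text` runs pdfminer's text extraction on a file, and glyphs that pdfminer cannot map to Unicode show up as `(cid:N)` placeholders instead of real characters. The helper names `count_unmapped_glyphs` and `pdfminer_text` are hypothetical.

```python
import re

# pdfminer.six writes "(cid:N)" for glyphs it cannot map to Unicode,
# which is one possible reason a digit pattern stops matching.
CID_PLACEHOLDER = re.compile(r"\(cid:\d+\)")


def count_unmapped_glyphs(text: str) -> int:
    """Count (cid:N) placeholders in text extracted by pdfminer.six."""
    return len(CID_PLACEHOLDER.findall(text))


def pdfminer_text(path: str) -> str:
    """Hypothetical helper: extract all text from a PDF with pdfminer.six."""
    # Imported inside the function so count_unmapped_glyphs works
    # even when pdfminer.six is not installed.
    from pdfminer.high_level import extract_text

    return extract_text(path)
```

For example, `count_unmapped_glyphs(pdfminer_text("ANALYSIS_6_example.pdf"))` (hypothetical file name) returning a non-zero count would suggest the problem sits in pdfminer's font handling rather than in pdfplumber.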
What are you trying to do?
I am using pdfplumber to look for 12-digit strings in a PDF. My code worked when the font was Helvetica, but it stopped working when I changed the font to STSong-Light.
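One way to see what pdfplumber actually extracts from the STSong-Light version is to group the characters by font. This is a minimal sketch: pdfplumber's `page.chars` is a list of dicts that include `fontname` and `text` keys, while `summarize_fonts` and its `sample_len` parameter are hypothetical.

```python
from collections import defaultdict


def summarize_fonts(chars, sample_len=40):
    """Map each font name to a short sample of the text drawn in that font.

    `chars` is expected to look like pdfplumber's `page.chars`:
    a list of dicts with at least "fontname" and "text" keys.
    """
    samples = defaultdict(str)
    for ch in chars:
        # Keep only the first `sample_len` characters per font.
        if len(samples[ch["fontname"]]) < sample_len:
            samples[ch["fontname"]] += ch["text"]
    return dict(samples)
```

Inside the existing `with pdfplumber.open(...)` block, printing `repr()` of each sample from `summarize_fonts(pdf.pages[0].chars)` makes full-width digits or `(cid:N)` text easy to spot.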
What code are you using to do it?
```python
import glob

import pdfplumber

for filepath in glob.iglob(r"C:\Users\thomascrosbie\Desktop\ALL ANALYSIS\ANALYSIS_6*.pdf"):
    print(filepath)
    excel_output = set()
    with pdfplumber.open(filepath) as pdf:
        for page in pdf.pages:
            # Older pdfplumber versions may return None for pages without text
            tbl = page.extract_text() or ""
            # look for account number
```
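The account-number search itself is not shown in the issue. A possible sketch, assuming an account number is any run of exactly 12 digits: normalize the text with NFKC first, because text set in CJK fonts can come out as full-width digits (U+FF10 to U+FF19), which do not match `[0-9]`. Whether that is what happens with this particular PDF is an assumption, and `find_account_numbers` is a hypothetical name.

```python
import re
import unicodedata

# Exactly 12 ASCII digits that are not part of a longer digit run.
ACCOUNT_RE = re.compile(r"(?<![0-9])[0-9]{12}(?![0-9])")


def find_account_numbers(text):
    """Return all 12-digit strings in `text`; None is treated as empty."""
    # NFKC folds full-width digits such as "１２" into ASCII "12".
    normalized = unicodedata.normalize("NFKC", text or "")
    return ACCOUNT_RE.findall(normalized)
```

Inside the page loop, `excel_output.update(find_account_numbers(tbl))` would collect the matches into the existing set.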
PDF file
Expected behavior
An Excel file containing the 12-digit string.
Actual behavior
The 12-digit string is not detected.
Environment