Open WolfgangFahl opened 2 years ago
2021_10_20_15_51_33.pdf has "ALDI SÜD" as copyable text in it (tested with Preview on macOS). When reading it with PyPDF3 using:
    from PyPDF3 import PdfFileReader

    def getPDFText(self):
        '''
        get my PDF Text
        '''
        pdfText = None
        if self.scannedFile.lower().endswith("pdf"):
            pdfText = ""
            # open in binary mode; the context manager closes the file again
            with open(self.scannedFile, 'rb') as pdf_file:
                read_pdf = PdfFileReader(pdf_file)
                number_of_pages = read_pdf.getNumPages()
                delim = ""
                # concatenate the text of all pages, separated by newlines
                for pageNo in range(number_of_pages):
                    page = read_pdf.getPage(pageNo)
                    page_content = page.extractText()
                    pdfText += delim + page_content
                    delim = "\n"
        return pdfText
I get 'ALDI SƒD' instead. How can this be fixed?
see also https://stackoverflow.com/q/64459824/1497139
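
A possible explanation, offered as a hypothesis and not a confirmed diagnosis: the font in the PDF may use MacRoman encoding, where "Ü" is byte 0x86, while the extracted text is decoded with a table in which 0x86 is "ƒ". The sketch below only checks that this byte coincidence holds. The `PDFDOC_SAMPLE` table and `decode_as_pdfdoc` helper are hypothetical names introduced for illustration, and the single table entry is an assumption taken from the PDFDocEncoding table, not from PyPDF3 itself.

```python
# Byte that a MacRoman-encoded font would use for "Ü" (Python ships this codec).
mac_roman_byte = "Ü".encode("mac_roman")

# Assumption: in the PDFDocEncoding table, code 0x86 is the florin sign (U+0192).
# Python has no built-in PDFDocEncoding codec, so this one entry is hard-coded.
PDFDOC_SAMPLE = {0x86: "\u0192"}


def decode_as_pdfdoc(data: bytes) -> str:
    """Decode bytes with the partial table above.

    Bytes missing from the table fall back to chr(), which is only
    correct for ASCII; that is enough for this illustration.
    """
    return "".join(PDFDOC_SAMPLE.get(b, chr(b)) for b in data)


# Encode the expected text as MacRoman, then decode it with the other table.
garbled = decode_as_pdfdoc("ALDI SÜD".encode("mac_roman"))
print(mac_roman_byte, garbled)
```

If the hypothesis holds, the fix would belong in how the font's encoding is honoured during extraction, not in post-processing the returned string.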