Converting PDF to text: 0% 0/10 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/pdfminer/pdfdocument.py", line 722, in __init__
    self.read_xref_from(parser, pos, self.xrefs)
  File "/usr/local/lib/python3.9/dist-packages/pdfminer/pdfdocument.py", line 1000, in read_xref_from
    xref.load(parser)
  File "/usr/local/lib/python3.9/dist-packages/pdfminer/pdfdocument.py", line 282, in load
    raise PDFNoValidXRef("Invalid PDF stream spec.")
pdfminer.pdfdocument.PDFNoValidXRef: Invalid PDF stream spec.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/content/drive/MyDrive/ebook-GPT-translator/text_translation.py", line 347, in <module>
    text = convert_pdf_to_text(filename,startpage,endpage)
  File "/content/drive/MyDrive/ebook-GPT-translator/text_translation.py", line 221, in convert_pdf_to_text
    end_page = get_total_pages(pdf_filename)
  File "/content/drive/MyDrive/ebook-GPT-translator/text_translation.py", line 217, in get_total_pages
    document = PDFDocument(parser)
  File "/usr/local/lib/python3.9/dist-packages/pdfminer/pdfdocument.py", line 727, in __init__
    newxref.load(parser)
  File "/usr/local/lib/python3.9/dist-packages/pdfminer/pdfdocument.py", line 241, in load
    (_, obj) = parser.nextobject()
  File "/usr/local/lib/python3.9/dist-packages/pdfminer/psparser.py", line 609, in nextobject
    (pos, token) = self.nexttoken()
  File "/usr/local/lib/python3.9/dist-packages/pdfminer/psparser.py", line 526, in nexttoken
    self.fillbuf()
  File "/usr/local/lib/python3.9/dist-packages/pdfminer/psparser.py", line 239, in fillbuf
    raise PSEOF("Unexpected EOF")
pdfminer.psparser.PSEOF: Unexpected EOF

Parsing this optimized PDF fails with the error above, although DeepL processes the same file without problems: https://assets.ctfassets.net/95kuvdv8zn1v/44FqPJmYPZRwiZN2socdOK/14f5eb025d87a452100d80f513567f2a/Cruise_Impact_Report_-_2022-optimized.pdf
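
To help narrow this down, here is a minimal diagnostic sketch that uses only the Python standard library. It checks whether the local copy of the file still has the `startxref` keyword and `%%EOF` marker near its end, which a PDF parser relies on to locate the cross-reference data. The function name `inspect_pdf_tail` and the `tail_size` parameter are hypothetical, not part of pdfminer or of `text_translation.py`. A truncated upload or download is only one possible cause of `Unexpected EOF`; the file may be complete and simply use an optimized layout that pdfminer does not read correctly, in which case all checks will pass.

```python
from pathlib import Path


def inspect_pdf_tail(path, tail_size=2048):
    """Report whether a PDF file has the basic markers a parser looks for.

    This is a rough sanity check, not a validator: it only looks for the
    header near the start and the trailer keywords near the end.
    """
    data = Path(path).read_bytes()
    tail = data[-tail_size:]
    return {
        # The "%PDF-" header is expected within the first bytes of the file.
        "has_header": b"%PDF-" in data[:1024],
        # "startxref" precedes the byte offset of the cross-reference data.
        "has_startxref": b"startxref" in tail,
        # "%%EOF" marks the end of the file.
        "has_eof_marker": b"%%EOF" in tail,
        "size_bytes": len(data),
    }
```

Calling `inspect_pdf_tail(pdf_filename)` on the copy stored in Google Drive and comparing `size_bytes` with the size of the file at the URL above would show whether the copy is incomplete. If every flag is true and the sizes match, the problem is more likely in how pdfminer reads this particular file than in the file transfer.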