johndoe31415 / pdfminify

PDF minifier that removes duplicate data, re-compresses images, creates PDF/A-1b documents and digitally signs PDFs
GNU General Public License v3.0

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 970 #8

Closed aminechraibi closed 3 years ago

aminechraibi commented 3 years ago

Error while running: pdfminify in.pdf out.pdf

Stack trace:

Traceback (most recent call last):
  File "C:\Users\ami\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\ami\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "F:\venvs\venv37-ml\Scripts\pdfminify.exe\__main__.py", line 4, in <module>
  File "f:\venvs\venv37-ml\lib\site-packages\pdfminify\__main__.py", line 148, in <module>
    pdf_filter.run()
  File "f:\venvs\venv37-ml\lib\site-packages\llpdf\filters\DownscaleImageOptimization.py", line 72, in run
    for (page_obj, page_content) in self._pdf.parsed_pages:
  File "f:\venvs\venv37-ml\lib\site-packages\llpdf\PDFDocument.py", line 140, in parsed_pages
    pagedata = pagedata.decode("utf-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 970: invalid continuation byte
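The traceback ends in llpdf calling pagedata.decode("utf-8") on raw page data. Byte 0xe9 is "é" in Latin-1 and similar single-byte encodings, but in UTF-8 it can only begin a three-byte sequence. If the byte that follows is not a continuation byte (0x80 to 0xbf), the decoder raises exactly this "invalid continuation byte" error. The sketch below reproduces this with a made-up content-stream fragment, reports the offending offset, and shows a generic Python technique for decoding bytes without failing (latin-1 maps every byte to one character and round-trips losslessly). The sample bytes and both helper names are hypothetical. This is not llpdf's code and not the project's fix.

```python
# Hypothetical content-stream fragment: "Café" with é stored as the single
# Latin-1 byte 0xe9 instead of the UTF-8 byte pair 0xc3 0xa9.
SAMPLE = b"BT (Caf\xe9) Tj ET"


def find_invalid_utf8(raw: bytes):
    """Hypothetical helper: return (offset, reason) of the first byte that
    breaks UTF-8 decoding, or None if the data is valid UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte the decoder rejected.
        return exc.start, exc.reason
    return None


def decode_page_data_lossless(raw: bytes) -> str:
    """Hypothetical helper: decode so every byte maps to exactly one character.

    latin-1 assigns a code point to each of the 256 byte values, so this
    never raises, and str.encode("latin-1") restores the original bytes.
    """
    return raw.decode("latin-1")


if __name__ == "__main__":
    print(find_invalid_utf8(SAMPLE))  # (7, 'invalid continuation byte')
    text = decode_page_data_lossless(SAMPLE)
    print(text)
    assert text.encode("latin-1") == SAMPLE
```

Whether the 0xe9 in the reporter's file is corrupt data or a legitimately encoded string operand cannot be determined without the input PDF.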
johndoe31415 commented 3 years ago

Hey, this is very likely a malformed input PDF document that has invalid pagedata. It's impossible for me to verify without the "in.pdf" you supplied to the tool. Any chance you can create a document that exhibits the same issue which you could share?

johndoe31415 commented 3 years ago

Cannot reproduce the issue without your support. Closing the issue; please feel free to reopen it when you can send me a file that reproduces the problem.