stlehmann / pdftools

small collection of python scripts for pdf manipulation
MIT License

pdftools merge fails for some PDFs #14

Open cjfp opened 2 years ago

cjfp commented 2 years ago

When I try to merge a PDF of a Virgin Mobile phone bill, it crashes on Windows 7 / Cygwin.

    $ pdftools merge -o test.pdf virgin.pdf
    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/site-packages/PyPDF2/generic.py", line 229, in __new__
        return decimal.Decimal.__new__(cls, utils.str_(value), context)
    decimal.InvalidOperation: [<class 'decimal.ConversionSyntax'>]

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/usr/local/bin/pdftools", line 8, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.8/site-packages/pdftools/_cli.py", line 274, in main
        pdf_merge(ARGS.src, ARGS.output, ARGS.delete)
      File "/usr/local/lib/python3.8/site-packages/pdftools/pdftools.py", line 42, in pdf_merge
        writer.write(outputfile)
      File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 482, in write
        self._sweepIndirectReferences(externalReferenceMap, self._root)
      File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 571, in _sweepIndirectReferences
        self._sweepIndirectReferences(externMap, realdata)
      File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
        value = self._sweepIndirectReferences(externMap, value)
      File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 571, in _sweepIndirectReferences
        self._sweepIndirectReferences(externMap, realdata)
      File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
        value = self._sweepIndirectReferences(externMap, value)
      File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 556, in _sweepIndirectReferences
        value = self._sweepIndirectReferences(externMap, data[i])
      File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 571, in _sweepIndirectReferences
        self._sweepIndirectReferences(externMap, realdata)
      File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
        value = self._sweepIndirectReferences(externMap, value)
      File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
        value = self._sweepIndirectReferences(externMap, value)
      File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
        value = self._sweepIndirectReferences(externMap, value)
      File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 586, in _sweepIndirectReferences
        newobj = self._sweepIndirectReferences(externMap, newobj)
      File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
        value = self._sweepIndirectReferences(externMap, value)
      File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 577, in _sweepIndirectReferences
        newobj = data.pdf.getObject(data)
      File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 1611, in getObject
        retval = readObject(self.stream, self)
      File "/usr/local/lib/python3.8/site-packages/PyPDF2/generic.py", line 66, in readObject
        return DictionaryObject.readFromStream(stream, pdf)
      File "/usr/local/lib/python3.8/site-packages/PyPDF2/generic.py", line 579, in readFromStream
        value = readObject(stream, pdf)
      File "/usr/local/lib/python3.8/site-packages/PyPDF2/generic.py", line 92, in readObject
        return NumberObject.readFromStream(stream)
      File "/usr/local/lib/python3.8/site-packages/PyPDF2/generic.py", line 271, in readFromStream
        return FloatObject(num)
      File "/usr/local/lib/python3.8/site-packages/PyPDF2/generic.py", line 231, in __new__
        return decimal.Decimal.__new__(cls, str(value))
    decimal.InvalidOperation: [<class 'decimal.ConversionSyntax'>]
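The last frames of the traceback suggest that PyPDF2 read a numeric token from the file and handed it to `decimal.Decimal`, which rejected it. The offending token is not shown in the traceback, so the string `"0.-5"` below is only a hypothetical example of a malformed number, and `try_parse_number` is an illustrative helper, not part of PyPDF2 or pdftools. A minimal sketch of the same failure using only the standard library:

```python
import decimal


def try_parse_number(token):
    """Return a Decimal for a numeric token, or None if decimal rejects it.

    Hypothetical helper for illustration; not a PyPDF2 or pdftools function.
    """
    try:
        return decimal.Decimal(token)
    except decimal.InvalidOperation:
        # Same exception class as the one at the bottom of the traceback.
        return None


print(try_parse_number("12.5"))   # a well-formed number
print(try_parse_number("0.-5"))   # hypothetical malformed token, assumed for the demo
```

If the bill contains a token of this kind, that would be consistent with the report that re-saving the file in another tool makes the problem disappear, although this cannot be confirmed without the file.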

    $ pip list
    Package    Version
    ---------- -------
    pdftools   2.0.2
    pip        21.3.1
    PyPDF2     1.26.0
    setuptools 59.1.1

If I go into Adobe, optimize the PDF, and save to a new file, then there are no problems. Do you have any suggestions about how to handle this from the command line? I wish I had a PDF to send without tons of private information.

stlehmann commented 2 years ago

@cjfp thanks for reporting. Since pdftools is just a CLI for PyPDF2, I suggest updating PyPDF2 to the newest version to see whether that solves the problem.
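Before and after upgrading, it can help to confirm which PyPDF2 version the interpreter actually sees. The sketch below uses the standard-library `importlib.metadata` module (available in Python 3.8, the version shown in the traceback); `installed_version` is a hypothetical helper name. Whether pdftools 2.0.2 works with a newer PyPDF2 release is an assumption that would need to be tested.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# The issue reports 1.26.0; a different value here means the upgrade took effect.
print("PyPDF2:", installed_version("PyPDF2"))
```

The upgrade itself can be run with `python -m pip install --upgrade PyPDF2`, using the same interpreter that runs pdftools.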