Open ghost opened 13 years ago
i don't know why the formatting broke - i copy-pasted pure text :( also i can provide the full traceback if needed
I just put a workaround into CamlPDF to fix the same problem.
The malformation is that the streams in files produced by Microsoft Reporting Services have a space character immediately after the 'stream' keyword (before the CR / LF).
The solution is, after reading the 'stream' keyword, to consume all whitespace characters other than CR or LF before looking for the newline as normal.
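For anyone patching a parser, here is a minimal Python sketch of that idea. It is not the actual CamlPDF or pyPdf code: the function name `skip_stream_eol` is made up for illustration, and it assumes a seekable binary stream positioned just after the 'stream' keyword. Accepting a lone CR as the end-of-line marker is also an assumption made here for leniency.

```python
import io

# PDF whitespace characters other than CR and LF: NUL, tab, form feed, space.
NON_EOL_WHITESPACE = (b"\x00", b"\t", b"\x0c", b" ")


def skip_stream_eol(stream):
    """Advance `stream` to the first byte of stream data.

    Hypothetical helper: expects `stream` to be positioned right after
    the 'stream' keyword, and tolerates stray whitespace before the
    end-of-line marker.
    """
    # Consume any whitespace that is neither CR nor LF.
    ch = stream.read(1)
    while ch in NON_EOL_WHITESPACE:
        ch = stream.read(1)

    if ch == b"\n":
        return  # plain LF
    if ch == b"\r":
        nxt = stream.read(1)
        # CR LF is the normal case; for a lone CR (assumed tolerated here),
        # step back so the byte after it is treated as stream data.
        if nxt and nxt != b"\n":
            stream.seek(-1, io.SEEK_CUR)
        return
    raise ValueError("expected end-of-line after 'stream' keyword")
```

With this, `io.BytesIO(b" \r\nDATA")` ends up positioned at `DATA`, the same as it would for a well-formed `b"\r\nDATA"`.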
hey folks :)
on some files generated by Microsoft Reporting Services i get one of the following errors using this script:
from pyPdf import PdfFileWriter, PdfFileReader
output = PdfFileWriter()
input1 = PdfFileReader(file("infile.pdf", "rb"))
output.addPage(input1.getPage(0))
outputStream = file("outfile.pdf", "wb")
output.write(outputStream)
Traceback (most recent call last):
File "/backup/print/municipality stara zagora/110228/Aitos_1/test.py", line 20, in
output.write(outputStream)
.....
File "/usr/local/lib/python2.6/site-packages/pyPdf/generic.py", line 232, in readFromStream
return NumberObject(name)
ValueError: invalid literal for int() with base 10: ''
or using another approach (loading pages in array and then saving them):
Traceback (most recent call last):
File "/backup/print/municipality stara zagora/110228/municipality stara zagora pdf combine 110228 start.py", line 60, in
outpdf.write(outfile)
.....
File "/usr/local/lib/python2.6/site-packages/pyPdf/pdf.py", line 545, in getObject
self.stream.seek(start, 0)
ValueError: I/O operation on closed file
where the file is (of course) not closed
i work around it by re-saving the file with pdftk like this:
from pyPdf import PdfFileWriter, PdfFileReader
import shlex, subprocess
pdftkcommand = 'pdftk infile.pdf cat output fixed_infile.pdf'
args = shlex.split(pdftkcommand)
subprocess.call(args)
output = PdfFileWriter()
input1 = PdfFileReader(file("fixed_infile.pdf", "rb"))
output.addPage(input1.getPage(0))
outputStream = file("outfile.pdf", "wb")
output.write(outputStream)
but only when using the latest pdftk version (1.44; 1.41 produces a blank pdf). i guess this is what the pdftk guys have fixed:
1.43 - September 30, 2010 Fixed a stream parsing bug that was causing page content to disappear after merge of PDFs generated by Microsoft Reporting Services PDF Rendering Extension 10.0.0.0.
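a slightly more defensive sketch of the same pdftk step, in case it is useful. The helper names `build_pdftk_resave_command` and `resave_with_pdftk` are made up for illustration. Passing the arguments as a list means paths with spaces (like the ones in the tracebacks above) need no quoting, and `subprocess.check_call` raises if pdftk exits with an error instead of silently continuing. It assumes `pdftk` is on the PATH.

```python
import subprocess


def build_pdftk_resave_command(src, dst):
    """Return the argument list for 'pdftk <src> cat output <dst>'."""
    # A list avoids shell quoting problems with spaces in file names.
    return ["pdftk", src, "cat", "output", dst]


def resave_with_pdftk(src, dst):
    """Re-save `src` through pdftk into `dst` and return `dst`.

    Raises subprocess.CalledProcessError if pdftk exits non-zero.
    """
    subprocess.check_call(build_pdftk_resave_command(src, dst))
    return dst
```

the re-saved file (for example the result of `resave_with_pdftk("infile.pdf", "fixed_infile.pdf")`) can then be opened with PdfFileReader as in the script above.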
unfortunately i can't provide the broken file as contents are confidential
hope this helps :)
georgi