Open GoogleCodeExporter opened 9 years ago
Hmmm, this will probably require some additional thought.
Some pages have more than one entry in their content arrays. For those, it
would not be useful to simply take the first content array element.
Original comment by pmaupin on 18 Oct 2012 at 10:32
OK, I don't know whether this is the right solution, but at least it works with
several content streams:
if isinstance(page.Contents, PdfArray):
    if len(page.Contents) == 1:
        contents = page.Contents[0]
    else:
        # decompress and join multiple streams
        contlist = [c for c in page.Contents]
        uncompress(contlist)
        stream = '\n'.join([c.stream for c in contlist])
        contents = PdfDict(
            Length=len(stream),
            stream=stream
        )
else:
    contents = page.Contents
Original comment by exp...@gmail.com on 17 Nov 2012 at 3:14
That makes sense. The main thing I don't like about it is that it doesn't play
very well with pdfrw's lack of good compression filter support ;-)
On that note, we probably need to make it barf if the decompression fails. I
think the current version of uncompress returns False if it wasn't able to do
its job -- that should probably cause an exception to be raised here.
(Otherwise, it will concatenate a still-compressed content dictionary into the
new dict.)
Thanks for reporting both the bug and most of the fix.
Pat
Original comment by pmaupin on 17 Nov 2012 at 3:57
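For illustration, here is a rough sketch of the "raise if decompression fails" idea from the comment above. It is not the fix that went into pdfrw. The names join_content_streams, ContentStreamError and FakeStream are hypothetical, and the uncompress callable is passed in as a parameter and only assumed to behave as described in the thread (decompress the list in place, return a false value on failure).

```python
class ContentStreamError(ValueError):
    """Hypothetical exception: a content stream could not be decompressed."""


def join_content_streams(streams, uncompress):
    """Decompress several content streams and join them into one string.

    `streams` holds objects with a `.stream` string attribute.
    `uncompress` is assumed to decompress the list in place and to return
    a false value if any stream could not be handled.
    """
    streams = list(streams)
    if not uncompress(streams):
        # Refuse to concatenate data that may still be compressed.
        raise ContentStreamError("could not decompress every content stream")
    return "\n".join(s.stream for s in streams)


class FakeStream:
    """Stand-in for a PDF stream object, used only for this demo."""

    def __init__(self, stream):
        self.stream = stream


if __name__ == "__main__":
    parts = [FakeStream("q 1 0 0 1 0 0 cm"), FakeStream("Q")]
    # A stub that pretends every stream was already uncompressed.
    print(join_content_streams(parts, lambda items: True))
```

In the snippet from the earlier comment, the joined string would then go into the new PdfDict as before. The only change is that a failed decompression stops processing instead of silently producing a broken content stream.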
Original issue reported on code.google.com by exp...@gmail.com on 18 Oct 2012 at 10:12