mfenniak / pyPdf

Pure-Python PDF library; this repository is no longer maintained, please see https://github.com/knowah/PyPDF2/ instead.
Other
276 stars, 85 forks

Issue about the _sweepIndirectReferences function #26

Open Eladio opened 13 years ago

Eladio commented 13 years ago

I think there's a small problem in the _sweepIndirectReferences method of the PdfFileWriter class. It keeps a list, self.stack, holding the ids of the indirect references it has already seen, presumably so that the same indirect reference isn't swept over and over again. However, once a reference has been swept, the function removes it from self.stack again, and I don't see the point of that. If many objects reference the same object (for example, when the logical structure of the PDF is copied as well, many structure elements reference the same page object, which is quite expensive to sweep), keeping it in self.stack could save a significant amount of time.
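To make the cost concrete, here is a minimal, self-contained model of that recursion. It is not pyPdf code: the object graph is a plain dict mapping a hypothetical object id to the ids it references, and sweep only mimics the stack handling of _sweepIndirectReferences while counting how often each object is actually swept.

```python
from collections import Counter


def sweep(graph, obj_id, stack, visits, pop_after):
    """Mimic the stack logic of _sweepIndirectReferences on a toy graph.

    graph maps a (hypothetical) object id to the ids it references.
    """
    if obj_id in stack:
        return  # id is still recorded: skip it
    stack.append(obj_id)
    visits[obj_id] += 1  # count real sweeps of this object
    for child in graph[obj_id]:
        sweep(graph, child, stack, visits, pop_after)
    if pop_after:
        stack.pop()  # current behaviour: forget the id once its sweep ends


def count_sweeps(graph, root, pop_after):
    visits = Counter()
    sweep(graph, root, [], visits, pop_after)
    return visits


# 100 structure elements that all reference the same page object.
graph = {"root": [f"elem{i}" for i in range(100)], "page": []}
for i in range(100):
    graph[f"elem{i}"] = ["page"]

print(count_sweeps(graph, "root", pop_after=True)["page"])   # 100
print(count_sweeps(graph, "root", pop_after=False)["page"])  # 1
```

With the pop() the shared page is swept once per referencing element; without it, only once.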

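if data.pdf == self:
    if data.idnum in self.stack:
        # this object is already being swept further up the recursion
        return data
    else:
        self.stack.append(data.idnum)
        realdata = self.getObject(data)
        self._sweepIndirectReferences(externMap, realdata)
        # forget the id, so later references to it are swept again
        self.stack.pop()
        return data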

I think it should be:

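if data.pdf == self:
    if data.idnum in self.stack:
        # this object has already been swept (or is being swept right now)
        return data
    else:
        self.stack.append(data.idnum)
        realdata = self.getObject(data)
        self._sweepIndirectReferences(externMap, realdata)
        # no pop(): the id stays recorded, so each object is swept only once
        return data
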
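
Dropping the pop() does not weaken the protection against reference cycles: an id that stays recorded still stops the recursion when it comes round again. Below is a minimal sketch in a toy model rather than pyPdf code (the graph dict and object names are hypothetical). It uses a set instead of the list, which is a substitution of this sketch and not part of the proposed patch: since nothing is ever popped, only membership matters.

```python
def sweep_once(graph, obj_id, seen, order):
    """Sweep obj_id and everything it references, each object at most once.

    Mirrors the proposed fix: an id is recorded before recursing and never
    removed, so it acts both as a 'visited' marker and as cycle protection.
    graph maps a (hypothetical) object id to the ids it references.
    """
    if obj_id in seen:
        return
    seen.add(obj_id)
    order.append(obj_id)  # record the order in which objects are swept
    for child in graph[obj_id]:
        sweep_once(graph, child, seen, order)


# Page tree with /Parent-style back-references (cycles), plus a structure
# element that points at a page which has already been swept.
graph = {
    "catalog": ["pages", "struct"],
    "pages": ["page1", "page2"],
    "page1": ["pages"],   # back-reference to the parent: a cycle
    "page2": ["pages"],
    "struct": ["page1"],  # shared reference to page1
}

order = []
sweep_once(graph, "catalog", set(), order)
print(order)  # ['catalog', 'pages', 'page1', 'page2', 'struct']
```

PDF page objects point back at their parent node through /Parent, so cycles like this are normal, which is why the recorded ids must still stop the recursion.
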
AeroNotix commented 12 years ago

Hi,

This fixes an issue I was having with certain kinds of documents getting stuck in a recursive loop in the self._sweepIndirectReferences call.
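One plausible explanation for documents appearing to hang (an assumption, not something the comment above confirms): with the pop() in place, an object is swept once for every distinct path that reaches it, and when shared references are stacked level upon level, the number of paths grows exponentially. The toy model below (hypothetical object ids, not pyPdf code) chains "diamonds" in which each node reaches the next one through two different parents.

```python
def count_sweeps(graph, root, pop_after):
    """Total number of sweeps, mimicking the self.stack handling."""
    stack = []
    total = 0

    def sweep(obj_id):
        nonlocal total
        if obj_id in stack:
            return
        stack.append(obj_id)
        total += 1
        for child in graph[obj_id]:
            sweep(child)
        if pop_after:
            stack.pop()  # the original code forgets the id here

    sweep(root)
    return total


def diamond_chain(levels):
    """Hypothetical graph: node n{i} reaches n{i+1} via both a{i} and b{i}."""
    graph = {f"n{levels}": []}
    for i in range(levels):
        graph[f"n{i}"] = [f"a{i}", f"b{i}"]
        graph[f"a{i}"] = [f"n{i + 1}"]
        graph[f"b{i}"] = [f"n{i + 1}"]
    return graph


for levels in (4, 8, 16):
    g = diamond_chain(levels)
    print(levels, count_sweeps(g, "n0", True), count_sweeps(g, "n0", False))
# 4 61 13
# 8 1021 25
# 16 262141 49
```

Each extra level roughly doubles the work when the id is popped (2 ** (levels + 2) - 3 sweeps), while without the pop() every object is swept exactly once (3 * levels + 1 sweeps).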