I think there's a small problem in the PdfFileWriter class's _sweepIndirectReferences function. There's a list called self.stack where the indirect references that we've already seen are stored. I suppose it is used so that we don't sweep the same indirect reference over and over again. However, once a reference has been swept it is removed from self.stack again, and I don't see the point of that. If lots of objects reference the same object (for example, if we copy the Logical Structure of the PDF as well, many objects reference the same page object, which is quite expensive to sweep), keeping it in self.stack could mean a significant improvement in time.
    if data.pdf == self:
        if data.idnum in self.stack:
            return data
        else:
            self.stack.append(data.idnum)
            realdata = self.getObject(data)
            self._sweepIndirectReferences(externMap, realdata)
            self.stack.pop()
            return data
I think it should be:
    if data.pdf == self:
        if data.idnum in self.stack:
            return data
        else:
            self.stack.append(data.idnum)
            realdata = self.getObject(data)
            self._sweepIndirectReferences(externMap, realdata)
            return data
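To illustrate the difference, here is a small stand-alone sketch. It is not pyPdf code: the graph, the sweep function and the pop_after flag are made up for the example, and it only models the append/pop logic shown above. It counts how often a shared "page" object gets swept when 100 structure elements all point at it.

```python
def sweep(graph, node, seen, counter, pop_after):
    """Toy stand-in for _sweepIndirectReferences (hypothetical, not pyPdf)."""
    if node in seen:
        # Already on the list: skip it, like the early "return data".
        return
    seen.append(node)
    counter[node] = counter.get(node, 0) + 1  # one real sweep of this node
    for child in graph[node]:
        sweep(graph, child, seen, counter, pop_after)
    if pop_after:
        # Current behaviour: forget the node once its subtree is done.
        seen.pop()


# 100 structure elements that all reference the same page object.
graph = {"root": [], "page": ["contents"], "contents": []}
for i in range(100):
    name = "elem%d" % i
    graph["root"].append(name)
    graph[name] = ["page"]


def count_sweeps(pop_after):
    counter = {}
    sweep(graph, "root", [], counter, pop_after)
    return counter


with_pop = count_sweeps(True)      # models the current code
without_pop = count_sweeps(False)  # models the proposed change

print(with_pop["page"], without_pop["page"])  # prints: 100 1
```

In this toy model the page and everything below it is swept once per referencing element with the pop, and only once without it. One caveat: without the pop the list keeps growing, and "in" on a list is a linear search, so a set might be a better fit for the membership test. I haven't measured that on real files.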