pcorless / icepdf

PDF Rendering and Viewing API in Java
Apache License 2.0

Wrong xref starting position when parsing xref stream #321

Closed gtache closed 9 months ago

gtache commented 9 months ago

The method parseCrossReferenceStream subtracts XREF_MARKER.length from the offset:

    private CrossReference parseCrossReferenceStream(ByteBuffer byteBuffer, int offset)
            throws IOException, ObjectStateException {
        // use parser to get xref stream object.
        CrossReferenceStream crossReferenceStream = (CrossReferenceStream) getPObject(byteBuffer, offset).getObject();
        crossReferenceStream.initialize();
        crossReferenceStream.setXrefStartPos(offset - XREF_MARKER.length);
        return crossReferenceStream;
    }

which makes sense for uncompressed xrefs starting with "xref", but this method is (probably?) only called for compressed xrefs like

35 0 obj
<< /Type /XRef
   /Length 134
   /Filter /FlateDecode
   /Size 36
   /W [1 2 2]
   /Root 33 0 R
   /Info 32 0 R
>>
stream
x<9C>[...]
endstream

where the starting position is exactly the one indicated by startxref. This causes problems with incremental updates: the new trailer references the previous one at the wrong position, and the PDF is therefore corrupted.
This change was made for GH-291, though, so I guess there was a reason for it.
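
To make the offset problem concrete, here is a minimal, self-contained sketch. It does not use ICEpdf; the class `StartXrefDemo`, its methods, and the toy file built by `samplePdf()` (with an empty xref stream) are all made up for illustration. It reads the number after `startxref` and checks that an object header starts exactly there, while the position four bytes earlier (the length of the `xref` keyword) lands in the middle of the preceding object.

```java
import java.nio.charset.StandardCharsets;

final class StartXrefDemo {
    // Length of the "xref" keyword, standing in for XREF_MARKER.length.
    static final int XREF_MARKER_LENGTH = "xref".length();

    /** Returns the byte offset written after the last "startxref" keyword. */
    static int readStartXref(byte[] pdf) {
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int keyword = text.lastIndexOf("startxref");
        if (keyword < 0) {
            throw new IllegalArgumentException("no startxref keyword");
        }
        int eof = text.indexOf("%%EOF", keyword);
        int end = eof < 0 ? text.length() : eof;
        return Integer.parseInt(text.substring(keyword + "startxref".length(), end).trim());
    }

    /** True if an indirect object header ("N G obj") begins at the given offset. */
    static boolean objectStartsAt(byte[] pdf, int offset) {
        if (offset < 0 || offset >= pdf.length) {
            return false;
        }
        int length = Math.min(32, pdf.length - offset);
        String tail = new String(pdf, offset, length, StandardCharsets.ISO_8859_1);
        return tail.matches("(?s)\\d+ \\d+ obj.*");
    }

    /** Builds a toy file whose startxref points at an (empty) xref stream object. */
    static byte[] samplePdf() {
        String body = "%PDF-1.5\n1 0 obj\n<< >>\nendobj\n";
        String xrefStream = "35 0 obj\n<< /Type /XRef /Size 36 /W [1 2 2] /Length 0 >>\n"
                + "stream\n\nendstream\nendobj\n";
        // All characters are ASCII, so the string length equals the byte offset.
        String tail = "startxref\n" + body.length() + "\n%%EOF\n";
        return (body + xrefStream + tail).getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] pdf = samplePdf();
        int start = readStartXref(pdf);
        System.out.println("object at startxref: " + objectStartsAt(pdf, start));
        System.out.println("object at startxref - marker length: "
                + objectStartsAt(pdf, start - XREF_MARKER_LENGTH));
    }
}
```

A reader that follows a previous-xref pointer written with the shifted value would start parsing at the second position and fail to find the xref stream object.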

pcorless commented 9 months ago

Thanks for finding this. As you noticed, the issue only shows up after a second incremental save. This is an error in the code, as an xref stream's offset is set by the dictionary value. The offset - XREF_MARKER.length is a throwback to parsing an xref table, where the parser has already moved past XREF and the position needs to be adjusted.
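
The difference between the two cases can be sketched as follows. This is a toy model, not ICEpdf's parser: `XrefStartPosSketch`, `consumeXrefMarker`, and `xrefStartPos` are hypothetical names, and the assumption is that a table parser consumes the `xref` keyword before the start position is recorded, whereas nothing is consumed for an xref stream.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class XrefStartPosSketch {
    static final byte[] XREF_MARKER = "xref".getBytes(StandardCharsets.US_ASCII);

    /** Advances past "xref" if it sits at the buffer's current position. */
    static boolean consumeXrefMarker(ByteBuffer buffer) {
        if (buffer.remaining() < XREF_MARKER.length) {
            return false;
        }
        for (int i = 0; i < XREF_MARKER.length; i++) {
            // Absolute get: does not move the buffer's position.
            if (buffer.get(buffer.position() + i) != XREF_MARKER[i]) {
                return false;
            }
        }
        buffer.position(buffer.position() + XREF_MARKER.length);
        return true;
    }

    /** Start position to record for the xref section at the buffer's current position. */
    static int xrefStartPos(ByteBuffer buffer) {
        if (consumeXrefMarker(buffer)) {
            // Table: the keyword was consumed, so step back over it.
            return buffer.position() - XREF_MARKER.length;
        }
        // Stream: nothing was consumed, so the position is already the start.
        return buffer.position();
    }

    public static void main(String[] args) {
        ByteBuffer table = ByteBuffer.wrap("junkxref\n0 1\n".getBytes(StandardCharsets.US_ASCII));
        table.position(4);
        ByteBuffer stream = ByteBuffer.wrap("junk35 0 obj\n".getBytes(StandardCharsets.US_ASCII));
        stream.position(4);
        System.out.println("table start:  " + xrefStartPos(table));
        System.out.println("stream start: " + xrefStartPos(stream));
    }
}
```

In both cases the recorded start equals the offset the section actually begins at; applying the table adjustment to a stream would shift it by the marker length.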