veraPDF / veraPDF-library

Industry supported, open source PDF/A validation library
http://verapdf.org/software
GNU General Public License v3.0

Incremental updates #813

Closed (a20god closed this issue 5 years ago)

a20god commented 7 years ago

Here is a document which uses LZWDecode in the initial version. An incremental update replaces that object with one that does not use LZWDecode: update-1.pdf

According to veraPDF 1.7.22 this document conforms to ISO 19005-2:2011. The initial version of the stream (using LZWDecode) is apparently not checked by veraPDF. Note that 6.1.4 does not exempt the non-compliant version of the stream.

Yes, it's not referenced by the latest version of the xref table, but there are situations where a PDF reader must access older revisions of the document, for instance, when validating LTV signatures or when viewing the document "as signed". Are those older revisions of the document really allowed to violate ISO 19005 requirements? For instance, suppose LZWDecode is used for a DSS entry (PAdES or PDF 2.0) and DSS is later replaced by a version which does not reference streams that use LZWDecode. Though the latest version of the document may not use LZWDecode, validating the signature before the first DSS requires reading LZWDecode-encoded streams.
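A rough way to see which revision of a file mentions a given filter is to split the raw bytes at each `%%EOF` marker and search each appended segment. The sketch below is only an illustration under simplifying assumptions: the function name `revisions_mentioning` and the toy byte string are hypothetical, the byte search can be fooled by `%%EOF` or `/LZWDecode` appearing inside stream data, and it does not look inside object streams. It is not how veraPDF works.

```python
import re

def revisions_mentioning(data: bytes, name: bytes) -> list[int]:
    """Return indices of appended segments (split at %%EOF) that contain `name`."""
    hits = []
    start = 0
    for index, marker in enumerate(re.finditer(rb"%%EOF", data)):
        # Each segment runs from the previous %%EOF to the current one.
        if name in data[start:marker.end()]:
            hits.append(index)
        start = marker.end()
    return hits

# Toy stand-in for a file with one incremental update (not a valid PDF).
toy = (
    b"%PDF-1.7\n5 0 obj\n<< /Filter /LZWDecode >>\nendobj\n%%EOF\n"
    b"5 0 obj\n<< /Filter /FlateDecode >>\nendobj\n%%EOF\n"
)
print(revisions_mentioning(toy, b"/LZWDecode"))  # prints [0]
```

Here only segment 0, the initial version, mentions the filter, which is the situation described for update-1.pdf.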

a20god commented 7 years ago

I guess we need a clarification on which objects are to be checked for violations of ISO 19005. According to the current wording of ISO 32000-1:2008:

(1) exclude objects not listed in any xref table or xref stream (6.1.4)

(2) exclude objects in dictionaries in dictionaries in Resource dictionaries which are not referenced by name from any content stream (6.2.2)

Apparently, the Validation TWG added another rule:

(3) exclude objects not reachable by recursively walking the tree of objects starting at the trailer / xref stream.

There might also be the following rule, but it is not explicitly stated anywhere:

(4) exclude objects that a conforming reader shall ignore.

There are still open questions like the one in this issue. Also, does referencing an object via a dictionary key not defined in ISO 32000-1:2008 exclude it from rule (3)? Rule (4) might require that behavior due to 5.5:

> Features described in PDF specifications that are not explicitly described in ISO 32000-1 shall be ignored by conforming readers.

This would require an exact model of PDF documents for walking the objects. It would also require knowledge of all PDF specifications, since features not described in any PDF specification are not to be ignored.

Apparently, veraPDF just walks the objects:

walk-1.pdf
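Rule (3) amounts to a graph reachability check starting at the trailer. The following is a minimal sketch on a toy in-memory model, not veraPDF's implementation: the `("ref", n)` tuple encoding of indirect references, the `reachable` function, and the sample objects are all assumptions made for illustration. Note that it follows every dictionary key, including keys not defined in ISO 32000-1, which is exactly the behavior questioned above.

```python
from typing import Any

def is_ref(value: Any) -> bool:
    """Hypothetical encoding: an indirect reference is the tuple ("ref", number)."""
    return isinstance(value, tuple) and len(value) == 2 and value[0] == "ref"

def reachable(objects: dict[int, Any], trailer: dict) -> set[int]:
    """Collect object numbers reachable from the trailer by following references."""
    seen: set[int] = set()
    stack: list[Any] = [trailer]
    while stack:
        value = stack.pop()
        if is_ref(value):
            number = value[1]
            if number in seen or number not in objects:
                continue  # already visited, or a dangling reference
            seen.add(number)
            stack.append(objects[number])
        elif isinstance(value, dict):
            stack.extend(value.values())  # follows every key, known or not
        elif isinstance(value, list):
            stack.extend(value)
    return seen

# Toy document: object 5 is listed but nothing refers to it.
objects = {
    1: {"Type": "Catalog", "Pages": ("ref", 2)},
    2: {"Type": "Pages", "Kids": [("ref", 3)]},
    3: {"Type": "Page", "Contents": ("ref", 4)},
    4: {"Filter": "FlateDecode"},
    5: {"Filter": "LZWDecode"},
}
trailer = {"Root": ("ref", 1)}
print(sorted(reachable(objects, trailer)))
```

In this toy model, object 5 would be excluded by rule (3) even though rule (1) alone would include it.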

bdoubrov commented 7 years ago

First of all, the objects we check are defined by our validation model: http://docs.verapdf.org/validation/rules/

In particular, we indeed walk through all indirect objects (and their content) only if they are referenced from an xref table.

> ... but there are situations where a PDF reader must access older revisions of the document, for instance, when validating LTV signatures or when viewing the document "as signed"

Nice catch. But the validity of digital signatures is outside the scope of PDF/A.

a20god commented 7 years ago

What about the "view as signed" feature? I think PDF documents should be verified for each version (initial version or incremental update) that adds a digital signature in addition to the latest version. Note that different versions might claim conformance to different levels of PDF/A (or no conformance at all).

a20god commented 7 years ago

You wrote:

> In particular, we indeed walk through all indirect objects (and their content) only if they are referenced from an xref table.

But obviously you omit the first version of object 5 of update-1.pdf (the one using LZWDecode). That object is referenced from an xref table. The wording in ISO 19005-2:2011 doesn't exempt objects referenced by xref table entries later overridden by newer xref tables.
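The difference between "referenced from the latest xref data" and "referenced from any xref table" can be made concrete with a toy model in which each revision's xref section is a mapping from object number to byte offset. This is a sketch under stated assumptions: the function names and the offsets are hypothetical, free entries and generation numbers are ignored, and nothing here reflects veraPDF's actual code.

```python
def latest_entries(revisions: list[dict[int, int]]) -> dict[int, int]:
    """Merge xref sections oldest-first so that later entries override earlier ones."""
    merged: dict[int, int] = {}
    for xref in revisions:
        merged.update(xref)
    return merged

def shadowed_entries(revisions: list[dict[int, int]]) -> list[tuple[int, int, int]]:
    """Return (revision index, object number, offset) for each overridden entry."""
    latest = latest_entries(revisions)
    shadowed = []
    for index, xref in enumerate(revisions):
        for number, offset in xref.items():
            if latest[number] != offset:
                shadowed.append((index, number, offset))
    return shadowed

# Hypothetical offsets: revision 0 defines objects 1, 2 and 5; revision 1 replaces object 5.
revisions = [{1: 15, 2: 64, 5: 300}, {5: 900}]
print(shadowed_entries(revisions))  # prints [(0, 5, 300)]
```

A validator that checks only `latest_entries` never visits the object at offset 300, whereas one that checks every xref entry would also have to validate everything in `shadowed_entries`.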

a20god commented 7 years ago

If everything that is to be ignored by a conforming reader shall not be validated, why does veraPDF 1.7.37 say that this document is not compliant?

6.1.13-fail-3.pdf

According to 6.1.5 of ISO 19005-2:2011, the document information dictionary shall be ignored by conforming readers.

a20god commented 7 years ago

More things to be ignored by a conforming reader: https://github.com/veraPDF/veraPDF-library/issues/844