pdf-association / pdf-issues

Industry-based resolutions for issues and errata reported against any PDF-related specification
https://pdf-issues.pdfa.org/

Use of AFRelationship with non-associated files conflicts/is highly confusing with PDF 2.0 and PDF/A-4 #391

Closed petervwyatt closed 6 months ago

petervwyatt commented 7 months ago

Well-Tagged PDF section 8.9.2.4.10 defines the use of the AFRelationship key for non-associated files that are linked to file attachment annotations.

When a file attachment annotation (as defined in ISO 32000-2:2020, 12.5.6.15) references a file specification dictionary (as defined in ISO 32000-2:2020, 7.11.3), the file specification dictionary shall include an AFRelationship entry.

The original definition of Associated Files in PDF/A-3 (ISO 19005-3:2012) only defined AFRelationship in conjunction with AF entries, including for annotations. Subclause Annex E.8 explicitly defined the AF array entry in all annotation dictionaries, without special provisos for file attachment annotations.

PDF/A-3 Annex E was carried across into PDF 2.0 (ISO 32000-2) with various wording changes, and Table E.1, which defines the AFRelationship key, was appended to "Table 43 - Entries in a file specification dictionary" in PDF 2.0, with some new values also added. The description of AFRelationship specifically calls out to ISO 32000-2 clause 14.13 Associated Files:

A name value that represents the relationship between the component of this PDF document that refers to this file specification and the associated file denoted by this file specification dictionary. See 14.13, "Associated files" for more details.

ISO 32000-2 subclause 14.13.9 (what was PDF/A-3 Annex E.8) now states a "shall" requirement for using associated files with any type of annotation:

To associate files with annotations, the annotation dictionary shall contain an AF entry which represents the associated files for that annotation.

ISO 19005-4:2020 (PDF/A-4) is written against ISO 32000-2 (PDF 2.0) and states this in 6.9 Embedded Files:

A conforming interactive processor shall provide a mechanism to display the name strings from the value of the EmbeddedFiles key in the name dictionary of a conforming file. In addition, a conforming interactive processor may also choose to display information from the associated embedded file stream dictionaries or their Params dictionary.

The factual statement about Params is coming from Table 44 in ISO 32000-2:

required in the case of an embedded file stream used as an associated file

Thus WTPDF is in direct conflict with the statements in ISO 32000-2.
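
To make the conflict concrete, here is a minimal sketch in Python that models the two dictionary shapes as plain dicts. Everything in it is an illustrative assumption rather than text from any specification: the file name, the helper `has_af_reference`, the use of Python object identity to stand in for "same indirect object", and the choice to point the AF array at the same file specification as FS (only one of several possible arrangements).

```python
# Plain dicts stand in for PDF dictionaries; key names follow ISO 32000-2.
# Object identity (the `is` operator) stands in for "same indirect object".

def has_af_reference(annot: dict) -> bool:
    """True if the annotation's AF array references its own FS file specification."""
    fs = annot.get("FS")
    return any(entry is fs for entry in annot.get("AF", []))

# WTPDF 8.9.2.4.10 shape: AFRelationship on the file specification, no AF array.
wtpdf_fs = {"Type": "Filespec", "F": "data.csv", "AFRelationship": "Data"}
wtpdf_annot = {"Type": "Annot", "Subtype": "FileAttachment", "FS": wtpdf_fs}

# One possible ISO 32000-2 14.13.9 shape: the annotation also carries an AF array.
iso_fs = {"Type": "Filespec", "F": "data.csv", "AFRelationship": "Data"}
iso_annot = {"Type": "Annot", "Subtype": "FileAttachment", "FS": iso_fs, "AF": [iso_fs]}

print(has_af_reference(wtpdf_annot))  # False
print(has_af_reference(iso_annot))    # True
```

In the first shape the file specification has an AFRelationship entry but nothing reaches it through an AF array, which is the case that Table 43 and 14.13.9 do not appear to anticipate.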

Possible solutions:

This latter option has ramifications for both (PDF 2.0) ISO 32000-2 and (PDF/A-4) ISO 19005-4. See also PDF/A TWG Issue #40, Errata #390 and Errata #385.

PS. For completeness of public information about AFRelationship, C2PA also define a new value using their properly registered 2nd class name for AFRelationship: /C2PA_Manifest - see https://c2pa.org/specifications/specifications/1.3/specs/C2PA_Specification.html#_embedding_manifests_into_assets then search for "AFRelationship"

petervwyatt commented 7 months ago

@faceless2 - as discussed last night in the PDF/A TWG. Please add your thoughts!

petervwyatt commented 7 months ago

On further detailed reading of C2PA, realised the same issue as for WTPDF is there also! 😒

Quoting https://c2pa.org/specifications/specifications/2.0/specs/C2PA_Specification.html#_document_level_manifests:

When adding a C2PA Manifest to the entire PDF, the document catalog dictionary shall contain an AF entry whose value is (an indirect reference to) the embedded file specification containing the active manifest. It shall also be referenced (via indirect object) either from the EmbeddedFiles NameTree (/Catalog/Names/EmbeddedFiles) or from a FileAttachment annotation. The annotation approach shall be used when adding a C2PA Manifest Store to a PDF that already has an existing PDF certifying signature in order to avoid invalidating its DocMDP restrictions.

faceless2 commented 7 months ago

I'll note again for the record that we had completely missed the subtlety of this requirement!

I think loosening the restrictions makes a lot of sense.

  1. For PDF/A-3 it makes no practical difference - AFRelationship may be allowed on non-AF files, but all files have to be AF anyway.

  2. For PDF/A-4 it's being discussed by the TWG

  3. In general, it's an odd rule. PDF is filled with requirements of the form "If N is a thing of M, then the following keys must be ..." - where M is the subject, N is the object (e.g. "if the stream is an XObject then /Subtype must be ..."). But I can't think of any other examples of "if this key is ..., then the object N must be a thing of some M", which is what we're saying here. It feels backwards, and unnecessarily strict.

I'm prepared to be schooled (again :smile:) by Peter telling me this construction is used elsewhere in the spec, but if it's not unique I'm confident it's uncommon. I don't think anything would be lost by using the first form of this requirement: "If the file is an Associated File, then the AFRelationship key must be present".

petervwyatt commented 7 months ago

I'd be certain there is something somewhere but, as you say, it is uncommon. I think the issue is with the wording - there are certainly many places where we forward reference something elsewhere for all the gory details - so a lack of information does not imply a key can just be reused - especially when you consider its original constrained definition in PDF/A-3.

Having said all that I'm also not opposed to opening it up to being more flexible, but I'm not sure I understand the semantic differences implied that could then occur between a File Attachment annot with an AFRelationship entry and with or without an AF array with the same or different AFRelationship values??? What if these files were the same object (I could well imagine that happening!)? What if the files were different, but had the same AFRelationship value? How might this all be rationally presented to a poor user?
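
The cases raised in these questions can be enumerated with a small sketch. It is only an illustration under stated assumptions: PDF dictionaries are modelled as plain Python dicts, object identity stands in for "same indirect object", and the function name `compare_fs_with_af` and the file names are hypothetical.

```python
def compare_fs_with_af(annot: dict) -> list[str]:
    """Describe how the annotation's FS file relates to each entry of its AF array."""
    fs = annot["FS"]
    af = annot.get("AF", [])
    if not af:
        return ["no AF array"]
    findings = []
    for entry in af:
        if entry is fs:
            # The attached file and the associated file are one and the same object.
            findings.append("same object")
        elif entry.get("AFRelationship") == fs.get("AFRelationship"):
            findings.append("different file, same AFRelationship")
        else:
            findings.append("different file, different AFRelationship")
    return findings

attached = {"F": "report.xml", "AFRelationship": "Source"}
other = {"F": "notes.txt", "AFRelationship": "Source"}
third = {"F": "extra.csv", "AFRelationship": "Supplement"}
annot = {"Subtype": "FileAttachment", "FS": attached, "AF": [attached, other, third]}

for finding in compare_fs_with_af(annot):
    print(finding)
```

Each of the three outcomes is structurally expressible if AFRelationship is allowed on the FS file specification independently of AF, and the sketch says nothing about how a viewer should present them.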

petervwyatt commented 7 months ago

To move this forward one step: is the approach to loosen AFRelationship to apply to any file specification dictionary for "general purpose PDF" supported by the PDF TWG?

Note this includes both non-embedded files (URLs) as well as embedded file streams.

petervwyatt commented 7 months ago

PDF TWG disagree that AFRelationship can be used with non-associated files (i.e. those not referenced from an AF array). Propose better wording in ISO 32000-2 to make this clearer. WTPDF, PDF/UA-2 and C2PA handled elsewhere or with separate errata.

petervwyatt commented 7 months ago

Proposed solution rewording of Table 43:

A name value that represents the relationship between the component of this PDF document that refers to this file specification (via an AF array) and the associated file denoted by this file specification dictionary. See 14.13, "Associated files" for more details.
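
Read this way, a checker could flag any file specification that has an AFRelationship entry but is not reachable through an AF array. The following is a rough sketch of that reading, not a conformance test: the flat list of dicts, the `id()`-based identity tracking, the function name `find_unassociated_afrelationship` and the file names are all assumptions made for illustration.

```python
def find_unassociated_afrelationship(objects: list[dict]) -> list[dict]:
    """Return file specifications that carry AFRelationship but sit in no AF array."""
    # Collect the identity of every object referenced from any AF array.
    associated = {id(entry) for obj in objects for entry in obj.get("AF", [])}
    return [
        obj for obj in objects
        if obj.get("Type") == "Filespec"
        and "AFRelationship" in obj
        and id(obj) not in associated
    ]

# Hypothetical document: one file associated via the catalog's AF array,
# one file attached to an annotation through FS only.
manifest = {"Type": "Filespec", "F": "manifest.c2pa", "AFRelationship": "C2PA_Manifest"}
loose = {"Type": "Filespec", "F": "loose.txt", "AFRelationship": "Unspecified"}
catalog = {"Type": "Catalog", "AF": [manifest]}
annot = {"Type": "Annot", "Subtype": "FileAttachment", "FS": loose}

flagged = find_unassociated_afrelationship([catalog, annot, manifest, loose])
print([f["F"] for f in flagged])  # ['loose.txt']
```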

petervwyatt commented 6 months ago

PDF TWG agree