Relation between `xmpMM:DocumentID` and document ID

pdf-association / pdf-issues

Industry-based resolutions for issues and errata reported against any PDF-related specification

https://pdf-issues.pdfa.org/


Relation between `xmpMM:DocumentID` and document ID #402

Closed by seehuhn 6 months ago

seehuhn commented 7 months ago

The "minimal PDF file" in appendix H.2 uses the xmpMM:DocumentID and xmpMM:InstanceID properties in its XMP metadata stream, and explains that these properties are a "unique GUID of document" and a "GUID changed for each save", respectively. The purpose of these fields seems very similar to the two elements of the ID array in the file trailer dictionary, as introduced in Section 14.4 (File identifiers).

It would be nice if the PDF spec explained the relation between these two pairs of identifiers: Are writers meant to generate two sets of independent identifiers for each document, or can/should/shall the XMP identifiers be somehow derived from the PDF file identifiers?

Also, are the XMP identifiers required or optional? (If optional, maybe don't show them in the "minimal file" example?)
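
To make the "two independent sets" option concrete, here is a minimal sketch of a writer that produces both pairs separately. It assumes the trailer identifier is an MD5 digest over the kind of inputs Section 14.4 suggests (time, location, size, document information values) and that the XMP identifiers are random GUIDs with a `uuid:` prefix. The function names and the prefix are illustrative assumptions, not something either spec mandates.

```python
import hashlib
import time
import uuid


def make_file_identifier(path: str, size: int, info: dict[str, str]) -> bytes:
    """Hash time, location, size and info values into a 16-byte identifier."""
    h = hashlib.md5()
    h.update(str(time.time()).encode())
    h.update(path.encode())
    h.update(str(size).encode())
    for key in sorted(info):
        h.update(info[key].encode())
    return h.digest()


def trailer_id_entry(permanent: bytes, changing: bytes) -> str:
    """Format two byte strings as a trailer /ID array of hex strings."""
    return f"/ID [<{permanent.hex().upper()}> <{changing.hex().upper()}>]"


def new_xmp_ids() -> tuple[str, str]:
    """Return independent random GUIDs for xmpMM:DocumentID and xmpMM:InstanceID."""
    return f"uuid:{uuid.uuid4()}", f"uuid:{uuid.uuid4()}"


# Hypothetical file name, size and info values, for illustration only.
first = make_file_identifier("example.pdf", 1234, {"Title": "Example"})
# Assumption: on the first write both /ID elements carry the same value.
entry = trailer_id_entry(first, first)
document_id, instance_id = new_xmp_ids()
```

In this sketch nothing ties `document_id` to the `/ID` array, which is exactly the ambiguity the question above is about.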

petervwyatt commented 7 months ago

A few notes:

- Annex H is very old and was not maintained for PDF 2.0 - there are probably some outdated and deprecated features being used.
- XMP is only ever metadata, nothing more. And metadata is always optional for "general-purpose PDF" - but it is required for ISO subsets such as PDF/A, PDF/UA and PDF/X as per their specific standards. The trailer ID entry is "real" PDF data and used with encryption (see Table 15).
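
As an illustration of the last point, the following is a simplified sketch of how the legacy password-based standard security handler mixes the first element of the trailer `ID` array into the file encryption key. It assumes security handler revision 3, a 128-bit key, and encrypted metadata; the function name and the sample values are hypothetical, and this is not a complete or authoritative implementation of the algorithm in the spec.

```python
import hashlib
import struct

# 32-byte password padding string used by the legacy standard security handler.
PAD = bytes.fromhex(
    "28BF4E5E4E758A4164004E56FFFA0108"
    "2E2E00B6D0683E802F0CA9FE6453697A"
)


def legacy_file_key(password: bytes, o_entry: bytes, p_entry: int,
                    first_id: bytes, key_len: int = 16) -> bytes:
    """Sketch of the revision 3 key computation (assumes metadata is encrypted)."""
    padded = (password + PAD)[:32]          # pad or truncate to 32 bytes
    h = hashlib.md5()
    h.update(padded)
    h.update(o_entry)                       # /O entry of the encryption dictionary
    h.update(struct.pack("<I", p_entry & 0xFFFFFFFF))  # /P as 4 bytes, low-order first
    h.update(first_id)                      # first element of the trailer /ID array
    digest = h.digest()
    for _ in range(50):                     # extra hashing rounds for revision 3
        digest = hashlib.md5(digest[:key_len]).digest()
    return digest[:key_len]


# Hypothetical inputs: same password, /O and /P, but two different ID values.
key_a = legacy_file_key(b"", bytes(32), -4, bytes.fromhex("00" * 16))
key_b = legacy_file_key(b"", bytes(32), -4, bytes.fromhex("11" * 16))
```

With identical passwords and permissions, `key_a` and `key_b` still differ because the `ID` value feeds the hash. The XMP identifiers play no part in this computation.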

seehuhn commented 7 months ago

Understood. (But note that the XMP metadata stream was not shown in the examples in the PDF 1.7 spec. It seems to have been added for the 2.0 spec.)

seehuhn commented 7 months ago

I think the best solution may be to simply remove the following lines from all XMP examples:

<rdf:Description rdf:about="" xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/">
<xmpMM:DocumentID>… unique GUID of document …</xmpMM:DocumentID>
<xmpMM:InstanceID>… GUID changed for each save …</xmpMM:InstanceID>
</rdf:Description>

The PDF spec seems like an odd place to explain the xmpMM:DocumentID and xmpMM:InstanceID properties, and there seems to be little benefit in showing these entries in the examples at all.

petervwyatt commented 7 months ago

I agree. ISO 32000-2 doesn't need to spell out anything to do with the internals of XMP for "general PDF" - that's the job of the XMP spec or of the PDF ISO subsets, where lots of specific things are required.

lrosenthol commented 6 months ago

@petervwyatt What is the proposed change here?

petervwyatt commented 6 months ago

In Annex H, remove all the gory XMP micro-details (since that is the job of the XMP spec) and just leave block comments describing what the XMP needs to represent, without explaining things like which xmpMM properties are to be preserved or updated. Search for "xmpMM:" to find the 2 examples in Annex H.

petervwyatt commented 6 months ago

PDF TWG agrees.

bdoubrov commented 5 months ago

PDF/A TWG doesn't see any immediate need for any notes on how to align XMP-based IDs with trailer IDs. The use of this data is very different in various implementations.