pdf-association / pdf-issues

Industry-based resolutions for issues and errata reported against any PDF-related specification
https://pdf-issues.pdfa.org/

The semantics for use of Table 16 ObjStm /Extends key are unclear / insufficient #148

Open petervwyatt opened 2 years ago

petervwyatt commented 2 years ago

The semantics for use of the Table 16 object stream dictionary Extends key are insufficient: they do not say what a PDF reader is expected to do when it encounters this key while parsing, or what PDF writer/modifier software might need to do to maintain, add, or remove this key.

This was discussed in the ISO TC 171 SC 2 WG 8 "Securing PDF" discussion group:

The Extends key is effectively a hint for a PDF reader that a "collection" of object streams (as a "directed acyclic graph") has some unspecified relationship and that caching the collection of related object streams may provide some benefit. Obviously any decision to cache/pre-process or not is entirely up to each PDF processor, as the collection may be large, and/or the unspecified relationship may not be relevant to the PDF processor in question. The term "collection" here has nothing to do with clause 12.3.5 "Collections".
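As an illustration of the reader side, here is a minimal Python sketch of how a processor might check that Extends links stay acyclic before deciding whether to cache the linked object streams. The function name `find_extends_cycle` and the input mapping (object number of each object stream to the object number its Extends entry references, or `None`) are hypothetical; extracting that mapping from a real file is assumed to be done elsewhere.

```python
from typing import Dict, List, Optional


def find_extends_cycle(extends: Dict[int, Optional[int]]) -> Optional[List[int]]:
    """Return the object numbers forming a cycle, or None if the links are acyclic.

    `extends` maps an object stream's object number to the object number
    referenced by its Extends entry (None when the entry is absent).
    This input shape is an assumption made for the sketch.
    """
    verified = set()  # nodes already known not to lead into a cycle
    for start in extends:
        path: List[int] = []
        on_path = set()
        node: Optional[int] = start
        while node is not None and node not in verified:
            if node in on_path:
                # We came back to a node on the current walk: report the loop.
                return path[path.index(node):]
            on_path.add(node)
            path.append(node)
            node = extends.get(node)
        verified.update(path)
    return None
```

A processor that finds a cycle could simply ignore the hint; whether to cache anything at all remains its own decision, as noted above.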

General suggestions for improvement:

Any other suggestions?

petervwyatt commented 1 year ago

PDF TWG agrees in principle. Wordsmith and review next time.

petervwyatt commented 7 months ago

Proposed rewording of the Table 16 Extends key description and the example and NOTEs that follow it (all mention of "collections" is removed and "hint" is mentioned):

(Optional) An indirect reference to another object stream, which hints to a PDF processor that the referenced object stream is considered an extension by way of having unspecified common characteristics. A given set of object streams linked via Extends shall form a directed acyclic graph.

EXAMPLE 1 It can be useful to store objects having common characteristics together, such as "fonts on page 1" or "Comments for draft #3."

NOTE 4 To avoid a degradation of performance that can occur when downloading and decompressing a large object stream to access a single compressed object, the number of objects in an individual object stream needs to be limited. Linking multiple object streams via the Extends entry in the object stream dictionaries provides a hint to PDF processors for improving performance.

NOTE 5 Extends can also be used when updating to include new objects. The new objects can be stored in a separate object stream and linked via Extends, rather than modifying the original object stream, which could entail duplicating much of the stream data. This is particularly important when adding an update section to a document.
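To make the writer side of NOTE 5 concrete, here is a minimal Python sketch that serializes a new, unfiltered object stream whose dictionary carries an Extends entry pointing at an existing object stream. The function name `build_object_stream`, its parameters, and the choice of generation number 0 and no compression filter are assumptions for illustration; only the Type, N, First, Length, and Extends keys come from the object stream dictionary itself.

```python
from typing import List, Optional, Tuple


def build_object_stream(
    obj_num: int,
    objects: List[Tuple[int, bytes]],
    extends: Optional[int] = None,
) -> bytes:
    """Serialize an uncompressed object stream as an indirect object.

    `objects` is a list of (object number, serialized object bytes).
    `extends` is the object number of the object stream to reference
    via Extends (generation 0 is assumed), or None to omit the key.
    """
    pairs = []
    offset = 0
    for num, body in objects:
        # Each pair is: object number, byte offset relative to /First.
        pairs.append(f"{num} {offset}")
        offset += len(body) + 1  # +1 for the newline separating objects
    header = (" ".join(pairs) + "\n").encode("ascii")
    data = header + b"\n".join(body for _, body in objects)

    entries = [
        "/Type /ObjStm",
        f"/N {len(objects)}",
        f"/First {len(header)}",
        f"/Length {len(data)}",
    ]
    if extends is not None:
        entries.append(f"/Extends {extends} 0 R")

    head = f"{obj_num} 0 obj\n<< {' '.join(entries)} >>\nstream\n".encode("ascii")
    return head + data + b"\nendstream\nendobj\n"
```

For example, `build_object_stream(20, [(11, b"<< /A 1 >>"), (12, b"(hi)")], extends=5)` leaves object stream 5 untouched and writes only the new objects, which is the saving NOTE 5 describes. A real writer would normally also apply a compression filter and add a matching cross-reference stream entry for each compressed object.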

mkl-public commented 7 months ago

A question on the side: Does anyone know a PDF creator that makes use of the Extends entry?

I ask because I'm not aware of ever having seen such an entry in real-life documents. But maybe I've simply overlooked that entry here or there...

petervwyatt commented 6 months ago

I have thousands of PDFs that have /Extends entries, many created by various Adobe technologies.

petervwyatt commented 3 months ago

PDF TWG would like to reword...

petervwyatt commented 3 months ago

@lrosenthol - could you please propose improved wording?