pdf-association / pdf-issues

Industry-based resolutions for issues and errata reported against any PDF-related specification
https://pdf-issues.pdfa.org/

7.5.4 (Cross-reference table): is the linked list of reusable free entries still relevant? #465

Open stechio opened 2 months ago

stechio commented 2 months ago

Modern PDF usage seems to make the linked list of reusable free entries almost obsolete, for two reasons. First, compressible objects (that is, any indirect object other than streams and the Length entry of object streams; see subclause 7.5.7 (Object streams)) cannot use recycled entries, because their generation number must be zero. Second, the de facto reference implementation (Acrobat 6.0 and later) does "not use the free list to recycle object numbers; new objects are assigned new numbers" (as stated by implementation note 16 in section H.3 of PDF Reference 1.7 by Adobe Inc.).

Given these considerations, I wonder whether it still makes sense to keep the linked list of reusable free entries alive when writing PDF files, or whether all the entries freed ~by deleted objects~ [from objects deleted during an editing session]* can simply be marked as non-reusable (generation number 65,535). Is the linked list worth the burden of maintaining it?

EDIT

[*] clarification
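
To make the two options concrete, here is a minimal Python sketch of how the free entries would be written in each case. It assumes the classic 20-byte cross-reference entry layout of subclause 7.5.4 (10-digit field, 5-digit generation number, `f` keyword, 2-byte end-of-line). The function names, the ascending chaining order, and the sample object numbers are illustrative assumptions, not something prescribed by the specification or by this issue.

```python
from typing import Dict, Iterable

MAX_GENERATION = 65535  # an entry with this generation number is never reused


def format_free_entry(next_free: int, generation: int) -> str:
    """Render one 20-byte free entry: next free object number, generation, 'f'."""
    return f"{next_free:010d} {generation:05d} f \n"


def linked_free_entries(freed: Dict[int, int]) -> Dict[int, str]:
    """Option A: keep one linked list of reusable free entries.

    `freed` maps each freed object number to the generation number to use
    when the number is recycled (i.e. already incremented by 1).
    Object 0 is the head; the tail links back to object 0.
    Chaining in ascending order is an arbitrary choice made for this sketch.
    """
    chain = [0] + sorted(freed)
    entries = {}
    for i, number in enumerate(chain):
        next_free = chain[(i + 1) % len(chain)]
        generation = MAX_GENERATION if number == 0 else freed[number]
        entries[number] = format_free_entry(next_free, generation)
    return entries


def non_reusable_free_entries(freed_numbers: Iterable[int]) -> Dict[int, str]:
    """Option B: mark every freed entry as non-reusable.

    Each entry links back to object 0 and carries generation 65,535,
    so there is no list to maintain beyond the head itself.
    """
    entries = {0: format_free_entry(0, MAX_GENERATION)}
    for number in freed_numbers:
        entries[number] = format_free_entry(0, MAX_GENERATION)
    return entries


if __name__ == "__main__":
    for number, entry in linked_free_entries({3: 1, 7: 1}).items():
        print(number, repr(entry))
    for number, entry in non_reusable_free_entries([3, 7]).items():
        print(number, repr(entry))
```

With option A, deleting objects 3 and 7 requires the writer to track the chain 0 → 3 → 7 → 0; with option B, every freed entry is written identically and independently of the others.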

mkl-public commented 2 months ago

I would not mark them as non-reusable; why should one forbid re-use? But you do not need to keep a linked list of multiple free objects; you can instead have multiple linked lists of exactly one free object each, so you don't need to store the order of free objects.

stechio commented 2 months ago

@mkl-public:

> I would not mark them as non-reusable; why should one forbid re-use?

Why not? (Acrobat ignores reusable entries, anyway...) :grimacing: I'm genuinely asking for compelling reasons to keep the linked list alive, considering the arguments of my initial comment.

> But you do not need to keep a linked list of multiple free objects; you can instead have multiple linked lists of exactly one free object each, so you don't need to store the order of free objects.

I'm not sure what you mean... if it is about keeping each reusable free entry on its own, each one linking back to the head (first entry in the table (object number 0)), then I see two major flaws:

Since there is one and only one linked list, the only alternative legal way to manage free entries is to exclude them from it using the second mechanism, that is, non-reusable entries ("free entries that link back to object number 0 and have a generation number of 65,535"). Here we go again :sweat_smile:
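
The distinction drawn here can be sketched as a small check. The following Python example, which assumes the free entries have already been parsed into a hypothetical `{object number: (next free object number, generation number)}` mapping, walks the single list from object 0 and sorts the remaining free entries into non-reusable ones and ones that fit neither mechanism under the reading given in this comment. The function name and the data layout are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

MAX_GENERATION = 65535


def classify_free_entries(
    free: Dict[int, Tuple[int, int]]
) -> Tuple[List[int], List[int], List[int]]:
    """Split free entries into (on_list, non_reusable, orphans).

    `free` maps object number -> (next free object number, generation number)
    and must contain object 0, the head of the list.
    """
    on_list = []
    seen = {0}
    current = free[0][0]
    # Follow the links from the head until the tail points back to object 0.
    while current != 0:
        if current in seen or current not in free:
            raise ValueError(f"broken free list at object {current}")
        seen.add(current)
        on_list.append(current)
        current = free[current][0]

    # Free entries outside the list are only fine if they are non-reusable:
    # they link back to object 0 and carry generation 65,535.
    non_reusable = sorted(
        n for n, (next_free, generation) in free.items()
        if n not in seen and next_free == 0 and generation == MAX_GENERATION
    )
    # Anything else is a free entry that is neither on the list nor excluded,
    # e.g. a "single-element list" with a reusable generation number.
    orphans = sorted(n for n in free if n not in seen and n not in non_reusable)
    return on_list, non_reusable, orphans


if __name__ == "__main__":
    table = {0: (3, 65535), 3: (0, 1), 5: (0, 65535), 8: (0, 2)}
    print(classify_free_entries(table))
```

In the sample table, object 3 is on the list, object 5 is non-reusable, and object 8 is the problematic case: it links back to the head with a reusable generation number but cannot be reached from object 0.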

mkl-public commented 2 months ago

> > I would not mark them as non-reusable; why should one forbid re-use?

> Why not? (Acrobat ignores reusable entries, anyway...)

Well, in the case of signed PDFs, changing the cross-references like that may be seen as a disallowed change, as it clearly is not necessary for any of the allowed changes. So in such use cases I would apply as few changes as possible.

> > But you do not need to keep a linked list of multiple free objects; you can instead have multiple linked lists of exactly one free object each, so you don't need to store the order of free objects.

> I'm not sure what you mean...

Ehm... you're right. The approach with multiple, single-element lists actually requires a generation number of 65,535, which is what I tried to argue against. ;) So you may forget about this option.

So if your question applies to arbitrary cases, including incremental updates, I'd recommend applying no unnecessary changes.

And if your question actually focuses on cases in which you do a full save, you could also renumber the objects arbitrarily, so you can equally make arbitrary changes to the cross-references, including marking entries as non-reusable.
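
As a rough illustration of the full-save case, the Python sketch below assigns new consecutive object numbers to the surviving objects, so that no free entry other than object 0 is left to manage. The function name and the compact numbering policy are assumptions for this example only; a real writer would also have to rewrite every indirect reference using the resulting mapping.

```python
from typing import Dict, Iterable


def compact_renumbering(in_use_numbers: Iterable[int]) -> Dict[int, int]:
    """Map each surviving object number to a new consecutive number from 1.

    Object 0 stays reserved as the (only) free entry, so the rewritten
    cross-reference table needs no free list beyond its head.
    """
    return {old: new for new, old in enumerate(sorted(in_use_numbers), start=1)}


if __name__ == "__main__":
    # Objects 3, 4, 6, 7 and 8 were deleted during editing.
    mapping = compact_renumbering([1, 2, 5, 9])
    print(mapping)
    # Number of entries in the rewritten table, including object 0.
    print(len(mapping) + 1)
```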

petervwyatt commented 2 months ago

A few general comments:

stechio commented 2 months ago

@mkl-public:

> > > I would not mark them as non-reusable; why should one forbid re-use?

> > Why not? (Acrobat ignores reusable entries, anyway...)

> Well, in the case of signed PDFs, changing the cross-references like that may be seen as a disallowed change, as it clearly is not necessary for any of the allowed changes. So in such use cases I would apply as few changes as possible.

That's a good point and I totally agree with you. I didn't express myself properly: my focus was actually on the entries freed during an editing session, NOT on already-existing free entries. That is, the idea is to stop maintaining existing linked lists (by not adding newly freed entries to them), NOT to destroy them.

mkl-public commented 2 months ago

> That's a good point and I totally agree with you. I didn't express myself properly: my focus was actually on the entries freed during an editing session, NOT on already-existing free entries. That is, the idea is to stop maintaining existing linked lists (by not adding newly freed entries to them), NOT to destroy them.

In that case you, as the PDF processor, can do as you prefer.