pdf-association / arlington-pdf-model

A vendor- and implementation-independent specification-derived, machine-readable model of PDF.
Apache License 2.0

ProcSet in PDF 1.4-PDF 1.7: deprecated vs obsolete #46

Closed bdoubrov closed 1 year ago

bdoubrov commented 1 year ago

Even if ProcSet is indeed marked as obsolete starting from PDF 1.4, PDF 1.7 (section 14.2) still suggests that PDF writers should continue adding it to Resource dictionaries:

Beginning with PDF 1.4, this feature is considered obsolete. For compatibility with existing conforming readers, conforming writers should continue to specify procedure sets (preferably, all of those listed in Table 314 unless it is known that fewer are needed). However, conforming readers should not depend on the correctness of this information.

This wording does not correspond to the definition of a deprecated feature. So, I would suggest changing DeprecatedIn from 1.4 to 2.0 in ArrayOfNamesForProcSet.tsv.

petervwyatt commented 1 year ago

ISO 32000-1:2008 says "These procedure sets shall be used only when the content stream is printed to a PostScript output device. The names identify PostScript procedure sets that shall be sent to the device to interpret the PDF operators in the content stream."

"this feature is considered obsolete" is also a stronger statement than being deprecated. And (AFAICT) ISO has never obsoleted any PDF feature, just marked them as deprecated. ISO 32K-2 warns (T&D 3.15) that deprecation is a precursor to "removed completely, in a later version of ISO 32000".

So PS ProcSets cannot be used when any PDF transparency is involved or if later content-stream relevant PDF features not supported by PostScript are used (and Arlington should therefore warn), as PS uses purely opaque imaging with reduced support for imaging formats (e.g. no J2K, no JBIG2), etc. This is covered by the statement "... existing conforming readers ..." which, at the time of writing, meant PDF 1.3 opaque-only renderers with limited image handling (compared to PDF 1.4).

So if anything the Arlington definition needs to somehow account for a file declaring itself to be PDF 1.4-PDF 2.0 to NOT contain transparency and NOT contain later image formats not supported by PS, but that is a runtime behaviour against a specific file, not a spec thing. So setting deprecated to 1.4 is the closest solution that reflects the underlying intention of the wording.

And given that this is just a warning message produced by Arlington PoCs (not an error), then I respectfully disagree with this proposed change. It's the same as if a fully obsoleted PDF 1.0 feature is encountered (such as XObject XUID key or a CalCMYK color space) - Arlington PoCs will warn.
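To illustrate the warning behaviour under discussion, here is a minimal sketch, not the actual Arlington PoC code. It assumes that a checker compares the file's declared PDF version with the DeprecatedIn value of a key and warns when the file version is equal or later. The function names and the message text are hypothetical.

```python
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Turn a PDF version string such as "1.7" into a comparable (major, minor) tuple."""
    major, minor = version.split(".")
    return int(major), int(minor)


def deprecation_warning(key: str, deprecated_in: str, file_version: str) -> Optional[str]:
    """Return a warning string if `key` counts as deprecated for `file_version`, else None.

    Assumption: an empty `deprecated_in` means the key was never deprecated.
    """
    if not deprecated_in:
        return None
    if parse_version(file_version) >= parse_version(deprecated_in):
        return (
            f"Warning: {key} is deprecated as of PDF {deprecated_in} "
            f"(file declares PDF {file_version})"
        )
    return None


if __name__ == "__main__":
    # Compare the current value (1.4) with the proposed value (2.0).
    for deprecated_in in ("1.4", "2.0"):
        for file_version in ("1.3", "1.7", "2.0"):
            result = deprecation_warning("ProcSet", deprecated_in, file_version)
            print(f"DeprecatedIn={deprecated_in}, file={file_version}: {result or 'no warning'}")
```

Under these assumptions, a PDF 1.7 file containing ProcSet triggers a warning when DeprecatedIn is 1.4 and none when it is 2.0, which is the practical difference the proposed change would make.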

PS. I do know 85%++ of all PDFs set ProcSets when they shouldn't.

bdoubrov commented 1 year ago

There are two clearly contradicting instructions for conforming readers and conforming writers. I fully agree with the above statements in the case of conforming readers: the ProcSet is indeed obsolete and shall not be used for any processing by the readers as of PDF 1.4.

However, the conforming writers are still supposed to add ProcSets to the files up to and including PDF 1.7: "For compatibility with existing conforming readers, conforming writers should continue to specify procedure sets". (Last paragraph in ISO 32000-1, 14.2). I think this is exactly the reason why we find 85% of files containing ProcSets.

This is different from the notion of a deprecated feature in PDF 2.0, which is defined as "part of ISO 32000 that should not be written into a PDF 2.0 document, and should be ignored by a PDF processor".

Anyway, I would at least suggest that the documentation on the Arlington model defines somewhere what the Deprecated property means, just to make sure that it is not confused with the definition of deprecated content in PDF 2.0.

petervwyatt commented 1 year ago

I would strongly state that PDF 1.7 is clearly in error by stating "For compatibility with existing conforming readers..." as this statement was originally written assuming PostScript and opaque-only imaging. The "conforming" part appears to have been blindly added during the fast track process without any technical consideration. There is NO way a PostScript-based RIP can be a "conforming reader" for PDF 1.7!!

It would have been more correctly stated as "For compatibility with legacy PostScript-based readers limited to the opaque imaging model..." or something along those lines.

ISO 32000-2:2020 clause 14.2 also states "This feature has been deprecated since PDF 1.4." which is far more accurate than ISO 32000-1:2008.