Open seehuhn opened 2 weeks ago
In my opinion, yes: they are not required and normally do not make sense.
Some additional thoughts:
I agree that CMapName and CIDSystemInfo are not useful for ToUnicode CMaps.
Even if it turns out that the corresponding fields are not required in ToUnicode CMap stream dictionaries, probably Type should be required?
The only example of a ToUnicode CMap in the spec (Section 9.10.3, Example 2) does include the fields in question:
16 0 obj
<<
/Type /CMap
/CMapName /Adobe-Identity-UCS2
/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS2) /Supplement 0 >>
/Length 433
>>
stream
...
endstream
(As mentioned in #344, I suspect that the CIDSystemInfo in the example may be wrong, though.)
Rewording as follows may help distinguish between required keys (which are always required!) and the use of "pertinent":
In addition to the required entries, the only pertinent entry in the CMap stream dictionary ...
This makes it clear that "pertinent" is not attempting to dismiss the required-ness of the other entries.
But none of the PDFs with a ToUnicode CMap that I was just looking at contain any of these entries. Attached is a PASS file taken from the veraPDF test suite. veraPDF test suite 6-2-10-7-t01-pass-a.pdf Or am I missing something?
Inspired by @DietrichSeggern's comment I checked the PDF files on my laptop: the files contain a total of 60477 ToUnicode CMaps. Here is how often each key in the stream dicts occurs:
So only 30 out of 60477 ToUnicode maps I inspected included the fields in question.
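A tally of this kind can be sketched roughly as follows. This is not the script that produced the numbers above; it is a minimal illustration that assumes the ToUnicode stream dictionaries have already been extracted into plain Python dicts by some PDF library. The function name `tally_stream_dict_keys` and the sample data are hypothetical and invented purely for illustration.

```python
from collections import Counter


def tally_stream_dict_keys(stream_dicts):
    """Count in how many stream dictionaries each key occurs.

    `stream_dicts` is assumed to be an iterable of dict-like objects,
    one per ToUnicode stream (hypothetical input format).
    """
    counts = Counter()
    for d in stream_dicts:
        # Use a set so each key is counted at most once per dictionary.
        counts.update(set(d))
    return counts


# Toy input, made up for illustration; not real measurement data.
sample = [
    {"/Length": 433, "/Filter": "/FlateDecode"},
    {"/Length": 120},
    {"/Type": "/CMap", "/CMapName": "/Adobe-Identity-UCS2", "/Length": 433},
]

counts = tally_stream_dict_keys(sample)
for key, n in counts.most_common():
    print(f"{key}: {n} of {len(sample)}")
```

Extracting the dictionaries from real files is left out on purpose, since that step depends on the PDF library in use.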
I was just following the bouncing ball of references... clearly not reflecting reality!
I guess the ToUnicode definition does say it is "A stream containing a CMap file..." and doesn't reference the CMap stream dictionary definition in Table 118, but it's hard to tell whether this is legacy language or an explicit, nuanced sentence. This is also what the 1st bullet near the end of 9.10.1 implies. The text generally confuses CMap (the data syntax) with CMap (the PDF stream object).
So maybe in this specific case "pertinent" does mean the only key that you can expect to find in a ToUnicode stream dictionary is UseCMap since it is not a "CMap stream" but a "stream that is a (slightly tweaked) CMap".
If that is true, then the consistent method to correct this would be to add a new Table titled "additional entries in a ToUnicode stream dictionary" and list just UseCMap. This is how all other streams in 32K are defined that have special keys beyond the standard set for streams. That way it would be explicitly unambiguous. But maybe the other CMap stream dictionary keys (like Type) are optional... I really don't know so let's also ask @lrosenthol to do some PDF archeology since extant data doesn't always get things correct.
Section 9.10.3 of the PDF-2.0 spec states
Table 118 lists the following entries as required: Type, CMapName, CIDSystemInfo. Does the above sentence mean that these entries are not required for ToUnicode CMaps? It would be great if the spec could clarify what the meaning of "only pertinent entry" is in this context.
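To make the question concrete, here is a minimal sketch of what a strict reading of Table 118 would imply for a checker. It assumes the stream dictionary keys are available as plain strings without the leading slash; the names `REQUIRED_BY_TABLE_118` and `missing_required_entries` are hypothetical, and whether these entries really are required for ToUnicode CMaps is exactly the open question here.

```python
# Entries that Table 118 lists as required for a CMap stream dictionary.
REQUIRED_BY_TABLE_118 = ("Type", "CMapName", "CIDSystemInfo")


def missing_required_entries(stream_dict_keys):
    """Return the Table 118 required entries absent from the given keys.

    `stream_dict_keys` is assumed to be an iterable of key names
    without the leading slash (hypothetical input format).
    """
    present = set(stream_dict_keys)
    return [key for key in REQUIRED_BY_TABLE_118 if key not in present]


# Keys of the stream dictionary in the spec's Example 2 (Section 9.10.3).
spec_example = ["Type", "CMapName", "CIDSystemInfo", "Length"]
print(missing_required_entries(spec_example))

# A dictionary carrying only generic stream entries.
bare_stream = ["Length", "Filter"]
print(missing_required_entries(bare_stream))
```

Under the strict reading, the second dictionary would be flagged as missing all three entries; under the "only pertinent entry" reading, it would be acceptable.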