pdf-association / pdf-issues

Industry-based resolutions for issues and errata reported against any PDF-related specification
https://pdf-issues.pdfa.org/

contradicting information about the encoding of TrueType fonts #316

Open seehuhn opened 1 year ago

seehuhn commented 1 year ago

Table 112 (Entries in an encoding dictionary) in section 9.6.5.1 states that the Differences entry of an encoding dictionary "should not be used with TrueType fonts".

Section 9.6.5.4 (Encodings for TrueType fonts) contains a passage beginning with "The following paragraphs describe the treatment of TrueType font encodings beginning with PDF 1.3." This passage describes how a table that maps character codes to glyph names is constructed. As part of this process, it states: "Any entries in the Differences array shall be used to update the table."
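For reference, the table-update step can be sketched in Python. This is a minimal sketch, not the normative algorithm from the spec: it assumes the base table (for example, the one derived from WinAnsiEncoding) is already available as a dict from character code to glyph name, and that the Differences array has already been parsed into a list of integers and glyph-name strings. The name `apply_differences` is hypothetical.

```python
from typing import Dict, List, Union

# A parsed Differences array: an integer starts a run, and each glyph name
# that follows is assigned to the next consecutive character code.
DiffEntry = Union[int, str]


def apply_differences(base: Dict[int, str], differences: List[DiffEntry]) -> Dict[int, str]:
    """Return a copy of `base` updated with the entries of a Differences array."""
    table = dict(base)  # leave the caller's base encoding untouched
    code = None
    for entry in differences:
        if isinstance(entry, int):
            code = entry  # start a new run at this character code
        else:
            if code is None:
                raise ValueError("Differences array must start with a character code")
            if not 0 <= code <= 255:
                raise ValueError(f"character code {code} is outside 0..255")
            table[code] = entry
            code += 1  # the next name belongs to the following code
    return table


base = {65: "A", 66: "B", 67: "C"}
print(apply_differences(base, [66, "Beta", "Gamma", 200, "eacute"]))
# {65: 'A', 66: 'Beta', 67: 'Gamma', 200: 'eacute'}
```

Copying the base table first keeps the standard encoding reusable across several font dictionaries.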

These two parts of the PDF spec seem to contradict each other: the table says not to use the Differences array, while the later section indicates that the Differences array can be used to describe the encoding.

The text should be clarified to remove this contradiction. Maybe the table is meant to say "should not be used with TrueType fonts for PDF versions before PDF 1.3"? Or maybe the text in section 9.6.5.4 should be updated to describe how to specify the encoding without using the Differences array?

Differences arrays appear to be supported in practice. The attached PDF file includes a TrueType font that uses a Differences array, and the text displays correctly in Adobe Acrobat Reader, in the Preview app on macOS, and in the PDF viewer built into Google Chrome (also on macOS).

truetype.pdf

petervwyatt commented 1 year ago

From an editorial (non-technical) PoV, this recommendation ("should") and this requirement ("shall") do not conflict when read with an understanding of "ISO-ese": Differences is not recommended ("should") for TrueType fonts, but when Differences is present for a TrueType font it must always ("shall") be used. In practice, this means Differences cannot be ignored in the (presumably few) cases where it is present for TrueType fonts.

Note: I have not addressed the technical reasons why Differences is not recommended for TrueType fonts.

seehuhn commented 1 year ago

Thank you for your quick response. I had indeed not fully appreciated the difference between "shall" and "should".

Even if the text of the specification is correct as is, it might still make sense to add some guidance for application writers on how new software should embed TrueType fonts. I am trying to generate PDF files that embed TrueType fonts (like the one attached to this issue, above). If Differences arrays were ok to use, it would be possible to select different sets of glyphs from one larger font program in different font dictionaries. If, in practice, the encoding for this use case has to be specified in the TrueType "cmap" table, a separate font program would need to be embedded for each font dictionary.
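To make the use case concrete: each font dictionary that draws its own subset from the shared font program would carry its own Differences array. The following is a minimal Python sketch of turning a code-to-glyph-name mapping into the compact Differences form, starting a new run only where codes are not consecutive. The helper names `build_differences` and `to_pdf_array` are hypothetical, the sketch assumes glyph names contain only regular characters (so no `#xx` escaping is needed), and whether a given viewer honours such an array for a TrueType font is exactly the question raised in this issue.

```python
from typing import Dict, List, Union


def build_differences(mapping: Dict[int, str]) -> List[Union[int, str]]:
    """Encode a code -> glyph-name mapping as a compact Differences array.

    An integer is emitted only when a code does not directly follow the
    previous one, so a run of consecutive codes shares one start value.
    """
    result: List[Union[int, str]] = []
    previous = None
    for code in sorted(mapping):
        if previous is None or code != previous + 1:
            result.append(code)
        result.append(mapping[code])
        previous = code
    return result


def to_pdf_array(differences: List[Union[int, str]]) -> str:
    """Serialize in PDF syntax: integers as-is, glyph names as /Name objects."""
    parts = [str(x) if isinstance(x, int) else "/" + x for x in differences]
    return "[" + " ".join(parts) + "]"


# Two hypothetical font dictionaries selecting different glyphs for the same codes:
print(to_pdf_array(build_differences({32: "space", 33: "A", 34: "B"})))
# [32 /space /A /B]
print(to_pdf_array(build_differences({32: "space", 33: "alpha", 40: "beta"})))
# [32 /space /alpha 40 /beta]
```

Sorting the codes first means the output does not depend on the insertion order of the mapping.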

lrosenthol commented 1 year ago

> If Differences arrays were ok to use, it would be possible to select different sets of glyphs from one larger font program in different font dictionaries.

It was never envisioned that one could do that - and for good reason, it makes downstream PDF modification extremely difficult (or more difficult).

seehuhn commented 1 year ago

> It was never envisioned that one could do that - and for good reason, it makes downstream PDF modification extremely difficult (or more difficult).

But this approach is explicitly mentioned as being possible in section 9.6.5.1: "Some character sets consist of more than 256 characters, including ligatures, accented characters, and other symbols required for high-quality typography or non-Latin writing systems. Different encodings may select different subsets of the same character set."

seehuhn commented 1 year ago

Here are some thoughts about what could be done to make the text of the spec more consistent:

There is also a potential contradiction between the rules on page 326 and the text underneath Table 113. The text on page 326 gives the rules for the case when "the font has a named Encoding entry of either MacRomanEncoding or WinAnsiEncoding, or if the font descriptor’s Nonsymbolic flag [...] is set". On the following page, after Table 113, the text gives rules for when "the font has no Encoding entry, or the font descriptor’s Symbolic flag is set (in which case the Encoding entry is ignored)". This leaves us with the following situation:

It is not clear to me which set of rules applies to the first and last cases in this list. Maybe this could be clarified in the spec?
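The overlap can be checked mechanically by writing the two quoted conditions as predicates and enumerating the combinations. This Python sketch only paraphrases the two conditions as quoted in this comment; it is not the full algorithm from the spec. It assumes that exactly one of the Symbolic and Nonsymbolic flags is set, so Nonsymbolic is modelled as `not symbolic`, and it uses `"other"` as a stand-in for any other Encoding value (for example, an encoding dictionary). The function names are hypothetical.

```python
from itertools import product
from typing import Optional

NAMED = ("MacRomanEncoding", "WinAnsiEncoding")


def first_rules_apply(encoding: Optional[str], symbolic: bool) -> bool:
    # Page 326: named MacRoman/WinAnsi Encoding entry, or Nonsymbolic flag set.
    # Assumption: exactly one flag is set, so Nonsymbolic == not symbolic.
    return encoding in NAMED or not symbolic


def second_rules_apply(encoding: Optional[str], symbolic: bool) -> bool:
    # After Table 113: no Encoding entry (None), or Symbolic flag set.
    return encoding is None or symbolic


def classify(encoding: Optional[str], symbolic: bool) -> str:
    a = first_rules_apply(encoding, symbolic)
    b = second_rules_apply(encoding, symbolic)
    if a and b:
        return "both"
    if a:
        return "first"
    if b:
        return "second"
    return "neither"


for encoding, symbolic in product([None, "WinAnsiEncoding", "other"], [False, True]):
    print(f"Encoding={encoding!s:17} Symbolic={symbolic!s:5} -> {classify(encoding, symbolic)}")
```

Under these assumptions, both rule sets claim a nonsymbolic font with no Encoding entry and a symbolic font with a named MacRoman/WinAnsi encoding.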

seehuhn commented 1 year ago

I looked at older versions of the spec. In the PDF 1.4 spec, the description does not yet use the Symbolic/Nonsymbolic flags. It just says (in more words): if an /Encoding entry is given, it is used; otherwise, the “cmap” subtable with platform ID 1 and encoding ID 0 is used. At the time, MacExpertEncoding was also still allowed; the current spec no longer permits it. Thus, if the intention was to remain backwards compatible, the two problematic cases above would be resolved as follows: