pdf-association / pdf-issues

Industry-based resolutions for issues and errata reported against any PDF-related specification
https://pdf-issues.pdfa.org/

CapHeight required only for fonts with Latin characters is ambiguous #274

Open bdoubrov opened 1 year ago

bdoubrov commented 1 year ago

PDF 2.0 specifies the CapHeight entry in the FontDescriptor dictionary (Table 120) as "Required for fonts that have Latin characters, except for Type 3 fonts".

There are two issues here. First, a font subset may include only lowercase Latin characters, in which case this parameter would not make sense for it. Second, the term "Latin character" is not defined in the spec, and I'm not sure it would be correct to define it only as an ASCII character in the range [a-zA-Z].

As a potential resolution, the text might be modified to say: "Required for fonts that have Latin characters in the range A-Z, except for Type 3 fonts".
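
To make the proposed wording concrete, here is a minimal sketch of how a validator might test it. The function name `capheight_required` and its parameters are hypothetical, and it assumes the subset's characters are available as Unicode code points, which is not always true for a real PDF font.

```python
def capheight_required(codepoints, is_type3=False):
    """Return True if the proposed wording would require CapHeight.

    codepoints: iterable of Unicode code points present in the font subset
                (assumed to be known; a real checker would have to derive
                them from the font's encoding or ToUnicode data).
    is_type3:   True for Type 3 fonts, which the requirement exempts.
    """
    if is_type3:
        return False
    # Proposed scope: any character in the ASCII range A-Z (U+0041..U+005A).
    return any(0x41 <= cp <= 0x5A for cp in codepoints)


# A lowercase-only subset would no longer trigger the requirement.
print(capheight_required({ord("a"), ord("b")}))   # False
print(capheight_required({ord("a"), ord("Q")}))   # True
```

Under this reading a subset containing only lowercase letters is exempt, which addresses the first issue but still leaves characters such as accented capitals outside the rule.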

MatthiasValvekens commented 1 year ago

Out of curiosity: is this parameter still used/useful in practice? If yes, what is it used for?

Depending on the answer to that question, we could also consider deprecating it or making it optional if we're messing with the requirement scope anyway. If, on the other hand, we're not able to come up with a clear answer to the above, I'm not sure it's a good idea to try and rewrite this requirement.

EDIT: There may be other font metrics for which this could be a meaningful exercise.

petervwyatt commented 1 year ago

Not surprisingly, most font descriptor metrics have a direct relationship to values inside font programs: https://learn.microsoft.com/en-us/typography/opentype/spec/os2#scapheight

For non-embedded fonts that don't use uppercase Latin chars, it may still make sense to specify it, as font matching or synthesis algorithms may want to use it. See https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6bsln.html, "Example: Format 1 Baseline Table".

Also "Latin" extends beyond just A-Z - consider Æ.
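
A quick way to see the gap is to compare an ASCII A-Z test with a test based on Unicode character names. This is only an illustrative sketch: the helper `is_latin_capital` is hypothetical, and the name-prefix heuristic is an assumption, not a definition taken from the PDF specification or from Annex D.2.

```python
import unicodedata


def is_ascii_capital(ch):
    """True only for the 26 ASCII capitals A-Z."""
    return "A" <= ch <= "Z"


def is_latin_capital(ch):
    """Heuristic: the Unicode name starts with 'LATIN CAPITAL LETTER'.

    This is an assumption for illustration; it is not how the PDF
    specification defines "Latin".
    """
    return unicodedata.name(ch, "").startswith("LATIN CAPITAL LETTER")


for ch in "AÆaД":
    print(repr(ch), is_ascii_capital(ch), is_latin_capital(ch))
```

Here Æ fails the A-Z test but is a Latin capital by name, while the Cyrillic Д fails both.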

I agree that the requirement as written is vague as to what "Latin" means. Could we refer to Annex D.2 "Latin character set and encodings" to improve precision? (This has more chars than is strictly needed, but I think it would make things validatable.)

petervwyatt commented 1 year ago

Latin == "ISO Latin 1" or Annex D.2 "Latin character set and encodings"???

@lrosenthol - note below table D.2 mentions Adobe Latin / Mac OS Latin. Please research...

lrosenthol commented 9 months ago

I have always considered it to mean ISO Latin 1, but will investigate.

And as @petervwyatt mentioned, CapHeight is used in various implementations for things like font matching, etc. So even if you don't have a capital in your subset, you still need the value (as read from the original font file).
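
As a rough sketch of what "read from the original font file" could look like for an OpenType/TrueType font: the OS/2 table (version 2 and later) stores `sCapHeight` as a signed 16-bit value at byte offset 88, in font design units. The function `pdf_cap_height` below is hypothetical. It assumes the caller has already extracted the raw OS/2 table bytes and `unitsPerEm` from the `head` table, and that the descriptor value should be scaled to 1000 units per em.

```python
import struct


def pdf_cap_height(os2_table, units_per_em):
    """Return sCapHeight scaled to 1000 units per em, or None if absent.

    os2_table:    raw bytes of the font's OS/2 table (assumed already
                  located via the font's table directory).
    units_per_em: unitsPerEm from the 'head' table.
    """
    (version,) = struct.unpack_from(">H", os2_table, 0)
    # sCapHeight exists only in OS/2 version 2 and later.
    if version < 2 or len(os2_table) < 90:
        return None
    (s_cap_height,) = struct.unpack_from(">h", os2_table, 88)
    return round(s_cap_height * 1000 / units_per_em)


# Synthetic OS/2 table for demonstration only (not data from a real font).
table = bytearray(96)
struct.pack_into(">H", table, 0, 4)      # version 4
struct.pack_into(">h", table, 88, 1434)  # sCapHeight in design units
print(pdf_cap_height(bytes(table), 2048))  # 700
```

Because the value comes from the font program rather than from the glyphs that happen to be in the subset, it stays available even when no capital letters are embedded.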

car222222 commented 9 months ago

Just to note that CapHeight may well also be used in fonts for Cyrillic characters, and maybe some other scripts.