GDL::make_name() oddities

bobh0303 commented 7 years ago

I'm considering a revision to the make_name() routine in GDL.pm. While studying the code I've found a number of oddities that we may want to clean up. Thoughts welcome.

(I should say up front that, for backwards compatibility, any revision that generates different GDL identifiers will be disabled by default and require an option parameter to enable it.)

One major concern is that the routine does not attempt to separate out (and process individually) the ligature components of a glyph name. For example, taking whole (single-component) glyph names:

uni1234abcd generates a GDL identifier g1234_abcd and
u12345 generates g12345

If we put those two into a ligature we get, bewilderingly:

uni1234abcd_u12345 generates g1234_abcd_1234_u5 and
u12345_uni1234abcd generates g12345_ni1234abcd

(rather than something more expected, such as g1234_abcd_12345 and g12345_1234_abcd, respectively)

So perhaps my first question is: shouldn't make_name() be processing such ligature components independently?
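Below is a minimal Python sketch of processing components independently. It is not the GDL.pm code. It makes these assumptions:

* Components are separated by "_".
* A "uni" component is split into 4-digit groups and a "u" component is kept whole, mirroring the single-component outputs shown above.
* Any other component passes through unchanged.
* "." suffixes are ignored.

The names `component_to_gdl` and `make_name_sketch` are hypothetical.

```python
import re

# Hypothetical patterns for the two Unicode component forms.
# Either hex case is accepted here, matching the examples above.
UNI_RE = re.compile(r"uni((?:[0-9A-Fa-f]{4})+)")
U_RE = re.compile(r"u([0-9A-Fa-f]{4,6})")


def component_to_gdl(comp: str) -> str:
    """Convert one ligature component to a GDL identifier fragment."""
    m = UNI_RE.fullmatch(comp)
    if m:
        hexdigits = m.group(1).lower()
        # Split e.g. "1234abcd" into 4-digit groups: "1234", "abcd"
        groups = [hexdigits[i:i + 4] for i in range(0, len(hexdigits), 4)]
        return "_".join(groups)
    m = U_RE.fullmatch(comp)
    if m:
        # A "u" component names exactly one USV, so keep its digits together
        return m.group(1).lower()
    # Anything else (e.g. "Lcommaaccent") passes through unchanged
    return comp


def make_name_sketch(glyph_name: str) -> str:
    """Hypothetical per-component make_name(): split on '_' and convert each part."""
    parts = glyph_name.split("_")
    return "g" + "_".join(component_to_gdl(p) for p in parts)
```

With this approach, uni1234abcd_u12345 becomes g1234_abcd_12345 and u12345_uni1234abcd becomes g12345_1234_abcd. Those are the results suggested above.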

My next question is whether we really want to lowercase USVs in names. Currently, for example,

uni1234ABCD generates g1234_abcd and
uABCDE generates gabcde

Personally I find the uppercase USVs more readable.

Happy for this to be a brainstorming session...

mhosken commented 7 years ago

From what you say, it's behaving as I intended it to behave. But I am open to a discussion on that.

According to the AGL, a ligature name may be uxxxx_uxxxx_uxxxx... or unixxxxyyyyzzzz..., but not both. Hence uxxxx_uniyyyyzzzz is wrong, and so I treat it as such. But if you want to change it so that, in effect, we treat uni as u, then that's fine by me. These cases shouldn't be occurring anyway. OK, I admit u12345_uni1234abcd should have output g12345_uni1234abcd.

As to casing: I prefer lowercase; it's less noisy in a glyph name. Perhaps we need a switch for that too? I would like people to be able to get the names they want to work with.
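A case switch might look like the hypothetical helper below, which builds a GDL identifier from a list of code points. The function name and the `uppercase` parameter are illustrative only, not an existing GDL.pm option.

Its behaviour is as follows:

* Each USV is written as at least four hex digits.
* The default `uppercase=False` reproduces the lowercase output shown earlier (g1234_abcd, gabcde).

```python
def gdl_name_from_codepoints(codepoints: list[int], uppercase: bool = False) -> str:
    """Build a GDL identifier such as g1234_abcd from a list of code points.

    `uppercase` is a hypothetical switch: False reproduces the current
    lowercase output, True keeps USVs in uppercase.
    """
    # At least four hex digits; supplementary-plane values get five or six
    spec = "{:04X}" if uppercase else "{:04x}"
    return "g" + "_".join(spec.format(cp) for cp in codepoints)
```

For instance, `gdl_name_from_codepoints([0x1234, 0xABCD], uppercase=True)` yields g1234_ABCD.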

bobh0303 commented 7 years ago

You wrote: "According to the AGL, a ligature name may be uxxxx_uxxxx_uxxxx... or unixxxxyyyyzzzz..., but not both. Hence uxxxx_uniyyyyzzzz is wrong, and so I treat it as such."

Actually, what is wrong (or at least not recommended) about this case is using 'u' notation for BMP characters. This is mentioned in the AGL Specification (Section 6), where it says:

... it is recommended to specify names by using the "uni" prefix for characters in the Basic Multilingual Plane (BMP), and the shorter "u" prefix for characters in the 16 Supplemental Planes ... Why is the prefix "u" not yet recommended for glyphs that are encoded in Unicode's BMP? The prefix "u" is not supported by Acrobat Versions 4 and 5. It became supported by Acrobat Version 6 and later, which is also when support for Unicode characters outside the BMP (Basic Multilingual Plane) was introduced. AGL names and glyph names that use the prefix "uni," along with the "." and "_" parsing rules, are already supported by Acrobat Versions 4 and 5.
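The recommendation quoted above can be sketched as a small function. The name `agl_component_for` is hypothetical. Its rules are:

* BMP code points get "uni" plus four uppercase hex digits.
* Supplementary-plane code points get "u" plus five or six.
* Surrogates and out-of-range values are rejected, since they are not Unicode scalar values.

```python
def agl_component_for(codepoint: int) -> str:
    """Return the recommended AGL-style name component for one code point."""
    if not 0 <= codepoint <= 0x10FFFF or 0xD800 <= codepoint <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {codepoint:#x}")
    if codepoint <= 0xFFFF:
        # BMP: "uni" prefix, exactly four uppercase hex digits
        return f"uni{codepoint:04X}"
    # Supplementary planes: "u" prefix, five or six uppercase hex digits
    return f"u{codepoint:X}"
```

Under this rule U+1234 would be named uni1234 rather than u1234, while U+12345 can only be written u12345.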

As for mixing 'u' and 'uni' notation in a ligature, however, this appears to be perfectly acceptable; in fact, the AGL Specification (Section 3) includes this example:

The name "Lcommaaccent_uni20AC0308_u1040C.alternate" has three components, which are "Lcommaaccent," "uni20AC0308," and "u1040C." It is mapped to the string U+013B U+20AC U+0308 U+1040C.
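Below is a simplified Python sketch of the mapping the spec describes, which shows that mixed notation parses cleanly. It follows these steps:

1. Drop everything from the first ".".
2. Split on "_".
3. Map each component, requiring uppercase hex digits.

Simplifications and hypothetical names:

* The Adobe Glyph List lookup is stubbed with a one-entry dictionary (`AGL_SUBSET`, a hypothetical stand-in for the full list).
* The spec's special case for ZapfDingbats is omitted.
* Unrecognized components map to nothing.

```python
import re

# Strict patterns: uppercase hex digits only, as the spec requires
UNI_RE = re.compile(r"uni((?:[0-9A-F]{4})+)")
U_RE = re.compile(r"u([0-9A-F]{4,6})")

# Hypothetical one-entry stand-in for the Adobe Glyph List lookup
AGL_SUBSET = {"Lcommaaccent": 0x013B}


def _is_scalar(cp: int) -> bool:
    # Unicode scalar values exclude the surrogate range D800-DFFF
    return 0 <= cp <= 0x10FFFF and not 0xD800 <= cp <= 0xDFFF


def component_to_codepoints(comp: str) -> list[int]:
    if comp in AGL_SUBSET:
        return [AGL_SUBSET[comp]]
    m = UNI_RE.fullmatch(comp)
    if m:
        digits = m.group(1)
        cps = [int(digits[i:i + 4], 16) for i in range(0, len(digits), 4)]
        return cps if all(_is_scalar(cp) for cp in cps) else []
    m = U_RE.fullmatch(comp)
    if m:
        cp = int(m.group(1), 16)
        return [cp] if _is_scalar(cp) else []
    return []  # unrecognized components map to the empty string


def glyph_name_to_codepoints(name: str) -> list[int]:
    base = name.split(".", 1)[0]  # drop any suffix such as ".alternate"
    result: list[int] = []
    for comp in base.split("_"):
        result.extend(component_to_codepoints(comp))
    return result
```

Applied to "Lcommaaccent_uni20AC0308_u1040C.alternate", this yields the code points U+013B, U+20AC, U+0308 and U+1040C, matching the spec's example.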

On a tangent: in reading the spec, I realize my original examples are flawed, in that they use lowercase hex digits in the glyph names, while the spec requires uppercase only. In fact, it gives this example:

The name "uni20ac" has a single component, which is mapped to an empty string (note the lowercase "a" and "c").

This also means our code should be tightened up to recognize only uppercase hex digits in the glyph name.
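The tightening could be as small as changing the hex character class. Below is a sketch contrasting a lenient pattern with a strict one. The pattern and function names are hypothetical. Value-range checks, such as excluding surrogates, are left out.

```python
import re

# Lenient: accepts either hex case (like the examples earlier in this thread)
LENIENT = re.compile(r"(?:uni(?:[0-9A-Fa-f]{4})+|u[0-9A-Fa-f]{4,6})")
# Strict: uppercase hex digits only, per the AGL specification
STRICT = re.compile(r"(?:uni(?:[0-9A-F]{4})+|u[0-9A-F]{4,6})")


def is_unicode_component(comp: str, strict: bool = True) -> bool:
    """True if comp looks like a 'uni' or 'u' component."""
    pattern = STRICT if strict else LENIENT
    return pattern.fullmatch(comp) is not None
```

With `strict=True`, "uni20ac" is no longer treated as a Unicode component, consistent with the spec mapping it to an empty string.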

silnrsi / font-ttf-scripts: GDL::make_name() oddities #16