[UFO4] support cmap Unicode Variation Sequences

moyogo commented 5 years ago

See various comments in #77.

In particular https://github.com/unified-font-object/ufo-spec/issues/77#issuecomment-452633570:

The UVS data can be represented by a sequence of (unicodeValue, variationSelector, glyphName) tuples, where glyphName is optional. No glyph name means: this is the default variation, and the cmap should be used to find the glyph name for this code point.
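To make the tuple model concrete, here is a minimal Python sketch. The names `VariationSequence` and `resolve_glyph` are hypothetical, as is the sample `cmap` dict; nothing here is part of the UFO spec. It only shows how an omitted glyph name can fall back to the regular cmap.

```python
from typing import Dict, NamedTuple, Optional


class VariationSequence(NamedTuple):
    """One UVS entry; the names are illustrative, not from the UFO spec."""
    unicode_value: int
    variation_selector: int
    glyph_name: Optional[str] = None  # None means "default variation"


def resolve_glyph(seq: VariationSequence, cmap: Dict[int, str]) -> str:
    """Return the glyph for a sequence, using the cmap for default variations."""
    if seq.glyph_name is not None:
        return seq.glyph_name
    return cmap[seq.unicode_value]


# Hypothetical sample data: a regular cmap plus two sequences for U+0030.
cmap = {0x0030: "zero"}
sequences = [
    VariationSequence(0x0030, 0xFE00, "zero.slash"),  # explicit glyph
    VariationSequence(0x0030, 0xFE01),                # default: look up cmap
]
resolved = [resolve_glyph(seq, cmap) for seq in sequences]
```

With this data, the first sequence resolves to its own glyph and the second resolves through the cmap.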

justvanrossum commented 5 years ago

I see two ways of storing the UVS data:

1. As a nested structure: a dict at the top level that maps variationSelector keys to dicts, which in turn map unicodeValue keys to glyphName strings.
2. As a two-dimensional table of rows with three fields each.

Option 1 can be stored in plist format, with the caveat that we need to convert unicode value keys to (hex) strings, as plist dict keys must be strings. The nested data structure closely resembles the internal structure of the OpenType format 14 cmap subtable.

Option 2 could be stored as a tab-separated text file, with the caveat that care has to be taken to respect the "no restrictions in glyph names" UFO policy. The lines in the file represent the Variation Sequences quite literally: 0030 FE00 zero.slash.

Option 1 is more machine-friendly, option 2 is more human-friendly.

justvanrossum commented 5 years ago

Option 1 with just one sequence:

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
  <dict>
    <key>FE00</key>
    <dict>
      <key>0030</key>
      <string>zero.slash</string>
    </dict>
  </dict>
</plist>
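A rough sketch of how a tool might read and write this nested plist with Python's standard `plistlib`, converting between integer code points and the hex-string keys that plist requires. The function names `uvs_to_plist` and `uvs_from_plist` are made up for illustration, and the four-digit uppercase hex formatting is an assumption based on the example above.

```python
import plistlib
from typing import Dict

UVSMapping = Dict[int, Dict[int, str]]  # selector -> {code point -> glyph name}


def uvs_to_plist(uvs: UVSMapping) -> bytes:
    """Serialize to XML plist; integer keys become uppercase hex strings."""
    plist_data = {
        f"{selector:04X}": {f"{code:04X}": name for code, name in mapping.items()}
        for selector, mapping in uvs.items()
    }
    return plistlib.dumps(plist_data)


def uvs_from_plist(data: bytes) -> UVSMapping:
    """Parse the plist and convert the hex-string keys back to integers."""
    raw = plistlib.loads(data)
    return {
        int(selector, 16): {int(code, 16): name for code, name in mapping.items()}
        for selector, mapping in raw.items()
    }


uvs = {0xFE00: {0x0030: "zero.slash"}}
encoded = uvs_to_plist(uvs)
decoded = uvs_from_plist(encoded)
```

Round-tripping through the hex keys returns the original integer-keyed mapping.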

Option 2 with just one sequence:

0030 FE00 zero.slash
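A possible parser for one such line, as a sketch only. It assumes the fields are separated by tab characters (as proposed for Option 2) and that an absent third field marks a default variation; `parse_uvs_line` is a hypothetical name. Splitting at most twice keeps spaces in glyph names intact, though names containing tabs or newlines would still need an escaping rule that this sketch does not define.

```python
from typing import Optional, Tuple


def parse_uvs_line(line: str) -> Tuple[int, int, Optional[str]]:
    """Parse 'unicodeValue<TAB>variationSelector[<TAB>glyphName]'."""
    # Split on at most two tabs so the glyph name keeps any other characters.
    fields = line.rstrip("\n").split("\t", 2)
    if len(fields) < 2:
        raise ValueError(f"expected at least two tab-separated fields: {line!r}")
    unicode_value = int(fields[0], 16)
    variation_selector = int(fields[1], 16)
    # A missing or empty third field is treated as "default variation".
    glyph_name = fields[2] if len(fields) == 3 and fields[2] else None
    return unicode_value, variation_selector, glyph_name


entry = parse_uvs_line("0030\tFE00\tzero.slash\n")
```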

justvanrossum commented 5 years ago

Storing UVS could be combined with the "regular" character mapping, by using an optional third column for the variation selector:

0030 zero
0030 zero.slash FE00

Or maybe we should consider using (a dialect of) csv:

0030;zero;
0030;zero.slash;FE00

benkiel commented 4 years ago

@khaledhosny do you have any opinion on which of the options that @justvanrossum proposed would be better to work with (and on any unseen gotchas that may have been missed in them)?

khaledhosny commented 4 years ago

I don’t have a deep knowledge of the matter, so whatever works with the tools that consume this is fine for me.

benkiel commented 4 years ago

From Twitter:

In order to deal with default vs non-default UVSes, which is important for IVSes, I suggest something along the lines of the following (excerpt from the Adobe-Japan1 IVD collection):
8FBB E0100;cid3056
8FBB E0101;cid8267
Which UVS is default depends on which glyph is encoded.

JIS90-savvy Japanese fonts encode CID+3056 from U+8FBB 辻, meaning <8FBB E0100> 辻󠄀 is the default UVS. JIS2004-savvy ones encode CID+8267 from U+8FBB 辻, meaning <8FBB E0101> 辻󠄁 is the default UVS. The other, of course, is non-default, and requires a UVS to display properly.

And, to be clear, both UVSes should be specified so long as the font includes both glyphs, and both UVSes should be present and accounted for in the Format 14 'cmap' subtable.

Which UVS is default needs to be determined at compile time, because interaction with the Format 12 subtable is required to ascertain which glyph that corresponds to a UVS is encoded, and therefore the default one.

If you are looking for an extreme test case, check out the latest version of “IVS Test,” which I deployed a little over a year ago, and whose Format 14 'cmap' subtable includes nearly 40 million UVSes: https://github.com/adobe-fonts/ivs-test

benkiel commented 4 years ago

By my reading, this means the spec needs to state that the tool compiling the font decides which UVS is the default; all the designer can do is specify the UVSes for the cmap.
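As an illustration of that compile-time decision, a compiler could compare each sequence's glyph against the glyph the regular cmap encodes for the same code point. This is only a sketch: `split_default_uvs` is a hypothetical helper, and the two sample cmaps mimic the JIS90 and JIS2004 cases quoted above.

```python
from typing import Dict, List, Tuple

Sequence3 = Tuple[int, int, str]  # (unicodeValue, variationSelector, glyphName)


def split_default_uvs(
    sequences: List[Sequence3], cmap: Dict[int, str]
) -> Tuple[List[Tuple[int, int]], Dict[Tuple[int, int], str]]:
    """Classify each sequence as default or non-default against the cmap."""
    default: List[Tuple[int, int]] = []
    non_default: Dict[Tuple[int, int], str] = {}
    for code, selector, glyph in sequences:
        if cmap.get(code) == glyph:
            # The cmap already encodes this glyph, so the UVS is the default.
            default.append((code, selector))
        else:
            # Otherwise the UVS must name its glyph explicitly.
            non_default[(code, selector)] = glyph
    return default, non_default


sequences = [(0x8FBB, 0xE0100, "cid3056"), (0x8FBB, 0xE0101, "cid8267")]
jis90 = split_default_uvs(sequences, {0x8FBB: "cid3056"})
jis2004 = split_default_uvs(sequences, {0x8FBB: "cid8267"})
```

The same two source entries yield a different default depending on which glyph the cmap encodes, which is why the source data alone does not have to record it.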

I'm leaning towards option 2, as it seems the easiest format for editing this data (yes, in spreadsheets).

[UFO4] support cmap Unicode Variation Sequences #79