Better anchor definition

moyogo commented 8 years ago

For ligature anchors, many UFOs designed with one authoring tool don’t work with other authoring tools as they have different ways of storing this information. Some authoring tools expect specific suffixes (like _1, _2,... or #1, #2,...) while others expect specific prefixes. It would be better to standardize this, either in the name or preferably with an attribute (for example ligatureIndex).

/cc @graphicore @khaledhosny @jamesgk

moyogo commented 5 years ago

@benkiel Indeed, Glyphs.app and ufo2ft follow specific rules that get translated to AFDKO feature syntax, which then gets translated into GPOS lookups. Adobe’s markFeatureWriter.py follows similar rules but isn’t as complete. In short, an anchor will translate to a mark-to-mark lookup only if it is both a base anchor and a mark anchor in glyphs that also have mark anchors, and an anchor will translate to a mark-to-liga lookup only if there are matching numbered anchors in ligature glyphs. FontForge follows rules much closer to the model of GPOS lookups.

@typoman The AFDKO feature syntax is higher level than GPOS lookups so you won’t see exactly the same structure. In short the _top is only defined once in the AFDKO example you give, but it will be duplicated, one in the mark lookup and another one in the mkmk lookup in GPOS. So you could very well have "_top", "_topmkmk" (at the same coordinates), and "topmkmk" in circumflexcomb that will produce the same GPOS structure as having "_top" and "top".

Glyphs.app and ufo2ft make the anchor type implicit as long as you follow their rules. FontForge makes the anchor type explicit. The UFO spec doesn’t currently mention that there is a relation between "_top" and "top".

Sorry, I’m just babbling about how different authoring tools work without providing a way forward. But we may need to either specify rules in the spec that follow or are close to what Glyphs.app and ufo2ft do, or provide a way for the user to be more specific.

@gferreira’s skipExportAnchors, a list of anchors not to export, would already help the user.

justvanrossum commented 5 years ago

I think we should reconsider adding a lib to anchors after all, given all the possible uses for anchors listed in this thread alone.

benkiel commented 5 years ago

One other thought: if we allowed a mark to have more than one type, it would accommodate what Glyphs/UFO does and also what FontForge does, and it would allow for anchor re-use but also more specificity.

typoman commented 4 years ago

Here I've gathered some information about how tools deal with anchors in UFO. I hope this could be useful for people who are trying to figure out how things already work in UFO tools. Some of this is just copy-paste from different places in GitHub.

CursiveAttachment

FDK fea file syntax: position cursive glyph.name <anchor x y> <anchor x y>; The location of anchors is written exactly as they’re stored in the UFO in a rounded integer type. The first value record is the entry and the second is exit. If a glyph is missing one of these anchors, its location should be written as NULL <anchor NULL>.
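As a rough illustration of the cursive rule above, here is a minimal Python sketch that builds the statement from an entry and an exit position. The names `format_anchor` and `cursive_statement` are hypothetical and not taken from any tool; the rounding uses Python's built-in `round()`, which is an assumption (a real compiler may round halves differently).

```python
from typing import Optional, Tuple

# An anchor position, or None when the glyph has no such anchor.
Point = Optional[Tuple[float, float]]


def format_anchor(point: Point) -> str:
    """Format one FEA anchor; a missing anchor is written as <anchor NULL>."""
    if point is None:
        return "<anchor NULL>"
    x, y = point
    # Coordinates are written as rounded integers. Note that Python's round()
    # rounds exact halves to the nearest even integer.
    return f"<anchor {round(x)} {round(y)}>"


def cursive_statement(glyph_name: str, entry: Point, exit_: Point) -> str:
    """Build a `position cursive` statement: entry first, exit second."""
    return (
        f"position cursive {glyph_name} "
        f"{format_anchor(entry)} {format_anchor(exit_)};"
    )


print(cursive_statement("meem.medial", (120.4, 0), None))
```

A glyph with only an entry anchor therefore still gets a complete statement, with the exit written as NULL.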

Glyphs app

The anchor can be defined by writing #entry or #exit on the anchor name.

MarkToBase, MarkToMark, MarkToLigature, MarkClass

ufo2ft

Its mechanism is based on the Glyphs app and, in turn, on older mark feature writers. In ufo2ft, if any of the supported features is already present in the feature file, that feature is not generated again. ufo2ft parses anchor.name using regular expressions and, from the result, creates an object called NamedAnchor. This object is used to make all the mark-related features and it has three main variables:
- isMark (bool): true for any anchor whose name starts with the mark prefix, which is typically _. If the parent glyph has any isMark anchor then the glyph should be a mark type; otherwise it should be a base or ligature type. Examples:
  - If anchor.name == _top -> isMark = True
  - If anchor.name == top -> isMark = False
- key (string): used to differentiate mark class types (e.g. top or bottom). Examples:
  - If anchor.name == _top -> key = top
  - If anchor.name == bottom_2 -> key = bottom
- number (int): used for the logical order of the mark in the ligature; it can also indicate that the parent glyph is a ligature type. Examples:
  - If anchor.name == _top -> number = None
  - If anchor.name == top_2 -> number = 2
- Any glyph types (ligature, base, mark) should be defined in the GDEF part of the feature file in UFO. Otherwise, the variables of NamedAnchor are used to determine the glyph type.
- ufo2ft creates the mark feature data while iterating over the NamedAnchor objects collected from the UFO glyph anchors:
- If the NamedAnchor is not a mark, i.e. isMark == False (e.g. top, top_1):
- If NamedAnchor has a number (e.g. top_1):
  
  Define MarkToLigature positioning inside the mark feature according to NamedAnchor.key and put the anchor in the order according to its number:
```
position ligature glyph.name
    <anchor x y> mark @MC_top    # number = 1
    ligComponent
    <anchor x y> mark @MC_top;   # number = 2

# MC in the class name stands for Mark Class
```
- If NamedAnchor doesn't have a number (e.g. top): Define MarkToBase positioning inside the mark feature according to its key:
```
position base glyph.name <anchor x y> mark @MC_top;
```
- If the parent glyph contains an isMark NamedAnchor (e.g. _top), define the current NamedAnchor (e.g. top) as MarkToMark positioning inside the mkmk feature:
```
position mark glyph.name <anchor x y> mark @MC_top;
```
- If NamedAnchor is a mark (e.g. _top):
- Define a MarkClass class definition using its key (e.g. top)
```
markClass glyph.name <anchor x y> @MC_top;
```
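The naming rules above can be sketched in a few lines of Python. This is not ufo2ft's actual code: `ParsedAnchor`, `parse_anchor_name`, `lookup_kind` and `classify_glyph` are hypothetical names, the regular expression is only an approximation of the `_` prefix and `_<number>` suffix convention, and checking for MarkToMark before the other non-mark cases is an assumption about precedence.

```python
import re
from typing import Dict, List, NamedTuple, Optional


class ParsedAnchor(NamedTuple):
    is_mark: bool           # name started with the "_" mark prefix
    key: str                # mark class key, e.g. "top"
    number: Optional[int]   # ligature component number, if any


# Hypothetical pattern: optional "_" prefix, a key, optional "_<digits>" suffix.
_ANCHOR_RE = re.compile(r"^(?P<mark>_)?(?P<key>.+?)(?:_(?P<number>\d+))?$")


def parse_anchor_name(name: str) -> ParsedAnchor:
    match = _ANCHOR_RE.match(name)
    if match is None:
        raise ValueError(f"cannot parse anchor name: {name!r}")
    number = match.group("number")
    return ParsedAnchor(
        is_mark=match.group("mark") is not None,
        key=match.group("key"),
        number=int(number) if number is not None else None,
    )


def lookup_kind(anchor: ParsedAnchor, glyph_has_mark_anchor: bool) -> str:
    """Map one parsed anchor to the kind of FEA statement it would produce."""
    if anchor.is_mark:
        return "MarkClass"
    if glyph_has_mark_anchor:   # assumed to take precedence over the cases below
        return "MarkToMark"
    if anchor.number is not None:
        return "MarkToLigature"
    return "MarkToBase"


def classify_glyph(anchor_names: List[str]) -> Dict[str, str]:
    parsed = [parse_anchor_name(name) for name in anchor_names]
    has_mark = any(anchor.is_mark for anchor in parsed)
    return {
        name: lookup_kind(anchor, has_mark)
        for name, anchor in zip(anchor_names, parsed)
    }


print(classify_glyph(["_top", "top"]))    # a combining mark such as circumflexcomb
print(classify_glyph(["top_1", "top_2"]))  # a two-component ligature
```

The point of the sketch is that the glyph-level question (does this glyph carry any mark anchor?) changes how a plain anchor such as top is interpreted, which is exactly the implicit behaviour discussed in this thread.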
While defining mark features, the following is taken into account:
- In the MarkToMark positionings, the lookups for different mark class types are separate. For example, top and bottom mark lookups are defined separately, and each lookup gets a MarkAttachmentType flag so that all other mark classes are ignored while it is processed. For example, the lookup for MarkToMark positioning of bottom marks gets:
lookupflag MarkAttachmentType @MC_bottom;
- Script (writing system) exceptions:
- If a glyph is considered to belong to an Indic script (Beng, Cham, Deva, Gujr, Guru, Knda, Mlym, Orya, Taml, Telu), then its positioning is not defined inside the mark or mkmk feature. Instead it goes to the above-mark feature abvm or the below-mark feature blwm, according to the following criteria:
  - If the name of the anchor belongs to one of these sets:
```
  abvmAnchorNames = {"top", "topleft", "topright", "candra", "bindu", "candrabindu"}
  blwmAnchorNames = {"bottom", "bottomleft", "bottomright", "nukta"}
```
  - Otherwise, if the anchor's vertical position is at or above the middle of the UPM, it’s an above mark; if not, it’s a below mark.
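The above/below decision can be written down as a small Python sketch. The two sets are copied from the snippet above; the function names `is_above_mark` and `indic_feature_tag` are hypothetical, and reading "the middle of UPM" as a y coordinate of half the units per em is an assumption.

```python
ABVM_ANCHOR_NAMES = {"top", "topleft", "topright", "candra", "bindu", "candrabindu"}
BLWM_ANCHOR_NAMES = {"bottom", "bottomleft", "bottomright", "nukta"}


def is_above_mark(anchor_key: str, anchor_y: float, units_per_em: int) -> bool:
    """Decide whether an anchor counts as an above mark."""
    # Known names win over the position.
    if anchor_key in ABVM_ANCHOR_NAMES:
        return True
    if anchor_key in BLWM_ANCHOR_NAMES:
        return False
    # Fallback (assumed): compare the y coordinate with half the UPM.
    return anchor_y >= units_per_em / 2


def indic_feature_tag(anchor_key: str, anchor_y: float, units_per_em: int = 1000) -> str:
    """Return the feature tag the anchor would be written to."""
    return "abvm" if is_above_mark(anchor_key, anchor_y, units_per_em) else "blwm"


print(indic_feature_tag("nukta", 900))    # a known below-mark name, despite the high y
print(indic_feature_tag("ogonek", 499))   # an unknown name, decided by position
```

Note that a known anchor name overrides the position, so an anchor called nukta placed high in the glyph still ends up in blwm.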

Glyphs app

Tibetan script (tibt) mark features do not go to abvm and blwm features but just one mkmk. Its lookup also doesn’t get lookupflag MarkAttachmentType as that prohibits the attachment of marks to different anchors than the previous mark.

FDK mark feature writer

The anchors that are specific to Indian scripts must be named abvm (for Above Marks) or blwm (for Below Marks), and the Indian scripts option needs to be checked in the UI.
The MarkAttachmentType lookupflag is no longer added to MarkToMark lookups meant for the abvm and blwm features (Indian scripts).
All the combining mark glyphs must be added to an OpenType class named COMBINING_MARKS. I guess this is to avoid generating mark positioning for glyphs that don't need it.
When designing the combining marks, often there's the need to make specific versions for uppercase and small cap glyphs, which are slightly different from the lowercase. For this reason, one might want to use a casing tag in the anchors' names (e.g. _aboveLC, _aboveUC and _aboveSC instead of just _above for all the three cases) so that the position of the anchors can be tested in FontLab. But this distinction is not necessary for the mark feature, and that is why this script allows for those casing tags to be trimmed, by setting the value of kDefaultTrimCasingTags.
If the font has ligatures and the ligatures have anchors, the ligature glyphs have to be put in OpenType classes named LIGATURES_WITH_X_COMPONENTS, where X should be replaced by a number between 2 and 9 (inclusive). Additionally, the names of the anchors used on the ligature glyphs need to have a tag (e.g. 1ST, 2ND), which is used to match each anchor to the correct ligature component.
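To make the casing-tag trimming concrete, here is a minimal Python sketch. It is not the FDK script's code: `trim_casing_tag` and `CASING_TAGS` are hypothetical names, and treating the tag as a plain suffix of the anchor name is an assumption based on the examples _aboveLC, _aboveUC and _aboveSC.

```python
from typing import Tuple

# Casing tags from the examples above (assumed to be suffixes).
CASING_TAGS: Tuple[str, ...] = ("LC", "UC", "SC")


def trim_casing_tag(anchor_name: str, tags: Tuple[str, ...] = CASING_TAGS) -> str:
    """Remove one trailing casing tag, if present, from an anchor name."""
    for tag in tags:
        # Keep names that consist of nothing but the tag itself.
        if anchor_name.endswith(tag) and len(anchor_name) > len(tag):
            return anchor_name[: -len(tag)]
    return anchor_name


for name in ("_aboveLC", "_aboveUC", "_aboveSC", "_above"):
    print(name, "->", trim_casing_tag(name))
```

With the tags trimmed, the three casing variants collapse into a single _above mark class for the mark feature, while the tagged names stay available for testing in the editor.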

I dropped some minor details. For more details read the source of markFeatureWriter in ufo2ft or WriteFeaturesMarkFDK in python module repo of the FDK.

What’s the issue

Interpreting the purpose of an anchor takes a lot of guesswork based on the anchor name and the glyph data during the binary compile. Some features are generated only at compile time without being exposed to the user (composite anchor propagation). Writing OpenType features is the user’s job, not the compiler’s. An authoring tool could automate some of it (like propagating composite anchors), but there shouldn't be anything left for the compiler to guess. Also, for the sake of transparency, adding some attributes to anchors can help remove the guesswork and give more control to the user. My suggestion is either to have an anchor lib or the following attributes:

Anchor Definition

Explicit attributes that define what an anchor is:

There could be an anchor.type string attribute as there can’t be two types per anchor (Entry, Exit, MarkToBase, MarkToMark, MarkToLigature, MarkClass).
There could be an anchor.index for anchors of the MarkToLigature type. One might say this could be inferred from the anchor order inside the glyph, but that is not easy for humans to read. Also, it should be visible to the user that the logical order in RTL scripts starts from the right. Adding any numbers to the anchor.name could also add to the guessing and is best avoided. In the Glyphs app, numbers could be added to anchor names because anchors with similar names are not allowed.
If all the attributes above exist, then the class/key (e.g. top, bottom) can be written in the anchor.name and there can be multiple anchors with the same name in the glyph but with different anchor.type.
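A minimal Python sketch of what such explicit attributes could look like as a data structure. Nothing here exists in the UFO spec: `ExplicitAnchor` and `ANCHOR_TYPES` are hypothetical, the type names are taken from the list above, and requiring an index exactly when the type is MarkToLigature is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

ANCHOR_TYPES = {
    "Entry", "Exit", "MarkToBase", "MarkToMark", "MarkToLigature", "MarkClass",
}


@dataclass(frozen=True)
class ExplicitAnchor:
    name: str                    # only the class/key, e.g. "top"
    x: float
    y: float
    type: str                    # one of ANCHOR_TYPES
    index: Optional[int] = None  # ligature component index (assumed 1-based)

    def __post_init__(self) -> None:
        if self.type not in ANCHOR_TYPES:
            raise ValueError(f"unknown anchor type: {self.type!r}")
        # Assumption: index is required for MarkToLigature and invalid elsewhere.
        if (self.type == "MarkToLigature") != (self.index is not None):
            raise ValueError("index must be set for MarkToLigature anchors only")


# Two anchors with the same name but different types in one mark glyph.
circumflexcomb = [
    ExplicitAnchor("top", 250, 500, "MarkClass"),
    ExplicitAnchor("top", 250, 720, "MarkToMark"),
]
print([anchor.type for anchor in circumflexcomb])
```

With the type stored explicitly, neither the _ prefix nor a numeric suffix is needed in the name, so nothing has to be guessed from it.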

Lookup definition

Still, the compiler needs to guess how to write the flags, lookups and which feature the anchor definitions belong to. Again this can be automated by an authoring tool and saved in the UFO but it shouldn't be a compiler's guesswork. This could be achieved with data structures on the font level and anchor could reference that data. This could be one way of doing it:

anchor.flags or anchor.lookups list, so the tool explicitly writes the lookup flags instead of the compiler. The list items can be flags or the lookup name in string type. This attribute also helps to create separate lookups if needed. If anchor.lookups is added, then there should be another place to write the lookup information (e.g. font.lib) and there is no need for feature attribute since the lookup should have a list of features which it belongs to.
anchor.contextBefore and anchor.contextAfter list attributes that are used in Arabic and Indian scripts. The list items can be a glyph.name or a group name or list of glyph.name(s). FDK has a nonpublic script to write this based on the anchor name and it has limitations for writing the context.
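One way to picture the anchor.lookups idea is the following Python sketch, where anchors only reference lookup names and the lookups themselves (flags and features) live in one font-level mapping. Everything here is hypothetical: the `FONT_LOOKUPS` layout, the lookup names and the `anchors_by_feature` helper are made up for illustration and are not part of UFO or any tool.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical font-level storage, e.g. something a tool could keep in font.lib.
FONT_LOOKUPS = {
    "mark_top": {"flags": [], "features": ["mark"]},
    "mkmk_top": {"flags": ["MarkAttachmentType @MC_top"], "features": ["mkmk"]},
}

# (glyph name, anchor name, names of the lookups the anchor belongs to)
AnchorRef = Tuple[str, str, List[str]]


def anchors_by_feature(
    anchors: Iterable[AnchorRef], lookups: Dict[str, dict]
) -> Dict[str, List[Tuple[str, str, str]]]:
    """Group anchors under the features of the lookups they reference."""
    result = defaultdict(list)
    for glyph_name, anchor_name, lookup_names in anchors:
        for lookup_name in lookup_names:
            if lookup_name not in lookups:
                raise KeyError(f"{glyph_name}/{anchor_name}: unknown lookup {lookup_name!r}")
            for feature in lookups[lookup_name]["features"]:
                result[feature].append((lookup_name, glyph_name, anchor_name))
    return dict(result)


anchors = [
    ("a", "top", ["mark_top"]),
    ("circumflexcomb", "top", ["mkmk_top"]),
]
print(anchors_by_feature(anchors, FONT_LOOKUPS))
```

In this layout the compiler never has to decide on a feature or a flag: it only follows references, and an unknown reference is an error instead of a guess.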

In the end, maybe anchor.lib could be an easier solution instead of adding all these attributes. Since there is no anchor lib in UFO, personally I'm thinking that I will write the mark feature inside the data folder. I might have my own syntax for the feature file which is easier to read and diagnose but I haven't finished it yet.

schriftgestalt commented 4 years ago

> I think Glyphs.app uses a naming scheme … For ligature anchors it would use _name_1, _name_2 etc, afaik, may be not fully correct though.

The ligature anchors are without the underscore prefix.

> Would defining glyph.lib["public.*"] keys for glyph type and ligature component count be helpful?

Glyphs.app defines several attributes for each glyph: script, category (letter, mark), subCategory (ligature, nonspacing), and decomposition (a list of glyphInfo objects, each of which has all the above info). The ligature component count can be computed from the decomposition info: iterate over it and count the number of non-mark glyphs.
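The counting rule is simple enough to sketch in Python. The `GlyphInfo` class below is a hypothetical stand-in with just the fields mentioned above, not the Glyphs.app API, and the category string "Mark" is an assumption about how marks are labelled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GlyphInfo:
    """Hypothetical stand-in for a glyph info record."""
    name: str
    category: str  # e.g. "Letter" or "Mark"
    decomposition: List["GlyphInfo"] = field(default_factory=list)


def ligature_component_count(info: GlyphInfo) -> int:
    """Count the non-mark glyphs in the decomposition."""
    return sum(1 for part in info.decomposition if part.category != "Mark")


lam_alef_hamza = GlyphInfo(
    name="lam_alefHamzaabove-ar",
    category="Letter",
    decomposition=[
        GlyphInfo("lam-ar", "Letter"),
        GlyphInfo("alef-ar", "Letter"),
        GlyphInfo("hamzaabove-ar", "Mark"),
    ],
)
print(ligature_component_count(lam_alef_hamza))
```

In this example the hamza is skipped because it is a mark, so the ligature is counted as having two components.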

> As explained in the issue you reference, adding lib to anchor is problematic, so please focus on solutions that don't require that.

In the issue it states that it is complicated and that .ufo doesn't support it. Both problems can be solved.

> Regarding anchor.type: what are the needed values for such a field? In the above comments I read: base, mark, ligature, entry, exit

Sometimes anchors are only used to position components. That can be true for all five types. So we need a flag that says: don't consider this anchor when generating GPOS.

> For entry and exit a prefix is needed to differentiate it from the mark anchor. The prefix for mark anchor is _, the prefix for cursive anchor is # so it becomes #entry or #exit.

Entry/exit anchors don’t need differentiation. They have a unique name. If you add any suffix (a '#' or an emoji) the anchors are used to (cursively) position components but are ignored when generating GPOS (see above). I see how it could be possible to add options to have multiple cursive lookups. But maybe I’ll wait to see what is decided on this topic here.

About the "Anchor Definition" and "Lookup definition": this is a very detailed and good description of what is needed. But it might be too complex for most people. We need to find a good balance.

schriftgestalt commented 4 years ago

The anchor.type should allow custom types. For example, in Glyphs.app you can add an 'LSB' anchor to define alternate metrics for the palt feature (used in CJK fonts).

benkiel commented 4 years ago

My understanding of the current state is that we're going to add a .lib to the anchor; with that, any type can be stored.

schriftgestalt commented 4 years ago

Good. The comment was just for the .type case.

benkiel commented 4 years ago

@schriftgestalt I think @justvanrossum has folded on this, .lib seems to be current consensus. We'll want to register some standard keys for anchor for common uses, of course.

schriftgestalt commented 4 years ago

I understand. I had written that just before the meeting and posted it because I had written it.

dy commented 3 years ago

Sorry if I miss or repeat something, just my 2 cents on @khaledhosny’s comment:

> One thing that would still be unsolved is contextual mark positioning. I have no idea how that would be supported for anchors without full OpenType machinery in the format. So I guess people will have to keep writing that manually or have font-specific scripts to handle them.

Would be useful to have anchors accessible by name in features.fea file via anchor format E, so that glif anchors automatically create anchorDefs - that would simplify contextual rules.

position cursive meem.medial <anchor entry_default> <anchor exit_default>;
position cursive @BACK_COND meem.medial' <anchor entry_special> <anchor exit_special> @AHEAD_COND;

Update. anchorDef is global, not per-character, so this solution is unsustainable.

unified-font-object / ufo-spec