googlefonts / ufo2ft

A bridge from UFOs to FontTools objects (and therefore, OTFs and TTFs).
MIT License

Mark feature writer: more specific anchors should override less specific ones #591

Closed simoncozens closed 1 month ago

simoncozens commented 2 years ago

Let's suppose you have a pair of anchors "top"/"_top" which attach most top marks to most base characters. But in certain combinations, you want to override that general pairing. (I'm looking at Noto Sans Khmer, which has a top anchor in uni1787 but a separate anchor to be used only for uni17C6.)

When you compile such a font where a mark/base pair is attached by more than one anchor, ufo2ft will tell you that the "last" anchor will prevail, but which anchor ends up being considered "last" is essentially uncontrollable. This is because the anchors are grouped into non-overlapping sets first, by a very clever algorithm:

https://github.com/googlefonts/ufo2ft/blob/f4f49e444bf7a1c1b02f1a98591807cb69d4015b/Lib/ufo2ft/featureWriters/markFeatureWriter.py#L454-L462

Unfortunately, this means that lots of small groups of anchors might end up being put together into a single big group - or alternatively, they might end up being put together into a single medium-sized group, if there are only a few small groups of anchors.

Hence there is no way in this algorithm to reliably sort the groups by size to ensure that more-specific anchors override a set of general ones.
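To make the problem concrete, here is a simplified sketch of this kind of grouping. It is not the ufo2ft code linked above: it assumes a plain greedy packing, and the function name, the `order` parameter, the anchor name `top.special` and the glyph-to-anchor assignments are all made up for illustration.

```python
def group_mark_classes(mark_glyph_to_classes, order=None):
    """Greedily pack mark classes into groups so that no mark glyph
    belongs to two classes in the same group.

    Hypothetical helper, NOT the ufo2ft implementation.
    """
    # Two classes conflict when at least one mark glyph belongs to both.
    conflicts = {}
    for classes in mark_glyph_to_classes.values():
        for cls in classes:
            conflicts.setdefault(cls, set()).update(
                other for other in classes if other != cls
            )

    groups = []
    for cls in (order if order is not None else sorted(conflicts)):
        for group in groups:
            # Join the first existing group that holds no conflicting class.
            if not conflicts[cls] & set(group):
                group.append(cls)
                break
        else:
            groups.append([cls])
    return groups


# Illustrative data: uni17C6 carries both a general and a specific anchor.
marks = {
    "uni17C6": ["top", "top.special"],
    "uni17B7": ["top"],
    "uni17BB": ["bottom"],
}

by_name = group_mark_classes(marks)
specific_first = group_mark_classes(marks, order=["top.special", "bottom", "top"])
```

In this sketch the group that `top.special` lands in, and whether it comes before or after the group containing `top`, depends only on the order in which the classes are visited, not on how specific the anchor is.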

anthrotype commented 1 year ago

> sort the groups by size to ensure that more-specific anchors override a set of general ones

I'm not sure exactly what you're proposing. In another related issue, @khaledhosny has proposed disabling this grouping altogether and going back to a simple one lookup per mark class, which in his particular case even leads to a smaller file size (which was the point of grouping, I suppose):

https://github.com/googlefonts/ufo2ft/issues/762

anthrotype commented 1 year ago

Oh maybe your "more specific anchors should override less specific ones" could be implemented by following @behdad's suggestion below? https://github.com/googlefonts/ufo2ft/issues/563#issuecomment-987257197

> if a mark can be attached via multiple anchors, then for any two such anchors a and b, (eg a = 'top' and b = 'top.alt'), then if b.startswith(a) and len(b) > len(a) + 1 and not b[len(a)].isalnum() remove a from the candidates. If more than one anchor is left, warn. Output anchors in alphanumeric sorted order.

I like that actually
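For reference, the rule quoted above could be sketched in Python roughly like this. The function name and the use of `logging` for the warning are assumptions made for illustration; this is not existing ufo2ft code.

```python
import logging

logger = logging.getLogger(__name__)


def filter_candidate_anchors(candidates):
    """Drop every anchor 'a' for which a more specific anchor 'b' exists.

    'b' counts as more specific when it starts with 'a', is followed by a
    non-alphanumeric separator, and has at least one character after it
    (e.g. 'top.alt' overrides 'top', but 'topleft' does not).
    Hypothetical helper name; the condition is taken from the quoted rule.
    """
    kept = []
    for a in candidates:
        overridden = any(
            b.startswith(a)
            and len(b) > len(a) + 1
            and not b[len(a)].isalnum()
            for b in candidates
        )
        if not overridden:
            kept.append(a)

    if len(kept) > 1:
        # Still ambiguous: more than one anchor can attach this mark.
        logger.warning("ambiguous anchors remain: %s", sorted(kept))

    # Output in alphanumeric sorted order, as the rule suggests.
    return sorted(kept)


overridden_case = filter_candidate_anchors(["top", "top.alt"])
unrelated_case = filter_candidate_anchors(["topleft", "top"])
chained_case = filter_candidate_anchors(["top", "top.alt", "top.alt.x"])
```

With these inputs, `top` is dropped in favour of `top.alt`, while `top` and `topleft` are both kept (and a warning is logged) because `topleft` has no separator after the `top` prefix.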

moyogo commented 1 year ago

Defaulting to whatever heuristic we have is fine, but it will still break in some cases. There really should be a lib key or a mark feature writer parameter for an order of anchors to use.

belluzj commented 1 year ago

I agree with Denis. @simoncozens, you're the second person after @khaledhosny who says you rely on what I would call "ambiguous" anchor combos, with the expectation that fontmake should be able to resolve the ambiguity using some heuristic that will give you "what you want".

IMO so far the only solution was to remove the ambiguity from the source files, in order to ensure that for each pair of glyphs there's at most one anchor/_anchor pair that matches between them.

But now that you're both actively relying on this, I think the new "proper" solution is to formalize by a new field on anchors in the sources, defining a list of priorities in case of ambiguity, either like Denis suggests with a lib key, or with some new data on the anchor element in the UFO standard.

simoncozens commented 1 year ago

It sounds like you're implying that I want some magical do-what-I-mean heuristic to get the anchors in the right order. But what I actually want is very clear and well-defined: narrower contexts override wider ones.

To fix this in the sources would mean that to override a single base-mark anchor pair with a font of 200 bases, I would have to add additional anchors to all 200 of the bases, 199 of which would replicate an existing anchor. To fix another pair with a different base would require another 200 additional anchors, 199 of which are replicating existing anchors. And have you ever tried editing a font file when three of the anchor marks sit on top of each other? It's quite frustrating. Now imagine doing that while, all the time, you are thinking "The only reason I have to deal with this madness is because the font compiler doesn't follow a very clear and well-defined mechanism for choosing the right anchor".

anthrotype commented 1 year ago

> a very clear and well-defined mechanism for choosing the right anchor

Sure, how does one choose the "right anchor" then? We need something more specific than "narrower contexts override wider ones". Let's start with a concrete example that currently can't be encoded in the way you want, and describe how you propose for it to be handled.

belluzj commented 1 year ago

@simoncozens sorry for my tone, I didn't mean to imply anything, I wanted to say that I've now realized that considering this ambiguous and asking the designer to fix it in the source is not the solution, and instead we need to formalize the desired behaviour with data (and like you say, a formal description of the mechanism you have in mind).

khaledhosny commented 1 year ago

My code currently depends on the lookups being sorted based on anchor names, which is not very flexible but at least predictable. Any other predictable algorithm (like Behdad’s one mentioned above in https://github.com/googlefonts/ufo2ft/issues/591#issuecomment-1665838425) is also fine by me.

khaledhosny commented 1 month ago

This should be fixed now that grouping is gone (disabled by default for now).