[Proposal] OpenType Features as properties of glyph object

typoman commented 5 years ago

I had my doubts about posting this proposal here, as it could be seen as not backward compatible and more of a change than an improvement to the spec.

There is already a Kerning object in the Font where data is stored that will be converted to OpenType features. But shouldn't OpenType features be attached to the Glyph rather than the Font? Storing all the kerning in one object is inefficient and prone to other problems if fonts are merged and the character set is changed. It's not only kerning; there are other OT features that are not easy to exchange between fonts. Can you filter glyphs by whether their type is Base or Mark? Groups are also a glyph property, because a glyph should indicate which groups it belongs to, rather than the group listing its members. This would also remove an implementation problem: when a glyph is deleted, the implementation has to decide whether each group should update its members. Here is my proposal, which suggests that Kerning and many of the other OpenType features should be Glyph properties.

Font
- Glyph
  - Features
    - Direction (RTL, LTR, …)
    - Substitutions
      - Substitute (input_glyph_list, output_glyph_list, before_context, after_context, feature_list, language_list)
      - Substitute (input_glyph_list, output_glyph_list, before_context, after_context, feature_list, language_list) ...
    - Positionings
      - Mark (before_context, after_context, feature_list)
      - Kern (before_context, after_context, feature_list) ...
    - Type (Ligature, Component, Mark, …)
    - Script (Arabic, Cyrillic, Latin, …)
    - Groups (kern_group1, mark_group1, …)
    - Final name (unixxxx)
    - Carets

Some properties could be empty or non-existent, like Carets. The reason GSUB and GPOS rules are separated is that it’s less likely they will be shared in one feature. Another advantage here is that we don’t need to come up with naming schemes for glyph names (e.g. glyph1_glyph2.case) to indicate their features, which are very limited (considering contextual rules and shared lookups between features). It's easier to change the character set without having to change a huge feature file. At the end, during font generation, the compiler gathers feature data glyph by glyph (this could be cached for any glyph that is unchanged) and converts it to binary data. Of course the details could be discussed further if you think this is worth discussing or you would accept pull requests regarding this.

justvanrossum commented 5 years ago

Kerning always relates to multiple glyphs, so in which glyph should it be stored? And it gives exactly the same problem regarding subsetting (etc.) as there is now info about some other glyph in the glyph. Your proposal solves nothing for kerning, and probably very little for GPOS/GSUB except those that need no context (SinglePos, SingleSubst, things like that).

typoman commented 5 years ago

With the current state of UFO, if someone removes a glyph, the implementation also has to decide whether the pair should be removed from kerning. Kerning can be stored in the feature property of the first glyph in the pair. If the first glyph gets removed then kerning is gone. If the second glyph gets removed compiler doesn't add it to the binary. This also goes for other multiple-glyph rules. What we already have as one feature file is not suitable for large projects or complex scripts. I see this is not visible from a Latin-centric view, but I think much important data for other writing systems is open to interpretation with the current spec. I have to decide what script the glyph belongs to, or what direction the glyph has, when I generate the OpenType features. I deal with these every time I generate a font. This data should be stored inside the glyph and not interpreted every time a font is about to be generated.

justvanrossum commented 5 years ago

If the first glyph gets removed then kerning is gone. If the second glyph gets removed compiler doesn't add it to the binary.

How on earth is this different from storing kerning separately? I might as well say "if the glyph name in a kern pair doesn't exist it won't be added to the binary".
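A minimal sketch of the comparison being argued here, assuming kerning is held in plain dicts. The glyph names, values, and both helper functions are hypothetical and are not part of any UFO library. It shows that pruning pairs against the set of existing glyph names gives the same result whether kerning is stored font-wide or under the first glyph of each pair, and that the per-glyph model still needs the check on the second member.

```python
# Hypothetical data; glyph names and values are made up for illustration.
glyph_names = {"A", "V", "T"}  # glyphs that still exist after "o" was deleted

# Model 1: one font-level kerning dict keyed by (first, second) pairs.
flat_kerning = {("A", "V"): -40, ("T", "o"): -30, ("o", "T"): -20}

# Model 2: the same pairs stored under the first glyph of each pair.
# The pairs that started with "o" vanished together with the glyph.
per_glyph_kerning = {
    "A": {"V": -40},
    "T": {"o": -30},
}


def prune_flat(kerning, names):
    """Keep only pairs whose two members both exist."""
    return {
        (first, second): value
        for (first, second), value in kerning.items()
        if first in names and second in names
    }


def prune_per_glyph(kerning, names):
    """Flatten per-glyph kerning; the second member still has to be checked."""
    return {
        (first, second): value
        for first, seconds in kerning.items()
        if first in names
        for second, value in seconds.items()
        if second in names
    }


print(prune_flat(flat_kerning, glyph_names))
print(prune_per_glyph(per_glyph_kerning, glyph_names))
```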

I'm not saying that the current situation is perfect (also: .fea isn't perfect), but saying everything should be stored in the glyph isn't solving much.

typoman commented 5 years ago

There is no advantage if the second glyph of the pair gets removed. If the kerning pair is stored in the first glyph of the pair and the first glyph gets removed, the compiler doesn't have to check whether the kerning pair is valid because it's already gone. This halves the check time, and the probability of having redundant data is also halved.

benkiel commented 5 years ago

@typoman curious how this would handle complex GPOS features, say something like a script typeface that does a lot of contextual substitution? I understand that some feature writing can be standardized in a compiler, but not all by far, so how would that work with a proposal like this? What would need to be specified for the compiler? Seems like the spec would need to say how features are written by a compiler so that consistency across compilers is guaranteed. That seems like a big thing to specify, so I'm curious what your thinking there is. It is 100% likely that I'm missing something.

(A side thought is that perhaps it would be useful to be able to tag glyphs with writing direction, that one could use in different ways, but maybe not tied to features?)

justvanrossum commented 5 years ago

As a general note: the kind of changes you propose would have a much better chance if there existed:

- A user interface to edit the kind of properties you need
- A font compiler that uses said properties

Neither exist right now, and they're not going to write themselves, so for now this is all very much pie in the sky.

Changes that are driven by practical usage instead of idealistic betterment of the format have a better chance to get leverage.

At the same time, development of UFO is necessarily slow, as so many tools depend on it. Breakage is bad, and keeping things backwards compatible is not always easy. It's a lot of work to move forward.

Think of consequences for open source tools such as fontTools.ufoLib, defcon, fontmake, fontParts. Some of these components are not funded, so depend on volunteer work. Think of the commercial tool authors that would need to get behind changes: RoboFont, Glyphs UFO export. And last but not least: the UFO spec is not funded in any way, either.

typoman commented 5 years ago

Thank you Just and Ben for taking the time to look at a suggestion that might never be more than some text in a GitHub issue. This proposal is based on scripts I've built to add OpenType features to fonts for non-Latin scripts. Some scripts, like Arabic, are more complex. Since I've worked with other foundries and on multi-script projects, I've realized the feature file was not meant for designers or for subsetting. The structure of the feature file makes it hard to build an interface to manipulate the data. Designers tend not to interact with text and prefer to work with a UI, and more mistakes happen in these situations.

I understand you need examples or real-life situations and I will make something to show how this could be implemented. Also, the proposal is based on how other tools create an interface to implement OpenType Layout tables, most notably VOLT, so it's not going to impose standards on compilers. One way of compiling this data is to first convert it to feature file and then add it using fontTools to the font. I already know this type of data needs its own compiler and it's not going to be built by itself.

But before making that interface I had to run it by you and see how much I could be mistaken. Right now I have to store this data to the glyph lib and font lib.

@benkiel Some of these properties don't conflict with the spec and how compilers work. I think they are worth adding to the spec.

- Direction (RTL, LTR): This helps the interface determine the direction of text while spacing. Beyond that, it can determine how OT positioning features should be generated for the glyph. Right now I use the glyph's Unicode value to determine the script, and from that the direction, which is only an interpretation. If the glyph doesn't have a Unicode value, I parse the name and check, for example, whether the suffix-less glyph name exists and what its Unicode value is.
- Script (Arabic, Cyrillic, Latin, …): The script is also determined now by parsing the Unicode value and the glyph name. In the Glyphs app it's added to the glyph name as a dash and some letters, like -ar or -cy. This also causes other issues, as you might know already.
- Final name (unixxxx): This is handled in AFDKO using the GlyphOrderAndAliasDB file and is tied to the font, not the glyph.
- Carets: This could be a list of integers giving the caret positions if the glyph is a ligature. The length of this list will indicate the number of components in the ligature. Right now I have to parse the glyph name to determine that.
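The name parsing described in this list can be sketched roughly as follows. This is only an illustration under an assumed naming convention (ligature components joined by "_", suffixes after the first ".", and a Glyphs-style script tag such as "-ar"); `parse_glyph_name` is a hypothetical helper, not an existing API.

```python
import re

# Assumed convention: a script tag is a dash followed by letters or digits,
# for example "-ar" or "-cy". Underscores are excluded so that the tag does
# not swallow the next ligature component.
SCRIPT_TAG = re.compile(r"-([A-Za-z0-9]+)")


def parse_glyph_name(name):
    """Split a glyph name into ligature components, script tag and suffix."""
    base, _, suffix = name.partition(".")
    match = SCRIPT_TAG.search(base)
    script = match.group(1) if match else None
    base = SCRIPT_TAG.sub("", base)
    return {
        "components": base.split("_"),
        "script": script,
        "suffix": suffix or None,
    }


info = parse_glyph_name("lam_alef-ar.fina")
print(info["components"], info["script"], info["suffix"])
```

Every value here is inferred from the name alone, which is the kind of interpretation an explicit per-glyph property would make unnecessary.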

justvanrossum commented 5 years ago

A first step could be to register official glyph.lib keys for the needed properties. Final name and Carets are good candidates for that.

It seems your first two points could always be derived if the associated unicode is known. So it currently only "doesn't work" if the glyph names of unencoded glyphs are not directly related to the base glyphs. So maybe an official lib key for "base unicode" is more generally useful than adding properties for direction and script. Or are there cases where even knowing the unicode doesn't help?

public.baseUnicode: Only add if it's not possible to derive the unicode from the glyph name.
~~public.finalName: Only add if automatic naming based on the unicode (uniXXXX or AGLFN lookup) is not possible or desirable.~~ (see http://unifiedfontobject.org/versions/ufo3/lib.plist/#publicpostscriptnames)
public.carets: a list of ints in design units for caret positions in ligatures. I assume its length is numLigatureComponents - 1.

Adding lib keys is a far cheaper way forward than changing the rest of the spec.
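As a rough illustration of the "only add if it can't be derived" rule for the proposed public.baseUnicode key, here is a sketch that uses a plain dict in place of glyph.lib. The key is only a proposal in this thread, storing the value as an integer is an assumption, and both helpers are hypothetical.

```python
def derive_unicode(name, unicode_by_name):
    """Try to find a code point by stripping dot suffixes from the name."""
    parts = name.split(".")
    while parts:
        candidate = ".".join(parts)
        if candidate in unicode_by_name:
            return unicode_by_name[candidate]
        parts.pop()
    return None


def record_base_unicode(name, base_unicode, unicode_by_name, lib):
    """Store the proposed key only when the name alone is not enough."""
    if derive_unicode(name, unicode_by_name) == base_unicode:
        # Derivable from the name: keep the lib free of redundant data.
        lib.pop("public.baseUnicode", None)
    else:
        lib["public.baseUnicode"] = base_unicode
    return lib


# Hypothetical font content: one encoded glyph and two unencoded variants.
unicode_by_name = {"alef-ar": 0x0627}
print(record_base_unicode("alef-ar.fina", 0x0627, unicode_by_name, {}))
print(record_base_unicode("swash007", 0x0627, unicode_by_name, {}))
```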

(That said, I don't know how we register official lib keys...)

moyogo commented 5 years ago

The final name already exists in the ufo.lib’s public.postscriptNames. See http://unifiedfontobject.org/versions/ufo3/lib.plist/#publicpostscriptnames

Carets would need to be horizontal for horizontal scripts or vertical for vertical scripts.

justvanrossum commented 5 years ago

The final name already exists in the ufo.lib’s public.postscriptNames. See http://unifiedfontobject.org/versions/ufo3/lib.plist/#publicpostscriptnames

Thanks, I should have checked that. So this exists, and should not be changed, even though I (may) agree that that info might be better stored in the glyph.

Carets would need to be horizontal for horizontal scripts or vertical for vertical scripts.

Apart from proper wording of the description, should this also influence the key? Like public.verticalCarets vs. public.horizontalCarets, or is one generic key enough?

typoman commented 5 years ago

@justvanrossum public.baseUnicode needs another processing to determine what is the script. I think an explicit script glyph lib will be best. There is no other need for public.baseUnicode so why not only add the public.script? This could also indicate the writing direction.

@moyogo public.postscriptNames is in the font lib and is derived from the mechanism of compilers without considering the real-life situations how designers change data in fonts. If public.finalName is part of the glyph lib and glyph moves to another font, it carries the data and this doesn't need more processing to fix the top font dictionary.

justvanrossum commented 5 years ago

public.baseUnicode needs another processing to determine what is the script.

But it is a 1-1 relationship, and an easy lookup, no? To know the base unicode is useful in other circumstances, too.

public.postscriptNames is in the font lib and is derived from the mechanism of compilers without considering the real-life situations how designers change data in fonts. If public.finalName is part of the glyph lib and glyph moves to another font, it carries the data and this doesn't need more processing to fix the top font dictionary.

You're not necessarily wrong, but I think this is a change that will cost more than it is worth.

Btw. can you give some examples of cases where you need to manually supply the production name in the first place?

typoman commented 5 years ago

But it is a 1-1 relationship, and an easy lookup, no? To know the base Unicode is useful in other circumstances, too.

There is no reliable way to determine the script from a Unicode value. Personally, I have tagged blocks with scripts myself, and I'm not sure there is a package that does it perfectly in Python. Also, not writing the glyph's script takes away the user's control to change it. What if a user wants to search for glyphs which are Cyrillic, or the algorithm makes mistakes? Should the implementation infer the glyph script every time it iterates over the glyphs? It's implicit, interpreted data. On the other hand, a user could write a Python script that adds script tags to the lib and change them manually if there's a mistake. Probably @schriftgestalt could say why he chose to add it to the glyph names rather than just interpret the glyph Unicode?

can you give some examples of cases where you need to manually supply the production name in the first place?

I'm appending the script I use in RF, and you can run it to see that this task is not straightforward; sometimes even components are involved! I understand it's also possible to determine unixxxx names from glyph names, but this also takes control away from the designer. This is done automatically at compile time: in fontmake if the glyph name doesn't exist in the font's public.postscriptNames key, and in the FDK using the GlyphOrderAndAliasDB file. Since this is mostly hidden from users, almost none of them even know it exists. I think automation is great, but algorithms can make mistakes, and hiding this information from designers could create issues later. Putting this at the glyph level may make it more visible too.

import re
from fontTools.agl import UV2AGL, AGL2UV

class addFinalGlyphNames():

    def __init__(self, f):
        self.f = f
        self.fGlyphs = f.keys()
        self.aff_pattern = re.compile(r'afii[0-9]{3,5}')
        self.gname_pattern = re.compile(r'[^_.]+')
        self.liga_pattern = re.compile(r'[^_]+')
        self.key = 'public.postscriptNames'
        self.nameDic = {}
        self.scripts = {} # { gName : (scriptLessName, scriptTag) }
        self.gName2uni = {} # { glyphs with or without script tag : unicodes[0] }
        self.cmap = f.getCharacterMapping()
        for g in self.fGlyphs:
            tag = re.search(r'-\w+', g)
            script = ''
            unicodes = self.f[g].unicodes
            scriptLess = g
            if tag:
                script = tag.group(0)
                scriptLess = re.sub(r'-\w+', '', g)
            self.scripts[g] = (scriptLess, script)
            if unicodes:
                uni = unicodes[0]
                try:
                    self.gName2uni[scriptLess][script] = uni
                except KeyError:
                    self.gName2uni[scriptLess] = {script : uni}

        self.report = []
        self.reservedNames = set(['CR', 'apple', 'mu', 'onesuperior', 'twosuperior',
        'threesuperior', 'fi', 'fl', 'Delta', 'Omega', '.notdef', '.null'])

    def hexUni(self, uniValue):
        """
        takes integer value and returns hexadecimal unicode string
        """
        return format(uniValue, 'x').zfill(4).upper()

    def uniName(self, uniValue):
        """
        takes unicode integer value and returns uniXXXX name string
        """

        return 'uni%s' %(self.hexUni(uniValue))

    def finalName(self, uniValue):
        """
        returns agl name according to unicode value.
        if there is no agl name for that unicode or
        the agl name starts with afii, the function
        returns uniXXXX name.
        """
        if uniValue in UV2AGL:
            aglName = UV2AGL[uniValue]
            if len(self.aff_pattern.findall(aglName)) != 1:
                return aglName
        return self.uniName(uniValue)

    def compName(self, gName):
        """
        returns final name according to the ligature components
        making the glyph name. This is used in case
        the glyph doesn't have unicode but its components
        have unicode:
            alef.alt -> uniXXXX
            alef_lam -> uniXXXXXXXX
        """

        gfinalName = []
        uniprefix = False
        nameStr = ''
        suffix_splitted = gName.split(".")
        base = suffix_splitted[0]
        if len(suffix_splitted) > 1 and suffix_splitted[0] in AGL2UV:
            return gName
        liga_splitted = gName.split("_")
        if len(liga_splitted) > 1 and liga_splitted[0] in AGL2UV:
            return gName
        scriptLess, script = self.scripts[gName]
        for g in self.liga_pattern.findall(scriptLess):
            # try to find comp in ligatures comps
            gUnicode = None
            if g in self.gName2uni:
                try:
                    gUnicode = self.gName2uni[g][script]
                except KeyError:
                    pass
            if not gUnicode:
                # if it doesn't have a unicode
                # lets see if it has unicode without the extension
                splitted = g.split('.')
                i = -1
                g = '.'.join(splitted[:i])
                while g not in self.gName2uni:
                    i -= 1
                    g = '.'.join(splitted[:i])
                    if abs(i) > len(splitted):
                        self.report.append( "Can't parse the glyph name '%s' to find appropriate unicode " %gName)
                        break
                else:
                    try:
                        gUnicode = self.gName2uni[g][script]
                    except KeyError:
                        pass
            if g in self.gName2uni and gUnicode:
                uniprefix = True
                gfinalName.append(self.hexUni(gUnicode))
        if uniprefix:
            nameStr = 'uni%s' %''.join(gfinalName)
            # print(gName, nameStr)
        else:
            nameStr = ''.join(gfinalName)
        if nameStr == '':
            g = self.f[gName]
            unis = []
            if len(g.components) > 1:
                for c in g.components:
                    try:
                        uni = self.f[c.baseGlyph].unicodes[0]
                        unis.append(self.hexUni(uni))
                    except Exception:
                        # report the component's name; looking the glyph up again could raise a second time
                        self.report.append( "Error: Component '%s' doesn't have a unicode in glyph '%s'." %(c.baseGlyph, gName))
                        break
                if unis:
                    nameStr = 'uni%s' %''.join(unis)
                    return nameStr
        return nameStr

    def make(self):
        # making the dictionary of new names and old names with its report
        for g in sorted(self.f.glyphOrder):
            g = self.f[g]
            gName = g.name
            fName = gName
            uName = ''
            gUnicode = g.unicode
            if gName not in self.reservedNames:
                if gUnicode is not None:
                    fName = self.finalName(gUnicode)
                else:
                    # there are some components, it's very likely that components have unicode
                    fName = self.compName(gName)
                    if fName == '':
                        fName = gName
                    elif fName in self.nameDic.values():
                        splitted = gName.split('.')
                        if len(splitted) > 1:
                            ext = splitted[-1]
                            fName = '%s.%s' %(fName, ext)
            counter = 1
            tempName = fName
            while fName in self.nameDic.values():
                # trying to avoid duplicates!
                newName = '%s.%i' %(tempName, counter)
                self.report.append('\t\t# \'%s\' glyphs name already exist. trying to avoid duplicates by incrementing -> \'%s\'.' %(fName, newName))
                fName = newName
                counter += 1
            # print(gName, fName)
            self.nameDic[gName] = fName

    def add(self):
        self.make()
        self._addKey()

    def _addKey(self):
        if self.f.lib.get(self.key, {}) == self.nameDic:
            return
        self.f.lib[self.key] = self.nameDic
        print("\n".join(map(str, self.nameDic.items())))

    def remove(self):
        self.f.lib.pop(self.key)

    def override(self):
        for g in self.f.glyphOrder:
            g = self.f[g]
            gName = g.name
            self.nameDic[gName] = gName
        self._addKey()

    def output(self):
        return '\n'.join(self.report)

if __name__ == '__main__':
    from mojo.UI import OutputWindow
    from mojo.roboFont import AllFonts
    OutputWindow().clear()

    fontObjectList = AllFonts()
    for f in fontObjectList:
        afgn = addFinalGlyphNames(f)
        afgn.add()
        print(afgn.output())

anthrotype commented 5 years ago

There is no reliable way to determine the script from a Unicode value

The fontTools.unicodedata package has script, script_extension and script_name functions.

https://github.com/fonttools/fonttools/blob/master/Lib/fontTools/unicodedata/__init__.py#L47-L107

schriftgestalt commented 5 years ago

I didn’t like to rely on external data so in Glyphs, all info (script, categories...) is supplied by the GlyphData file. But the user has the option to overwrite it and then it is stored in the file. Only data that can’t be computed is written with the exception of the unicode, to make it a bit easier if someone else (glyphsLib) is reading the file.

And there are some tricky cases like Arabic numerals that are ‘Arabic’ but LTR.

typoman commented 5 years ago

I'm inclined to close this because it seems there is still some confusion about how data should be stored, and I don't think anything should be added to the spec in this situation. I will store the data in the Glyph and Font lib for now, until there is a stable compiler and a solid foundation for why certain data should be stored. Thank you all for your input.

benkiel commented 5 years ago

@typoman agree to close this, but let's spin out talking about adding public.baseunicode and public.caret to other issues, as I think those would be really handy to have (and we can continue discussing public.direction too). I can start those if you want.

typoman commented 5 years ago

I think the data you're mentioning will be useful and worth adding. But I don't think there is a one to one relationship between baseunicode and direction and script so they should be stored separately. One example is what @schriftgestalt mentioned. Another reason is user control: users should have the power to change it. You might argue that script is not important data to store, but that's open to debate.

justvanrossum commented 5 years ago

But I don't think there is a one to one relationship between baseunicode and direction and script so they should be stored separately

Can we be a little more precise about that? Script is an official Unicode property, and is accessible via fonttools. Directionality is also an official character property defined by Unicode, and is available in Python via unicodedata.bidirectional(char).

Now, the only issue I can think of is that either of those two may use an older version of the unicode database than what's needed. fonttools is easiest to keep up-to-date (and is independent of Python version), so perhaps it's an idea to suggest the directionality property to also be made available via fonttools.

Are there any other reasons these values should perhaps be overridable by the glyph data?
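For reference, the standard-library lookup mentioned in this comment can be sketched as below. `unicodedata.bidirectional` is the real Python function; reducing its bidi classes to RTL, LTR or no strong direction is an assumption made for this illustration and is not something the thread or the UFO spec defines. The result also depends on the Unicode database version bundled with the running Python.

```python
import unicodedata

# Assumed simplification: only strong bidi classes map to a direction.
RTL_CLASSES = {"R", "AL"}  # right-to-left, Arabic letter
LTR_CLASSES = {"L"}        # left-to-right


def direction_from_unicode(code_point):
    """Return "RTL", "LTR" or None (no strong direction) for a code point."""
    bidi_class = unicodedata.bidirectional(chr(code_point))
    if bidi_class in RTL_CLASSES:
        return "RTL"
    if bidi_class in LTR_CLASSES:
        return "LTR"
    return None


for code_point in (0x0041, 0x0627, 0x05D0, 0x0661, 0x0020):
    print("U+%04X" % code_point,
          unicodedata.bidirectional(chr(code_point)),
          direction_from_unicode(code_point))
```

The Arabic-Indic digit U+0661 has the bidi class AN rather than AL, so it does not come out as RTL here, which matches the earlier remark that Arabic numerals are a tricky case.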

LettError commented 5 years ago

Directionality data would be great to have, one way or the other. Perhaps from the unicode release itself. As anything coming from Unicode, it's.. complex.

typoman commented 5 years ago

so perhaps it's an idea to suggest the directionality property to also be made available via fonttools.

That will help and I will use that. Here are two other reasons to add them:

- If they're not added they will not make it to interfaces. If they get added there is a reason to add them to the interface. Now by default, nobody considers direction as anything important in spacing tools. Not adding these confirm that they're not important. If you don't add them I bet on next RF update you won't see any property as direction when you want to search for glyphs and not a property to search script called Arabic because Frederik could also say it's not in spec so I won't add it. You see the cycle of us less important script people being dismissed?
- I will add them myself to my tools because I don't want to calculate script or direction on every iter on the glyph. Haven't you heard UFO based tools are slow? I will not add more operations.

LettError commented 5 years ago

Now by default, nobody considers direction as anything important in spacing tools. Not adding these confirm that they're not important.

Hang on. Before making assumptions, please consider that directionality is complex. It's not just right to left for Arabic in a spacing window.

justvanrossum commented 5 years ago

I'm trying to establish whether the data you need is available already. If so, adding it to the spec adds redundant data, and that isn't good for anybody.

Haven't you heard UFO based tools are slow?

Adding redundant data to a font as a means of cache (out of an unfounded fear that it would slow down your workflow in any measurable way) is a remarkably bad idea.

typoman commented 5 years ago

Adding redundant data to a font as a means of cache (out of an unfounded fear that it would slow down your workflow in any measurable way) is a remarkably bad idea.

Imagine if the whole font data needed to be parsed on every read/write: the user could not simply hit save. Performance doesn't matter unless we need an immediate response in the UI. One way of looking at it is to cache data to help the compiler create the binary faster. Another important reason for the proposal is compiler performance. Right now, if the compiler needs to add the OT features to the font, it has to parse a huge file, which creates a lag if a user is interacting with features and needs an immediate response from the layout engine. Making data more accessible helps, at least in my experience. Please read the Script Tags section of the OpenType spec; quoting here:

Script tags generally correspond to a Unicode script. However, the associations between them may not always be one-to-one, and the OpenType script tags are not guaranteed to be the same as Unicode Script property-value aliases or ISO 15924 script IDs. Since the development of OpenType script tags predates the ISO 15924 or Unicode Script property, the rules for script tags defined in this document may not always be the same as rules for ISO 15924 script IDs. The OpenType script tags can also correlate with a particular OpenType Layout implementation, with the result that more than one script tag may be registered for a given Unicode script (e.g. 'deva' and 'dev2').

Source: https://docs.microsoft.com/en-us/typography/opentype/spec/scripttags

As long as there is a script tag in the OT spec I will add them explicitly to glyphs and won't calculate them at compile time, even if there is a way to calculate them.

justvanrossum commented 5 years ago

However, the associations between them may not always be one-to-one, and the OpenType script tags are not guaranteed to be the same as Unicode Script property-value aliases

That is useful information, thanks.

Can you give me an example of how the script tag from a glyph will be used by a feature builder?

justvanrossum commented 5 years ago

How about a glyph lib key named public.otScriptTag?

Is it ever needed to assign multiple script tags to a single glyph?

justvanrossum commented 5 years ago

@benkiel would it be better to open separate issues for each possible new glyph lib key, or one combined one? Leaning towards separate issues.

The candidates seem to be:

public.otScriptTag (or public.openTypeScriptTag?) (can there be multiple script tags for a single glyph?)
public.direction (can a glyph be assigned multiple directions? what values are possible?)
public.carets or maybe public.ligatureCarets (what about vertical?)

(I'm no longer proposing public.baseUnicode, as it was an attempt to unify script and direction via the unicode spec.)

benkiel commented 5 years ago

Also think separate. I think that there is merit in public.baseUnicode for doing things like defcon’s pseudo Unicode

typoman commented 5 years ago

Can you give me an example of how the script tag from a glyph will be used by a feature builder?

There are two usages for scripts in my experience.

OpenType usage: Scripts technically are not glyph-specific and are lookup specific:

The lookups in every OpenType feature must be registered under one or more language systems. The lookups of a particular feature may vary across the language systems under which the feature is registered.

Source: https://adobe-type-tools.github.io/afdko/OpenTypeFeatureFileSpecification.html#4.b

In practice, I can't think of a situation where a lookup on a glyph would be associated with two scripts, but I might be wrong. Think about numbers or punctuation. I will investigate further whether it is even possible to associate a lookup with two scripts.

Usability in an interface: It's possible a user wants to limit the search for glyphs to a specific script when they're building OT features or limiting their glyph set overview in an interface.

For the above reasons, I would advise either not adding a script tag to the glyph, or not choosing the name public.otScriptTag, as it could cause further confusion by suggesting that the OT script tag is glyph-specific.

typoman commented 5 years ago

Maybe just public.script is enough?

benkiel commented 5 years ago

public.direction could be: left, right, any, vertical?

justvanrossum commented 5 years ago

In OTF, scripts are feature-specific. A lookup can be used by arbitrary features, so a lookup can easily be used under multiple scripts. (Whether that happens in practice is a separate question.)

Usability in an interface: It's possible a user wants to limit the search for glyphs to a specific script when they're building OT features or limiting their glyph set overview in an interface.

This use case would be much better served with a general marking mechanism (which would be super useful in all sorts of contexts). I don't read any script-specific usage in this.

justvanrossum commented 5 years ago

It's possible a user wants to limit the search for glyphs to a specific script when they're building OT features or limiting their glyph set overview in an interface.

Btw, this is very much a UI feature request, rather than something that (first) needs to be solved at the file format level. The existence of glyph.lib ensures there doesn't need to be a chicken-and-egg situation. Whatever catches on can be folded into the format.

For this specific use case (select by script etc) I would start by using various unicode properties (it's a very rich and useful data set!) rather than assuming up front this will fall short of practical needs (and therefore would need a custom solution). Any shortcomings will surely be found soon upon practical usage, and then we can talk much more easily about how to tackle them.

typoman commented 5 years ago

public.direction could be: left, right, any, vertical?

According to what @LettError posted from Unicode, I can use the following values, which will cover enough for bidirectional text:

Neutral | No override is currently active
Right-to-left | Characters are to be reset to R
Left-to-right | Characters are to be reset to L

Source: https://unicode.org/reports/tr9/#Table_Directional_Override_Status

The table above can be summarized as RTL, LTR and None. Do you think this is enough, @LettError?

As for vertical, I don't know if values are needed (see below). Also, the OpenType spec makes no mention of defining a vertical direction switch.

In the case of vertical line orientation, the Bidirectional Algorithm is still used to determine the levels of the text.

Source: https://unicode.org/reports/tr9/#Vertical_Text

LettError commented 5 years ago

Would there be a difference between public.direction = None and the absence of the entry?
These values are specifically from Directional Override Status - is that what public.direction is for?
Which process or tool will consume this value?

typoman commented 5 years ago

Would there be a difference between public.direction = None and the absence of the entry?

I don't think so. So basically the absence of value means None or Neutral.

These values are specifically from Directional Override Status - is that what public.direction is for?

If the direction of the glyph is RTL it will affect how kerning will be generated for the pair. If both glyphs in a pair are RTL or one glyph is RTL and the other glyph is None then the record in Adobe fea file syntax will be:

pos glyph1/group1 glyph2/group2 <value 0 value 0>

If both glyphs in a pair are LTR or both are None (or value is not present) or one glyph is LTR and the other glyph is None (or value is not present) then the record in Adobe fea file syntax will be:

pos glyph1/group1 glyph2/group2 value

If LTR and RTL are mixed in a kerning pair, the pair should be invalid: the layout engine's itemization should split the two glyphs into separate runs, so the lookup should never apply to the pair.

- Groups whose glyph members are RTL or None are considered RTL.
- Groups whose glyph members are LTR or None are considered LTR.
- Groups whose glyph members are all None are considered LTR.
- Groups whose glyph members mix LTR and RTL are invalid.
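The rules above can be sketched in Python roughly as follows. This is only an illustration of the logic described in this comment, not the behaviour of any existing compiler: the names `resolve_direction` and `kern_rule` are hypothetical, direction values are assumed to be the strings "RTL" and "LTR" with None for neutral or absent, and a trailing semicolon is added because fea statements end with one.

```python
from typing import Iterable, Optional

RTL, LTR = "RTL", "LTR"


def resolve_direction(directions: Iterable[Optional[str]]) -> str:
    """Combine the directions of a pair's sides or of a group's members.

    None means neutral or absent. All-neutral input resolves to LTR;
    a mix of LTR and RTL is rejected as invalid.
    """
    seen = {d for d in directions if d is not None}
    if seen == {RTL}:
        return RTL
    if seen <= {LTR}:  # only LTR, or nothing but neutral values
        return LTR
    raise ValueError("invalid: LTR and RTL are mixed")


def kern_rule(left: str, right: str, value: int,
              left_dir: Optional[str], right_dir: Optional[str]) -> str:
    """Return one fea `pos` statement for a kerning pair."""
    if resolve_direction([left_dir, right_dir]) == RTL:
        # RTL pairs adjust both placement and advance.
        return f"pos {left} {right} <{value} 0 {value} 0>;"
    return f"pos {left} {right} {value};"


print(kern_rule("sad-ar", "yeh-ar", -30, RTL, None))
print(kern_rule("A", "V", -50, None, None))
```

The same `resolve_direction` helper can be applied to a group by passing the directions of all its members.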

Which process or tool will consume this value?

A compiler that reads the font's kerning object and converts it to an OpenType layout table, or any tool that generates Adobe fea file syntax.

LettError commented 5 years ago

Just so I understand:

- this field is intended to override whatever direction is associated with the unicode value.
- if there is no unicode value for a glyph, thus no direction, this field states what it should be?

justvanrossum commented 5 years ago

Btw. for this direction property, please revisit #16 for interesting insights.

typoman commented 5 years ago

this field is intended to override whatever direction is associated with the unicode value.

Exactly!

if there is no unicode value for a glyph, thus no direction, this field states what it should be?

It should state None as in Neutral. This is a situation where the data is not enough to generate kerning. The compiler should either create regular LTR kerning or use its own dictionary to infer the unicode, and thus the direction, from the glyph name.

justvanrossum commented 5 years ago

Re-reading #16 suggests to me that adding a direction property to glyphs is not a good idea to begin with. It is just not that simple.

typoman commented 5 years ago

I don't see the method I mentioned there. I've been generating kerning using this method. This comes from OT flags where a lookup is RTL or LTR. As I mentioned above the algorithm defines the RTL or LTR state of the lookup based on the direction of the glyph. What is not simple?

typoman commented 5 years ago

Btw LTR and neutral are basically same in terms of flags when it comes to generating kerning. The lookup doesn't get any flags.

typoman commented 5 years ago

Having read the referenced thread, let's forget about storing kerning in UFO for a moment. Imagine an interface that can write GPOS value records directly into the binary font while someone is kerning a pair. How should the interface decide whether the value record should be a single integer (LTR), as I mentioned above, or a four-integer record that also includes the advance (RTL), without knowing the direction of the glyphs/groups in the pair? Should the interface decide on the glyph direction by parsing its glyph name if it's unicode-less? What if the user used blabla for a glyph name? Is it a bla + bla ligature? Should we impose naming schemes on users to be able to get the direction of glyph? Parsing glyph data to infer its direction is a weak idea. There is no way this can be solved without writing the direction in the glyph. We can't make rules about what counts as an RTL glyph if it doesn't have a unicode.

LettError commented 5 years ago

So, perhaps ligatures should store a list of names that make up the ligature?

typoman commented 5 years ago

Do the members of the ligature have unicodes? If they don't, how do we find their unicodes? Parse yet more data? It's endless.

LettError commented 5 years ago

Maybe the members of a ligature have unicodes, maybe they don't: a search may have to go several hops in.

typoman commented 5 years ago

The problem is not the hops, it's interpreting the data, and there can always be exceptions. If a compiler has to infer the direction of a ligature's members, it should also expose the result to the user. If this were a one-to-one relationship it would be reliable, but it's not, so the user should be able to override it.

LettError commented 5 years ago

Do you have an example for a glyph that:

- is a ligature, that in itself does not have a unicode value
- but its "member" glyphs do (either directly, or deeper)
- these unicodes have (unicode sourced) direction data

Can the direction that you want for the glyph be different from the set of directions provided by the members?

justvanrossum commented 5 years ago

Seems we're going in circles.

Can you give specific examples of cases where you:

- can't know the Unicode (and therefore can't know the Bidi_Class)
- must override the Unicode Bidi_Class property

Should we impose naming schemes on users to be able to get the direction of glyph?

No, but glyph naming conventions are extremely useful, easy to use, and popular. It's a mini language that is equally well understood by people and machines.

typoman commented 5 years ago

I'm gonna give you an example where parsing names could go wrong. There is a unicode-less ligature named sad_yeh-farsi.fina. Here is the glyph set:

sad-ar, sad-ar.fina, sad-ar.medi, sad-ar.init, yeh-ar, yeh-ar.fina, yeh-ar.medi, yeh-ar.init, yeh-farsi, yeh-farsi.fina, yeh-farsi.medi, yeh-farsi.init

User expectation of ligature components: sad-ar.medi + yeh-farsi.fina
Compiler result: Nothing!

Parsing names to find ligature components can go wrong. I could impose a naming scheme on the user to make the ligature work, but that would only reflect my expectations. Nowhere does the OT spec say that a ligature should have a certain name, so why should I make a tool that imposes one? I won't go on about the examples. Can we please base our arguments on abstract data and the OT spec? Can we please establish that a user might not structure the glyph data the way we expect, and that there is an RTL flag in the OT spec for a reason?
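To make the failure concrete, here is a minimal sketch of one naive name parser, assuming the common convention of splitting the base name on underscores and dropping the suffix after the first period. The function `naive_components` is hypothetical and does not represent how any particular compiler works.

```python
GLYPH_SET = {
    "sad-ar", "sad-ar.fina", "sad-ar.medi", "sad-ar.init",
    "yeh-ar", "yeh-ar.fina", "yeh-ar.medi", "yeh-ar.init",
    "yeh-farsi", "yeh-farsi.fina", "yeh-farsi.medi", "yeh-farsi.init",
}


def naive_components(ligature_name, glyph_set):
    """Guess ligature components from the glyph name alone.

    Returns the list of component names, or None when any guessed
    component is not present in the glyph set.
    """
    base, _, _suffix = ligature_name.partition(".")
    parts = base.split("_")
    if all(part in glyph_set for part in parts):
        return parts
    return None


# "sad" is not a glyph in the set (the glyph is called "sad-ar"),
# so the parser cannot resolve the ligature.
print(naive_components("sad_yeh-farsi.fina", GLYPH_SET))

# Only a name that follows the parser's own scheme resolves, and even
# then it yields sad-ar + yeh-farsi rather than the positional forms
# sad-ar.medi + yeh-farsi.fina that the user had in mind.
print(naive_components("sad-ar_yeh-farsi.fina", GLYPH_SET))
```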

unified-font-object / ufo-spec

[Proposal] OpenType Features as properties of glyph object #87