Closed typoman closed 5 years ago
Kerning always relates to multiple glyphs, so in which glyph should it be stored? And it creates exactly the same problem regarding subsetting (etc.), since the glyph would then contain information about some other glyph. Your proposal solves nothing for kerning, and probably very little for GPOS/GSUB except those that need no context (SinglePos, SingleSubst, things like that).
With the current state of UFO, if someone removes a glyph, the implementation also has to decide whether the pair should be removed from the kerning. Kerning could instead be stored in the feature property of the first glyph in the pair. If the first glyph gets removed, the kerning is gone. If the second glyph gets removed, the compiler doesn't add the pair to the binary. The same goes for other multiple-glyph rules. What we already have, a single feature file, is not suitable for large projects or complex scripts. I realize this may not be visible from a Latin-centric point of view, but I think a lot of important data for other writing systems is open to interpretation under the current spec. I have to decide which script a glyph belongs to, or what direction it has, when I generate the OpenType features. I deal with this every time I generate a font. This data should be stored inside the glyph rather than interpreted every time a font is generated.
If the first glyph gets removed then kerning is gone. If the second glyph gets removed compiler doesn't add it to the binary.
How on earth is this different from storing kerning separately? I might as well say "if the glyph name in a kern pair doesn't exist it won't be added to the binary".
I'm not saying that the current situation is perfect (also: .fea isn't perfect), but saying everything should be stored in the glyph isn't solving much.
There is no advantage if the second glyph of the pair gets removed. But if the kerning pair is stored in the first glyph of the pair and that glyph gets removed, the compiler doesn't have to check whether the pair is valid, because it's already gone. This halves the checking time, and the probability of having redundant data is also halved.
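For comparison, here is a minimal sketch of the separate-storage approach, assuming a UFO-style kerning dict keyed by (first, second) pairs and a groups dict in which names starting with "public.kern" are kerning groups (the UFO 3 naming convention). The function names are hypothetical. Dropping pairs that reference missing glyphs takes one pass over the pairs, whichever side the removed glyph was on.

```python
def side_exists(name, glyph_names, groups):
    """Return True if a kerning side still refers to something in the font."""
    if name.startswith("public.kern"):
        # a kerning group stays usable while at least one member glyph survives
        return any(member in glyph_names for member in groups.get(name, ()))
    return name in glyph_names


def prune_kerning(kerning, glyph_names, groups):
    """Return a copy of `kerning` without pairs whose sides no longer exist."""
    glyph_names = set(glyph_names)
    return {
        (first, second): value
        for (first, second), value in kerning.items()
        if side_exists(first, glyph_names, groups)
        and side_exists(second, glyph_names, groups)
    }


if __name__ == "__main__":
    kerning = {("T", "o"): -80, ("o", "T"): -60, ("public.kern1.V", "a"): -50}
    groups = {"public.kern1.V": ["V", "W"]}
    # "o" was removed from the font; "V" was removed but "W" survives
    print(prune_kerning(kerning, ["T", "W", "a"], groups))
```

Removing "o" drops both (T, o) and (o, T) in the same pass, and the V group survives through its remaining member W.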
@typoman curious how this would handle complex GPOS features, say something like a script typeface that does a lot of contextual substitution? I understand that some feature writing can be standardized in a compiler, but not all by far, so how would that work with a proposal like this? What would need to be specified for the compiler? It seems like the spec would need to say how features are written by a compiler so that consistency across compilers is guaranteed. That seems like a big thing to specify, so I'm curious what your thinking there is. It is 100% likely that I'm missing something.
(A side thought is that perhaps it would be useful to be able to tag glyphs with writing direction, that one could use in different ways, but maybe not tied to features?)
As a general note: the kind of changes you propose would have a much better chance if there existed:
Neither exist right now, and they're not going to write themselves, so for now this is all very much pie in the sky.
Changes that are driven by practical usage instead of idealistic betterment of the format have a better chance to get leverage.
At the same time, development of UFO is necessarily slow, as so many tools depend on it. Breakage is bad, and keeping things backwards compatible is not always easy. It's a lot of work to move forward.
Think of the consequences for open source tools such as fontTools.ufoLib, defcon, fontmake, fontParts. Some of these components are not funded, so they depend on volunteer work. Think of the commercial tool authors that would need to get behind changes: RoboFont, Glyphs UFO export. And last but not least: the UFO spec is not funded in any way, either.
Thank you, Just and Ben, for taking the time to consider a suggestion that could otherwise have stayed just some text on a GitHub issue. This proposal was based on scripts I've built to add OpenType features to fonts for non-Latin scripts. Some scripts, like Arabic, are more complex. Having worked with other foundries and on multi-script projects, I've realized the feature file was not designed with designers or subsetting in mind. The structure of the feature file makes it hard to build an interface for manipulating the data. Designers tend not to interact with text but to work with a UI, and more mistakes happen in these situations.
I understand you need examples or real-life situations, and I will make something to show how this could be implemented. Also, the proposal is based on how other tools, most notably VOLT, provide an interface for implementing OpenType Layout tables, so it's not going to impose standards on compilers. One way of compiling this data is to first convert it to a feature file and then add it to the font using fontTools. I already know this type of data needs its own compiler, and it's not going to build itself.
But before building that interface, I had to run it by you and see how far off I might be. Right now I have to store this data in the glyph lib and font lib.
@benkiel Some of these properties don't conflict with the spec or with how compilers work. I think they are worth adding to the spec.
Direction (RTL, LTR): This helps the interface determine the direction of text while spacing. Not only that, it can determine how OT positioning features should be generated for the glyph. Right now I use the glyph's Unicode value to determine the script, and therefore the direction, which is only an interpretation. If a glyph doesn't have a Unicode value, I parse its name and check, for example, whether the suffix-less glyph name exists and what its Unicode value is.
Script (Arabic, Cyrillic, Latin, …): The script is currently also determined by parsing the Unicode value and the glyph name. In Glyphs it's added to the glyph name as a hyphen and a few letters, like -ar or -cy. This also causes other issues, as you might already know.
Final name (uniXXXX): This is handled in the AFDKO by the GlyphOrderAndAliasDB file, and is tied to the font, not the glyph.
Carets: This could be a list of integers giving the caret positions if the glyph is a ligature. The length of this list would indicate the number of components in the ligature. Right now I have to parse the glyph name to determine that.
A first step could be to register official glyph.lib keys for the needed properties. Final name and Carets are good candidates for that.
It seems your first two points could always be derived if the associated unicode is known. So it currently only "doesn't work" if the glyph names of unencoded glyphs are not directly related to the base glyphs. So maybe an official lib key for "base unicode" is more generally useful than adding properties for direction and script. Or are there cases where even knowing the unicode doesn't help?
public.baseUnicode: Only add if it's not possible to derive the unicode from the glyph name.
public.finalName: Only add if automatic naming based on the unicode (uniXXXX or AGLFN lookup) is not possible or desirable.
public.carets: a list of ints in design units for caret positions in ligatures. I assume its length is numLigatureComponents - 1.
Adding lib keys is a far cheaper way forward than changing the rest of the spec.
(That said, I don't know how we register official lib keys...)
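To make the carets idea concrete, here is a sketch that turns per-glyph caret lists into the feature-file GDEF statement for ligature carets (LigatureCaretByPos). It assumes the proposed key name public.carets, which is only a proposal in this thread and not a registered key. Following the assumption above, n carets imply n + 1 ligature components.

```python
CARETS_KEY = "public.carets"  # proposed in this thread, not an official key


def ligature_caret_fea(glyph_libs):
    """Build a GDEF table block with LigatureCaretByPos lines.

    glyph_libs maps glyph name -> glyph lib dict. Glyphs without carets are
    skipped. Returns an empty string when no glyph defines carets.
    """
    lines = []
    for name in sorted(glyph_libs):
        carets = glyph_libs[name].get(CARETS_KEY)
        if not carets:
            continue  # not a ligature, or no carets defined
        # caret positions are design units along the writing direction
        positions = " ".join(str(int(c)) for c in sorted(carets))
        lines.append(f"    LigatureCaretByPos {name} {positions};")
    if not lines:
        return ""
    return "table GDEF {\n" + "\n".join(lines) + "\n} GDEF;\n"


if __name__ == "__main__":
    libs = {"f_f_i": {CARETS_KEY: [662, 330]}, "a": {}}
    print(ligature_caret_fea(libs))
```

Storing the list on the ligature glyph itself means the caret data travels with the glyph if it is copied to another font.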
The final name already exists in the ufo.lib’s public.postscriptNames. See http://unifiedfontobject.org/versions/ufo3/lib.plist/#publicpostscriptnames
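Here is a minimal sketch of how a tool might resolve a glyph's production name. It uses the font lib's public.postscriptNames mapping when that mapping has an entry. Otherwise it falls back to a uniXXXX (BMP) or uXXXXX (supplementary planes) name derived from the glyph's first Unicode value. This fallback follows the general AGL naming convention but is a simplification, and the function name is hypothetical. Real compilers such as fontmake or the AFDKO apply their own, richer rules.

```python
POSTSCRIPT_NAMES_KEY = "public.postscriptNames"


def production_name(glyph_name, unicodes, font_lib):
    """Return the final (production) name for one glyph.

    An explicit entry in the font lib's public.postscriptNames wins.
    Otherwise an encoded glyph gets a uniXXXX / uXXXXX name, and an
    unencoded glyph keeps its design name.
    """
    explicit = font_lib.get(POSTSCRIPT_NAMES_KEY, {})
    if glyph_name in explicit:
        return explicit[glyph_name]
    if not unicodes:
        return glyph_name
    value = unicodes[0]
    if value <= 0xFFFF:
        return f"uni{value:04X}"
    return f"u{value:05X}"


if __name__ == "__main__":
    lib = {POSTSCRIPT_NAMES_KEY: {"alef-ar.fina": "uni0627.fina"}}
    print(production_name("beh-ar", [0x0628], lib))
    print(production_name("alef-ar.fina", [], lib))
```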
Carets would need to be horizontal for horizontal scripts or vertical for vertical scripts.
The final name already exists in the ufo.lib’s public.postscriptNames. See http://unifiedfontobject.org/versions/ufo3/lib.plist/#publicpostscriptnames
Thanks, I should have checked that. So this exists, and should not be changed, even though I (may) agree that that info might be better stored in the glyph.
Carets would need to be horizontal for horizontal scripts or vertical for vertical scripts.
Apart from proper wording of the description, should this also influence the key? Like public.verticalCarets vs. public.horizontalCarets, or is one generic key enough?
@justvanrossum
public.baseUnicode needs additional processing to determine the script. I think an explicit script key in the glyph lib would be best. There is no other need for public.baseUnicode, so why not just add public.script? This could also indicate the writing direction.
@moyogo
public.postscriptNames is in the font lib and follows how compilers work, without considering the real-life situations in which designers change data in fonts. If public.finalName is part of the glyph lib and the glyph moves to another font, it carries the data with it, and no further processing is needed to fix the font-level dictionary.
public.baseUnicode needs another processing to determine what is the script.
But it is a 1-1 relationship, and an easy lookup, no? To know the base unicode is useful in other circumstances, too.
public.postscriptNames is in the font lib and is derived from the mechanism of compilers without considering the real-life situations how designers change data in fonts. If public.finalName is part of the glyph lib and glyph moves to another font, it carries the data and this doesn't need more processing to fix the top font dictionary.
You're not necessarily wrong, but I think this is a change that will cost more than it is worth.
Btw. can you give some examples of cases where you need to manually supply the production name in the first place?
But it is a 1-1 relationship, and an easy lookup, no? To know the base Unicode is useful in other circumstances, too.
There is no reliable way to determine the script from a Unicode value. Personally, I have tagged blocks with scripts myself, and I'm not sure whether there is a Python package that does it perfectly. Also, not storing the glyph's script takes away the user's ability to change it. What if a user wants to search for glyphs that are Cyrillic, or the algorithm makes mistakes? Should the implementation infer each glyph's script whenever it iterates over them? That is implicit, interpreted data. On the other hand, a user could write a Python script that adds script tags to the lib and fix them manually if there's a mistake. Perhaps @schriftgestalt could explain why he chose to add it to the glyph names rather than just interpret the glyph's Unicode value?
can you give some examples of cases where you need to manually supply the production name in the first place?
I'm appending the script I use in RF; you can run it to see that this task is not straightforward, and sometimes even components are involved! I also understand it's possible to determine uniXXXX names from glyph names, but this also takes control away from the designer. This is done automatically at compile time in fontmake if the glyph name doesn't exist in the font's public.postscriptNames key, and in the FDK using the GlyphOrderAndAliasDB file. Since this is mostly hidden from users, most of them don't even know it exists. I think automation is great, but algorithms can make mistakes, and hiding this information from designers can cause issues later. Putting it at the glyph level might also make it more visible.
import re
from fontTools.agl import UV2AGL, AGL2UV


class addFinalGlyphNames:
    def __init__(self, f):
        self.f = f
        self.fGlyphs = f.keys()
        self.aff_pattern = re.compile(r'afii[0-9]{3,5}')
        self.gname_pattern = re.compile(r'[^_.]+')
        self.liga_pattern = re.compile(r'[^_]+')
        self.key = 'public.postscriptNames'
        self.nameDic = {}
        self.scripts = {}  # { gName : (scriptLessName, scriptTag) }
        self.gName2uni = {}  # { scriptLessName : { scriptTag : unicodes[0] } }
        self.cmap = f.getCharacterMapping()
        for g in self.fGlyphs:
            # Glyphs-style script tag: a hyphen plus letters, e.g. "-ar"
            tag = re.search(r'-\w+', g)
            script = ''
            unicodes = self.f[g].unicodes
            scriptLess = g
            if tag:
                script = tag.group(0)
                scriptLess = re.sub(r'-\w+', '', g)
            self.scripts[g] = (scriptLess, script)
            if unicodes:
                self.gName2uni.setdefault(scriptLess, {})[script] = unicodes[0]
        self.report = []
        self.reservedNames = {'CR', 'apple', 'mu', 'onesuperior', 'twosuperior',
                              'threesuperior', 'fi', 'fl', 'Delta', 'Omega',
                              '.notdef', '.null'}

    def hexUni(self, uniValue):
        """
        takes an integer value and returns a hexadecimal unicode string
        """
        return format(uniValue, 'x').zfill(4).upper()

    def uniName(self, uniValue):
        """
        takes a unicode integer value and returns a uniXXXX name string
        """
        return 'uni%s' % self.hexUni(uniValue)

    def finalName(self, uniValue):
        """
        returns the AGL name for a unicode value. If there is no AGL
        name for that unicode, or the AGL name starts with afii, the
        function returns a uniXXXX name.
        """
        if uniValue in UV2AGL:
            aglName = UV2AGL[uniValue]
            if len(self.aff_pattern.findall(aglName)) != 1:
                return aglName
        return self.uniName(uniValue)

    def compName(self, gName):
        """
        returns the final name according to the ligature components
        making up the glyph name. This is used when the glyph doesn't
        have a unicode but its components do:
        alef.alt -> uniXXXX
        alef_lam -> uniXXXXXXXX
        """
        gfinalName = []
        uniprefix = False
        suffix_splitted = gName.split(".")
        if len(suffix_splitted) > 1 and suffix_splitted[0] in AGL2UV:
            return gName
        liga_splitted = gName.split("_")
        if len(liga_splitted) > 1 and liga_splitted[0] in AGL2UV:
            return gName
        scriptLess, script = self.scripts[gName]
        for g in self.liga_pattern.findall(scriptLess):
            # try to find a unicode for this ligature component
            gUnicode = None
            if g in self.gName2uni:
                gUnicode = self.gName2uni[g].get(script)
            if not gUnicode:
                # no unicode: strip dot suffixes one at a time and
                # check whether the shorter name has a unicode
                splitted = g.split('.')
                i = -1
                g = '.'.join(splitted[:i])
                while g not in self.gName2uni:
                    i -= 1
                    g = '.'.join(splitted[:i])
                    if abs(i) > len(splitted):
                        self.report.append("Can't parse the glyph name '%s' to find an appropriate unicode." % gName)
                        break
                else:
                    # loop ended without break: g is a known base name
                    gUnicode = self.gName2uni[g].get(script)
            if g in self.gName2uni and gUnicode:
                uniprefix = True
                gfinalName.append(self.hexUni(gUnicode))
        nameStr = ''.join(gfinalName)
        if uniprefix:
            nameStr = 'uni' + nameStr
        if nameStr == '':
            # fall back to the unicodes of the glyph's components
            glyph = self.f[gName]
            unis = []
            if len(glyph.components) > 1:
                for c in glyph.components:
                    try:
                        unis.append(self.hexUni(self.f[c.baseGlyph].unicodes[0]))
                    except (KeyError, IndexError):
                        self.report.append("Error: Component '%s' doesn't have a unicode in glyph '%s'." % (c.baseGlyph, gName))
                        unis = []  # don't build a name from only some components
                        break
            if unis:
                nameStr = 'uni%s' % ''.join(unis)
        return nameStr

    def make(self):
        # build the {glyph name: final name} dictionary and its report
        for gName in sorted(self.f.glyphOrder):
            g = self.f[gName]
            fName = gName
            gUnicode = g.unicode
            if gName not in self.reservedNames:
                if gUnicode is not None:
                    fName = self.finalName(gUnicode)
                else:
                    # no unicode: it's very likely that the components have one
                    fName = self.compName(gName)
                if fName == '':
                    fName = gName
                elif fName in self.nameDic.values():
                    splitted = gName.split('.')
                    if len(splitted) > 1:
                        ext = splitted[-1]
                        fName = '%s.%s' % (fName, ext)
                    counter = 1
                    tempName = fName
                    while fName in self.nameDic.values():
                        # trying to avoid duplicates!
                        newName = '%s.%i' % (tempName, counter)
                        self.report.append("\t\t# '%s' glyph name already exists. Trying to avoid duplicates by incrementing -> '%s'." % (fName, newName))
                        fName = newName
                        counter += 1
            self.nameDic[gName] = fName

    def add(self):
        self.make()
        self._addKey()

    def _addKey(self):
        if self.f.lib.get(self.key, {}) == self.nameDic:
            return
        self.f.lib[self.key] = self.nameDic
        print("\n".join(map(str, self.nameDic.items())))

    def remove(self):
        self.f.lib.pop(self.key, None)

    def override(self):
        # map every glyph to itself, i.e. keep the design names
        for gName in self.f.glyphOrder:
            self.nameDic[gName] = gName
        self._addKey()

    def output(self):
        return '\n'.join(self.report)


if __name__ == '__main__':
    # RoboFont only: mojo is RoboFont's scripting API
    from mojo.UI import OutputWindow
    from mojo.roboFont import AllFonts
    OutputWindow().clear()
    fontObjectList = AllFonts()
    for f in fontObjectList:
        afgn = addFinalGlyphNames(f)
        afgn.add()
        print(afgn.output())
There is no reliable way to determine the script from a Unicode value
the fontTools.unicodedata package has script, script_extension and script_name functions:
https://github.com/fonttools/fonttools/blob/master/Lib/fontTools/unicodedata/__init__.py#L47-L107
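Here is a quick sketch of what these functions return. It requires fontTools. It also uses ot_tags_from_script, another function in the same module that maps a four-letter Unicode script code to OpenType script tags. The describe helper is hypothetical, written only for this illustration.

```python
from fontTools import unicodedata as ftud


def describe(char):
    """Collect the Unicode script data fontTools has for one character."""
    code = ftud.script(char)  # four-letter ISO 15924 code, e.g. "Arab"
    return {
        "script": code,
        "name": ftud.script_name(code),  # long name, e.g. "Arabic"
        "extensions": sorted(ftud.script_extension(char)),
        "ot_tags": ftud.ot_tags_from_script(code),
    }


if __name__ == "__main__":
    for char in ["a", "\u0628", ","]:
        print(f"U+{ord(char):04X}", describe(char))
```

A character such as the comma reports the "Zyyy" (Common) script, and its script extensions show which scripts it is shared with. This is one place where a single per-glyph value can be ambiguous.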
I didn’t want to rely on external data, so in Glyphs all info (script, categories...) is supplied by the GlyphData file. But the user has the option to overwrite it, and then it is stored in the file. Only data that can’t be computed is written, with the exception of the Unicode value, to make things a bit easier if someone else (glyphsLib) is reading the file.
And there are some tricky cases like Arabic numerals that are ‘Arabic’ but LTR.
I'm inclined to close this, because it seems there is still some confusion about how the data should be stored, and I don't think anything in this situation is ready to be added to the spec. I will store the data in the glyph and font libs for now, until there is a stable compiler and a solid foundation for why certain data should be stored. Thank you all for your input.
@typoman agree to close this, but let's spin out talking about adding public.baseunicode and public.caret to other issues, as I think those would be really handy to have (and we can continue discussing public.direction too). I can start those if you want.
I think the data you're mentioning will be useful, and I think it's worth adding. But I don't think there is a one-to-one relationship between baseunicode and direction and script, so they should be stored separately. One example is what @schriftgestalt mentioned. Another reason is user control: users should have the power to change it. You might argue that script is not important data to store, but that's open to debate.
But I don't think there is a one to one relationship between baseunicode and direction and script so they should be stored separately
Can we be a little more precise about that? Script is an official Unicode property, and is accessible via fonttools. Directionality is also an official character property defined by Unicode, and is available in Python via unicodedata.bidirectional(char).
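Here is a stdlib-only sketch of deriving a direction value from a code point's Bidi_Class. Collapsing the classes to LTR/RTL/None is an assumption made for illustration. It is also exactly where information gets lost: Arabic-Indic digits have the weak class AN, not a strong RTL class, which is one reason directionality is more complex than a single per-glyph flag.

```python
import unicodedata

# Strong bidi classes map to a direction; weak and neutral classes
# (EN, AN, CS, ON, WS, ...) are treated as None in this sketch.
STRONG_DIRECTIONS = {"L": "LTR", "R": "RTL", "AL": "RTL"}


def glyph_direction(unicode_value):
    """Derive "LTR", "RTL" or None from a code point's Bidi_Class."""
    if unicode_value is None:
        return None  # unencoded glyph: nothing to derive from
    bidi_class = unicodedata.bidirectional(chr(unicode_value))
    return STRONG_DIRECTIONS.get(bidi_class)


if __name__ == "__main__":
    for value in (0x0061, 0x0628, 0x05D0, 0x0661, 0x0031, 0x002C):
        bidi = unicodedata.bidirectional(chr(value))
        print(f"U+{value:04X}", bidi, glyph_direction(value))
```

The Python stdlib uses the Unicode database version bundled with the interpreter, which is the version concern raised in the next comment.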
Now, the only issue I can think of is that either of those two may use an older version of the unicode database than what's needed. fonttools is easiest to keep up-to-date (and is independent of Python version), so perhaps it's an idea to suggest the directionality property to also be made available via fonttools.
Are there any other reasons these values should perhaps be overridable by the glyph data?
Directionality data would be great to have, one way or the other. Perhaps from the Unicode release itself. As with anything coming from Unicode, it's... complex.
so perhaps it's an idea to suggest the directionality property to also be made available via fonttools.
That will help and I will use that. Here are two other reasons to add them:
If they're not added, they won't make it into interfaces. If they are added, there is a reason to add them to interfaces. Right now, by default, nobody considers direction important in spacing tools. Not adding them confirms that they're not important. If you don't add them, I bet that in the next RF update you won't see a direction property when you want to search for glyphs, nor a script property to search for Arabic, because Frederik could also say it's not in the spec, so he won't add it. You see the cycle of us, the people working on less important scripts, being dismissed?
I will add them to my tools myself, because I don't want to calculate the script or direction every time I iterate over the glyphs. Haven't you heard UFO-based tools are slow? I won't add more operations.
Now by default, nobody considers direction as anything important in spacing tools. Not adding these confirm that they're not important.
Hang on. Before making assumptions, please consider that directionality is complex. It's not just right to left for Arabic in a spacing window.
I'm trying to establish whether the data you need is already available. If so, adding it to the spec adds redundant data, and that isn't good for anybody.
Haven't you heard UFO based tools are slow?
Adding redundant data to a font as a means of cache (out of an unfounded fear that it would slow down your workflow in any measurable way) is a remarkably bad idea.
Adding redundant data to a font as a means of cache (out of an unfounded fear that it would slow down your workflow in any measurable way) is a remarkably bad idea.
Imagine if all the font data had to be parsed on every read/write; the user couldn't hit save. Performance doesn't matter except where the UI needs an immediate response. One way of looking at it is to cache data to help the compiler create the binary faster. Another important reason for the proposal is compiler performance. Right now, if the compiler needs to add the OT features to the font, it has to parse a huge file, which creates lag when a user is interacting with features and needs an immediate response from the layout engine. Making data more accessible helps, at least in my experience. Please read about script tags in the OpenType spec; quoting:
Script tags generally correspond to a Unicode script. However, the associations between them may not always be one-to-one, and the OpenType script tags are not guaranteed to be the same as Unicode Script property-value aliases or ISO 15924 script IDs. Since the development of OpenType script tags predates the ISO 15924 or Unicode Script property, the rules for script tags defined in this document may not always be the same as rules for ISO 15924 script IDs. The OpenType script tags can also correlate with a particular OpenType Layout implementation, with the result that more than one script tag may be registered for a given Unicode script (e.g. 'deva' and 'dev2').
Source: https://docs.microsoft.com/en-us/typography/opentype/spec/scripttags
As long as there is a script tag in the OT spec, I will add it explicitly to glyphs and won't calculate it at compile time, even if there is a way to calculate it.
However, the associations between them may not always be one-to-one, and the OpenType script tags are not guaranteed to be the same as Unicode Script property-value aliases
That is useful information, thanks.
Can you give me an example of how the script tag from a glyph will be used by a feature builder?
How about a glyph lib key named public.otScriptTag?
Is it ever needed to assign multiple script tags to a single glyph?
@benkiel would it be better to open separate issues for each possible new glyph lib key, or one combined one? Leaning towards separate issues.
The candidates seem to be:
public.otScriptTag (or public.openTypeScriptTag?) (can there be multiple script tags for a single glyph?)
public.direction (can a glyph be assigned multiple directions? what values are possible?)
public.carets or maybe public.ligatureCarets (what about vertical?)
(I'm no longer proposing public.baseUnicode, as it was an attempt to unify script and direction via the unicode spec.)
I also think separate. I think there is merit in public.baseUnicode for doing things like defcon’s pseudo Unicode.
Can you give me an example of how the script tag from a glyph will be used by a feature builder?
There are two usages for scripts in my experience.
The lookups in every OpenType feature must be registered under one or more language systems. The lookups of a particular feature may vary across the language systems under which the feature is registered.
Source: https://adobe-type-tools.github.io/afdko/OpenTypeFeatureFileSpecification.html#4.b
In practice, I can't think of a situation where a lookup on a glyph would be associated with two scripts, but I might be wrong. Think about numbers or punctuation. I will investigate further whether it's even possible to associate a lookup with two scripts.
For the above reasons, I would advise either not adding a script tag to the glyph or not choosing the name public.otScriptTag, as it could cause further confusion by suggesting that the OT script tag is glyph-specific. Maybe just public.script is enough?
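Here is a sketch of the first usage, assuming each glyph lib may carry a hypothetical public.script key holding an OpenType script tag. It collects the tags in use and emits the feature file's languagesystem statements, with DFLT first as the feature file specification asks. It only shows where per-glyph script data would feed in. It does not settle whether a lookup can belong to several scripts.

```python
SCRIPT_KEY = "public.script"  # hypothetical key discussed in this thread


def languagesystem_statements(glyph_libs):
    """Emit languagesystem lines for every script tag used by the glyphs."""
    tags = sorted({
        lib[SCRIPT_KEY] for lib in glyph_libs.values() if lib.get(SCRIPT_KEY)
    })
    # DFLT goes first; glyphs without a script tag fall under it
    lines = ["languagesystem DFLT dflt;"]
    lines += [f"languagesystem {tag} dflt;" for tag in tags if tag != "DFLT"]
    return "\n".join(lines)


if __name__ == "__main__":
    libs = {
        "beh-ar": {SCRIPT_KEY: "arab"},
        "a": {SCRIPT_KEY: "latn"},
        "comma": {},
    }
    print(languagesystem_statements(libs))
```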
public.direction could be: left, right, any, vertical?
In OTF, scripts are feature-specific. A lookup can be used by arbitrary features, so a lookup can easily be used under multiple scripts. (Whether that happens in practice is a separate question.)
Usability in an interface: It's possible a user wants to limit the search for glyphs to a specific script when they're building OT features or limiting their glyph set overview in an interface.
This use case would be much better served with a general marking mechanism (which would be super useful in all sorts of contexts). I don't read any script-specific usage in this.
It's possible a user wants to limit the search for glyphs to a specific script when they're building OT features or limiting their glyph set overview in an interface.
Btw, this is very much a UI feature request, rather than something that (first) needs to be solved at the file format level. The existence of glyph.lib ensures there doesn't need to be a chicken-and-egg situation. Whatever catches on can be folded into the format.
For this specific use case (select by script etc) I would start by using various unicode properties (it's a very rich and useful data set!) rather than assuming up front this will fall short of practical needs (and therefore would need a custom solution). Any shortcomings will surely be found soon upon practical usage, and then we can talk much more easily about how to tackle them.
public.direction could be: left, right, any, vertical?
According to what @LettError posted from Unicode, I can use the following values, which will cover enough for bidirectional text:
Status | Interpretation
Neutral | No override is currently active
Right-to-left | Characters are to be reset to R
Left-to-right | Characters are to be reset to L
Source: https://unicode.org/reports/tr9/#Table_Directional_Override_Status
The above table can be summarized as RTL, LTR and None. Do you think this is enough, @LettError?
As for vertical, I don't know if values are needed (see below). Also, the OpenType spec doesn't mention defining a vertical direction switch.
In the case of vertical line orientation, the Bidirectional Algorithm is still used to determine the levels of the text. Source: https://unicode.org/reports/tr9/#Vertical_Text
Would there be a difference between public.direction = None and the absence of the entry?
These values are specifically from Directional Override Status - is that what public.direction is for?
Would there be a difference between public.direction = None and the absence of the entry?
I don't think so. So basically the absence of a value means None or Neutral.
These values are specifically from Directional Override Status - is that what public.direction is for?
If the direction of the glyph is RTL, it affects how kerning is generated for the pair. If both glyphs in a pair are RTL, or one glyph is RTL and the other is None, then the record in Adobe fea syntax will be:
pos glyph1/group1 glyph2/group2 <value 0 value 0>
If both glyphs in a pair are LTR, or both are None (or the value is not present), or one glyph is LTR and the other is None (or the value is not present), then the record in Adobe fea syntax will be:
pos glyph1/group1 glyph2/group2 value
If LTR and RTL are mixed in a kerning pair, the pair should be invalid, since the layout engine's itemization would split the pair and the lookup should not affect it.
- Groups whose glyph members are all RTL or None, with at least one RTL member, are considered RTL.
- Groups whose glyph members are all LTR or None, with at least one LTR member, are considered LTR.
- Groups whose glyph members are all None are considered LTR.
- Groups whose glyph members mix LTR and RTL are invalid.
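Here is a sketch of the rules above as a kerning writer might apply them. One assumption differs from a literal reading. The rules treat an all-None group as LTR, but this sketch keeps such a group as None until it is paired. The pair rule then decides, so an all-None group kerned against an RTL glyph resolves to RTL instead of becoming an invalid LTR/RTL mix. That choice and all names below are assumptions. Values are written as fea value records, with the trailing semicolon a fea statement needs.

```python
LTR, RTL = "LTR", "RTL"


class MixedDirectionError(ValueError):
    """Raised when a kerning group mixes LTR and RTL members."""


def group_direction(member_directions):
    """Resolve a group from its members' "LTR" / "RTL" / None values."""
    found = {d for d in member_directions if d is not None}
    if len(found) > 1:
        raise MixedDirectionError(f"group mixes {sorted(found)}")
    # all-None groups stay None so that the pair rule can decide
    return found.pop() if found else None


def pair_direction(first, second):
    """Return "RTL" or "LTR" for a pair, or None if it mixes LTR and RTL."""
    found = {d for d in (first, second) if d is not None}
    if found == {LTR, RTL}:
        return None  # invalid: itemization splits the run between them
    return RTL if RTL in found else LTR


def fea_pos(left, right, value, first_dir, second_dir):
    """Return one pos statement, or None for a pair that must be skipped."""
    direction = pair_direction(first_dir, second_dir)
    if direction is None:
        return None
    if direction == RTL:
        # <xPlacement yPlacement xAdvance yAdvance>
        return f"pos {left} {right} <{value} 0 {value} 0>;"
    return f"pos {left} {right} {value};"


if __name__ == "__main__":
    print(fea_pos("beh-ar", "alef-ar", -30, RTL, group_direction([RTL, None])))
    print(fea_pos("T", "o", -80, LTR, None))
```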
Which process or tool will consume this value?
A compiler that reads the font's kerning object and converts it to OpenType layout tables, or any tool that writes Adobe fea syntax.
Just so I understand
Btw. for this direction property, please revisit #16 for interesting insights.
this field is intended to override whatever direction is associated with the unicode value.
Exactly!
if there is no unicode value for a glyph, thus no direction, this field states what it should be?
It should state None, as in Neutral. This is a situation where there isn't enough data to generate kerning. The compiler should create regular LTR kerning, or use its own dictionary to infer the Unicode value, and thus the direction, from the glyph name.
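Here is a sketch of the override semantics described in this exchange. An explicit value in the glyph lib wins. Otherwise the direction is derived from the glyph's first Unicode value, and an unencoded glyph with no entry is None (neutral), which a compiler would treat as plain LTR kerning. Property lists have no null value, so the sketch treats absence as None and accepts only "LTR" and "RTL" as stored values. The key name public.direction and those values are the proposal under discussion, not a registered spec.

```python
import unicodedata

DIRECTION_KEY = "public.direction"  # proposed key, not registered
STRONG = {"L": "LTR", "R": "RTL", "AL": "RTL"}


def effective_direction(glyph_lib, unicodes):
    """Explicit lib value first, then the Unicode-derived value, else None."""
    value = glyph_lib.get(DIRECTION_KEY)
    if value is not None:
        if value not in ("LTR", "RTL"):
            raise ValueError(f"unexpected {DIRECTION_KEY} value: {value!r}")
        return value  # user override always wins
    if unicodes:
        return STRONG.get(unicodedata.bidirectional(chr(unicodes[0])))
    return None  # unencoded and no override: neutral


if __name__ == "__main__":
    print(effective_direction({}, [0x0628]))
    print(effective_direction({DIRECTION_KEY: "RTL"}, []))
    print(effective_direction({}, []))
```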
Re-reading #16 suggests to me that adding a direction property to glyphs is not a good idea to begin with. It is just not that simple.
I don't see the method I mentioned there. I've been generating kerning using this method. This comes from OT flags where a lookup is RTL or LTR. As I mentioned above the algorithm defines the RTL or LTR state of the lookup based on the direction of the glyph. What is not simple?
Btw, LTR and neutral are basically the same in terms of flags when it comes to generating kerning: the lookup doesn't get any flags.
Having read the referenced thread, let's forget about storing kerning in UFO for a moment. Imagine an interface that could write GPOS value records directly into the binary font while someone is kerning a pair. How should the interface decide whether the value record should be just an integer (LTR), as I mentioned above, or a four-integer record that also includes the advance (RTL), without knowing the direction of the glyphs/groups in the pair? Should the interface determine a glyph's direction by parsing its name if it has no Unicode value? What if the user used blabla for a glyph name? Is it a bla + bla ligature? Should we impose naming schemes on users to be able to get the direction of a glyph? Parsing glyph data to interpret its direction is a weak idea. There is no way to solve this without writing the direction in the glyph. We can't make rules about what should be an RTL glyph if it doesn't have a Unicode value.
So, perhaps ligatures should store a list of names that make up the ligature?
Do the members of the ligature have Unicode values? If they don't, how do we find them? Parse more data? It's endless.
Maybe the members of a ligature have unicodes, maybe they don't: a search may have to go several hops in.
The problem is not the hops; it's interpreting data, and there can always be exceptions. If a compiler has to interpret the direction of a ligature's members, it should also expose that to the user. If it were a one-to-one relationship it would be reliable, but it's not, so the user should be able to override it.
Do you have an example for a glyph that:
Can the direction that you want for the glyph be different from the set of directions provided by the members?
Seems we're going in circles.
Can you give specific examples of cases where you:
Should we impose naming schemes on users to be able to get the direction of glyph?
No, but glyph naming conventions are extremely useful, easy to use, and popular. It's a mini language that is equally well understood by people and machines.
I'm going to give you an example where parsing names could go wrong. There is a Unicode-less ligature named sad_yeh-farsi.fina, and here is the glyph set:
sad-ar, sad-ar.fina, sad-ar.medi, sad-ar.init, yeh-ar, yeh-ar.fina, yeh-ar.medi, yeh-ar.init, yeh-farsi, yeh-farsi.fina, yeh-farsi.medi, yeh-farsi.init
User expectations of ligature components:
sad-ar.medi + yeh-farsi.fina
Compiler result:
Nothing!
Parsing names to find ligature components can go wrong. I could impose a naming scheme on the user to make the ligature work, but those are my expectations. Nowhere does the OT spec say a ligature should have a certain name, so why should I make a tool that imposes one? I won't go on with examples. Can we please base our arguments on abstract data and the OT spec? Can we please establish that a user might not structure the glyph data the way we expect, and that there is an RTL flag in the OT spec for a reason?
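Here is a sketch of how the example above fails, assuming a naive parser of the kind a compiler might use: components are separated by "_", and the dot suffix is attached only to the last component. The parser and the glyph set are illustrative, not any specific tool's logic.

```python
def naive_components(ligature_name):
    """Naively split a ligature name into component glyph names.

    Assumed convention (one of many): "_" joins components, and a
    trailing ".suffix" belongs only to the last component.
    """
    base, dot, suffix = ligature_name.partition(".")
    parts = base.split("_")
    if dot:
        parts[-1] = f"{parts[-1]}.{suffix}"
    return parts


GLYPHS = {
    f"{base}{ext}"
    for base in ("sad-ar", "yeh-ar", "yeh-farsi")
    for ext in ("", ".fina", ".medi", ".init")
}

if __name__ == "__main__":
    parts = naive_components("sad_yeh-farsi.fina")
    print(parts)  # ['sad', 'yeh-farsi.fina']
    # "sad" has no script tag and no positional suffix, so it isn't found
    print([p for p in parts if p not in GLYPHS])  # ['sad']
```

The parser cannot recover "sad-ar.medi": it has no way to know that the first component should inherit "-ar" or take the medial form. An explicit list of components or an explicit direction avoids that guesswork.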
I already had doubts about posting this proposal here, as it could be seen as not backward compatible, and more of a change than an improvement to the spec.
There is already a Kerning object in the Font where data is stored that will be converted to OpenType features. But shouldn't OpenType features be attached to the Glyph rather than the Font? Storing all the kerning in one object is inefficient and prone to other problems if fonts are merged and the character set is changed. It's not only kerning; there are other OT features that are not easy to exchange between fonts. Can you filter glyphs by whether their type is Base or Mark? Groups should also be a glyph property: each glyph should indicate which groups it belongs to, rather than each group listing its members. This would also remove implementation problems where, if a glyph is deleted, the implementation has to decide whether a group should also update its members. Here is my proposal, which suggests that Kerning and many of the other OpenType features should be Glyph properties.
Some properties could be empty or non-existent, like Carets. The reason GSUB and GPOS rules are separated is that it’s less likely they will be shared in one feature. Another advantage is that we don’t need to come up with glyph naming schemes (e.g. glyph1_glyph2.case) to indicate features, which are very limited (considering contextual rules and lookups shared between features). It's easier to change the character set without having to change a huge feature file. At the end, during font generation, the compiler gathers feature data glyph by glyph (this could be cached for unchanged glyphs) and converts it to binary data. Of course, the details could be discussed further if you think this is worth discussing, or if you would accept pull requests regarding this.
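Here is a minimal sketch of the gathering step the proposal describes, under assumed storage. Each glyph lib holds its own kerning under a hypothetical key com.example.kerning, a {second glyph: value} dict stored on the first glyph of each pair. It holds its group memberships under com.example.groups, a list of group names. The pass rebuilds font-level kerning and groups and drops pairs whose second glyph is gone. Pairs whose first glyph is gone never appear, because they were stored in that glyph. Key names and layout are illustrative only.

```python
KERNING_KEY = "com.example.kerning"  # hypothetical: {second glyph: value}
GROUPS_KEY = "com.example.groups"    # hypothetical: [group name, ...]


def gather(glyph_libs):
    """Rebuild font-level kerning and groups from per-glyph data.

    glyph_libs maps glyph name -> lib dict for glyphs currently in the font.
    """
    kerning = {}
    groups = {}
    for name in sorted(glyph_libs):
        lib = glyph_libs[name]
        for second, value in lib.get(KERNING_KEY, {}).items():
            if second in glyph_libs:  # second glyph removed: skip the pair
                kerning[(name, second)] = value
        for group in lib.get(GROUPS_KEY, []):
            groups.setdefault(group, []).append(name)
    return kerning, groups


if __name__ == "__main__":
    libs = {
        "T": {KERNING_KEY: {"o": -80, "e": -70}},  # "e" was removed
        "o": {GROUPS_KEY: ["round"]},
        "c": {GROUPS_KEY: ["round"]},
    }
    print(gather(libs))
```

The membership direction is inverted compared with groups.plist: deleting a glyph also deletes its memberships, at the cost of a full pass over the glyphs whenever the font-level view is needed.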