unified-font-object / ufo-spec

The official Unified Font Object specification source files.
http://unifiedfontobject.org

[proposal] XML representation of feature format #106

Open simoncozens opened 4 years ago

simoncozens commented 4 years ago

Related:

I had a thought this morning about how features are represented. Currently we have a flat text file in AFDKO format, which has the advantage of being familiar and well supported with tools, but it has the disadvantage of not being particularly easy for editors to generate, manipulate, parse and reason about. If you want to programmatically copy a lookup from one feature to another, or between languages, or copy a feature between fonts, it's a pain in the head.

What I am suggesting is a new XML format which can be translated to and from AFDKO, and also to and from GSUB/GPOS representations of ttx. The format would remain "designer-centric", in terms of representing the rules at a high level, rather than replicating OTL data structures (i.e. starting with features and lookups, not GPOS/GSUB->script->language->feature). As with AFDKO, rule types would be implicit based on structure, rather than explicit. In other words it would be a half-way house between the textual AFDKO representation and the file-format-specific ttx.

Here is an example of how it might look:

<feature tag="liga">
  <!-- sub f i by f_i -->
  <sub>
    <match><glyph>f</glyph></match>
    <match><glyph>i</glyph></match>
    <replace><glyph>f_i</glyph></replace>
  </sub>
</feature>

<feature tag="locl">
  <script name="cyrl">
    <lang name="SRB">
      <sub>
        <match><glyph>be-cy</glyph></match>
        <replace><glyph>be-cy.SRB</glyph></replace>
      </sub>
    </lang>
  </script>
</feature>

<glyphclass name="ShaddaTashkil">
  <glyph>uni064E</glyph> <glyph>uni064B</glyph> <glyph>uni064C</glyph> <glyph>uni064F</glyph>
</glyphclass>

<lookup name="SmallTashkil">
  <!-- sub @ShaddaTashkil by @ShaddaTashkil.small; -->
  <sub>
    <match><glyphclass>ShaddaTashkil</glyphclass></match>
    <replace><glyphclass>ShaddaTashkil.small</glyphclass></replace>
  </sub>
</lookup>

<feature tag="calt">
  <!-- sub [uni0651 uni06EC] @ShaddaTashkil' lookup SmallTashkil; -->
  <sub>
    <match type="prefix"><glyph>uni0651</glyph><glyph>uni06EC</glyph></match>
    <match type="input"><glyphclass>ShaddaTashkil</glyphclass></match>
    <lookup name="SmallTashkil"/>
  </sub>
  <!-- sub @ShaddaTashkil' lookup SmallTashkil @ShaddaTashkil' lookup SmallTashkil -->
  <sub>
    <match type="input"><glyphclass>ShaddaTashkil</glyphclass></match>
    <lookup name="SmallTashkil"/>
    <match type="input"><glyphclass>ShaddaTashkil</glyphclass></match>
    <lookup name="SmallTashkil"/>
  </sub>
</feature>

If there's interest, I'm happy to write parsing code to translate to and from AFDKO.
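A minimal sketch of what the XML-to-AFDKO direction might look like for the simplest rules above. It assumes the element names from the example (`feature`, `sub`, `match`, `replace`, `glyph`, `glyphclass`), treats several glyphs inside one `<match>` as an inline class (as in the `calt` example), and ignores contextual rules, scripts and languages entirely. The function names are made up for illustration and are not part of any existing library.

```python
import xml.etree.ElementTree as ET

LIGA = """
<feature tag="liga">
  <!-- sub f i by f_i -->
  <sub>
    <match><glyph>f</glyph></match>
    <match><glyph>i</glyph></match>
    <replace><glyph>f_i</glyph></replace>
  </sub>
</feature>
"""


def operand_to_fea(element):
    """Render the children of one <match> or <replace> as a single FEA operand."""
    parts = []
    for child in element:
        text = (child.text or "").strip()
        if child.tag == "glyph":
            parts.append(text)
        elif child.tag == "glyphclass":
            parts.append("@" + text)
    # Several alternatives in one slot become an inline class: [a b c]
    return parts[0] if len(parts) == 1 else "[" + " ".join(parts) + "]"


def sub_to_fea(sub):
    """Render a non-contextual <sub> element as one 'sub ... by ...;' statement."""
    matches = [operand_to_fea(m) for m in sub.findall("match")]
    replaces = [operand_to_fea(r) for r in sub.findall("replace")]
    return "sub {} by {};".format(" ".join(matches), " ".join(replaces))


def feature_to_fea(xml_text):
    """Render a <feature> holding only simple <sub> rules as an AFDKO feature block."""
    feature = ET.fromstring(xml_text)  # XML comments are dropped by the default parser
    tag = feature.get("tag")
    lines = ["feature {} {{".format(tag)]
    for sub in feature.findall("sub"):
        lines.append("    " + sub_to_fea(sub))
    lines.append("}} {};".format(tag))
    return "\n".join(lines)


if __name__ == "__main__":
    print(feature_to_fea(LIGA))
```

Going the other way (AFDKO to XML) would need a real feature-file parser, so it is deliberately left out of this sketch.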

justvanrossum commented 4 years ago

I think it's interesting to try this approach, but I would suggest to focus on the data structures and the implementation, not the format (XML), and make a working model that is either independent from UFO, or that can be stored as UFO lib items. Once it has been proven that this is a good approach forward, we can start considering developing a version of UFO that supports it.

simoncozens commented 4 years ago

Good point. I've started working on some code which translates between a TTFont (i.e. the on-disk representation), fontTools' feaLib AST (and from there to feature file format), and this proposed "designer friendly but machine readable" data structure. One nice side-effect of this is that you get an otf-to-fea converter for free. :-) Here is the data structure I'm working with at the moment:

adrientetar commented 4 years ago

Nice, I started looking into something like that for my font format before (OTL as data structure, I mean). Certainly I agree that existing formats to represent OTL aren't satisfactory.

Just one question, why do you need multiple match tags, for the f i ligature for example, can't you do:

<match>
  <glyph>f</glyph>
  <glyph>i</glyph>
</match>

mhosken commented 4 years ago

I agree that it may be easier to work with a bottom element of <glyphs>uni064E uni064B uni064C uni064F</glyphs> which is no harder to parse than a list of glyph elements and a whole lot easier to read. And of course could take a single glyph. Whether a glyphs element represents a sequence or a set is up to the context in which it is placed.
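To illustrate the "no harder to parse" point, here is a small sketch that reads either spelling into the same list of names. The `<glyphs>` element is the suggestion above; the helper name `glyph_names` is hypothetical, and whether the resulting list means a sequence or a set is left to the caller, as described.

```python
import xml.etree.ElementTree as ET


def glyph_names(element):
    """Collect glyph names from <glyph>name</glyph> and <glyphs>a b c</glyphs> children."""
    names = []
    for child in element:
        text = child.text or ""
        if child.tag == "glyph":
            names.append(text.strip())
        elif child.tag == "glyphs":
            # Whitespace-separated list; a single name works too
            names.extend(text.split())
    return names


verbose = ET.fromstring(
    "<glyphclass name='ShaddaTashkil'>"
    "<glyph>uni064E</glyph> <glyph>uni064B</glyph>"
    "<glyph>uni064C</glyph> <glyph>uni064F</glyph>"
    "</glyphclass>"
)
compact = ET.fromstring(
    "<glyphclass name='ShaddaTashkil'>"
    "<glyphs>uni064E uni064B uni064C uni064F</glyphs>"
    "</glyphclass>"
)
```

Both `glyph_names(verbose)` and `glyph_names(compact)` yield the same four names, so the choice is mostly about readability of the file.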

simoncozens commented 4 years ago

I haven't added much to this recently, but I have been working on it, and the fontFeatures library is now able to ingest different OpenType Layout formats into a simple internal representation. I'll add an XML input/output backend to that representation, and then we will have more to talk about.

alerque commented 4 years ago

@simoncozens Just in case you missed the link go by on twitter, there is a UFO spec meeting coming up.

simoncozens commented 4 years ago

Yep, I'm there, which is why this issue is suddenly live again. :-)

madig commented 4 years ago

(I'd like to add to the XML vs other formats thing: consider that XML requires custom de/serialization code someone has to write and maintain while you can e.g. ser/de JSON automatically with libraries like serde-json and https://pypi.org/project/cattrs/ -- those libraries make bring-up and maintenance so much easier!)

khaledhosny commented 4 years ago

Parsing JSON is a Minefield 💣.

simoncozens commented 4 years ago

I thought XML was part of the deal with UFO...

madig commented 4 years ago

JSON is just an example of a trivially ser/de-able format. It could be TOML or YAML or something like it, too. The same goes for Plist, which is why I think Just's proposal would work just as well 😉

justvanrossum commented 4 years ago

FWIW, when UFO was first developed, XML seemed "the thing to use". If json or yaml had been around, I bet we would have used it instead of the dreaded plist.

The dicts/lists/strings/numbers model is fantastic in its simplicity, and I wished newer formats like designspace had used it instead of creating yet another custom xml-based format.

alerque commented 4 years ago

JSON is a minefield. YAML is actually a superset of JSON and is an even bigger minefield. Of all the mentioned formats, TOML is almost certainly hiding the fewest buried explosives. As @justvanrossum mentioned, XML was conceived of and for many years billed as a panacea. In practice it's more of a Pandora's box.

None of these are native anywhere, and all of them have libraries for dealing with de/serialization. Which ones are fastest / most robust varies a bit between languages. I disagree with @madig that JSON is somehow magic/automatic while XML requires custom code. I think that's a false dichotomy: all these formats will typically use some library for de/serializing the on-disk format into native language structures.

All that being said, I personally love YAML and think it would be well suited for the purpose, but only if the entire spec called for it across the board. It even has a number of features such as includes and references that could be very useful to design a format around. But mixing and matching is the worst option. As long as the principal elements that make up a UFO file are XML encoded, all of them should be. If we want to talk about an all new format I'd propose skirting the minefield by defining a subset of allowed YAML features. But that's a bigger kettle of fish than this issue can support. For now, as long as the principal format is XML based, extensions to it should keep following suit.

adrientetar commented 4 years ago

I disagree with @madig that JSON is somehow magic/automatic while XML requires custom code.

I think what he means to say is that the mapping between UFO XML and the corresponding data structures isn't straightforward: there needs to be some parsing and rearranging of elements. Compare that with a serialized dump of the data structures themselves, which can be trivially de/serialized. A potential drawback of that approach is that a change in the data structures necessarily implies a change in the file format.
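As a rough illustration of the "serialized dump of the data structures" idea, using only the Python standard library rather than serde or cattrs: the classes `Substitution` and `Feature` below are invented for this sketch and do not reflect fontFeatures or any UFO structure.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Substitution:
    # Each slot is a list of alternative glyph names
    match: List[List[str]]
    replace: List[List[str]]


@dataclass
class Feature:
    tag: str
    rules: List[Substitution]


def dump(feature):
    """Serialize by dumping the dataclass fields directly; no hand-written mapping."""
    return json.dumps(asdict(feature), sort_keys=True)


def load(text):
    """Rebuild the dataclasses; field names in the file must equal field names in code."""
    raw = json.loads(text)
    return Feature(tag=raw["tag"], rules=[Substitution(**rule) for rule in raw["rules"]])


liga = Feature(tag="liga", rules=[Substitution(match=[["f"], ["i"]], replace=[["f_i"]])])
```

The drawback mentioned above shows up directly here: renaming the `match` field in the class silently renames the key in every file written afterwards, and older files stop loading.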

simoncozens commented 4 years ago

Returning to the topic, after some experience with fontFeatures I think I know what a platform-independent OTL representation should look like; and serialising/deserialising is not really a relevant concern as you will always need to convert between the high-level representation I want to define and whatever structures your font editor uses to represent OTL internally. So I’m not sure this requires so much bikeshedding, and if UFO uses XML for everything else, let’s just use XML and not needlessly multiply parsers.

typoman commented 4 years ago

Making features using fea file syntax (which already has a parser and AST objects) is easier than using a new set of objects. I think any format that is going to store OpenType features related to the GPOS, GSUB, and GDEF tables should address some major issues:

Readability: AFDKO feature file syntax is hard to read and diagnose when it comes to contextual rules. But it's already easier to read the AFDKO syntax file for most cases than to read an XML file. Even if someone writes a parser to convert the XML data to a feature file, the source XML is what the user would need to diagnose, and not everyone can manipulate these data using scripting. Maybe another syntax is needed that would make the data model simpler compared to fontTools.feaLib.ast objects instead of another XML data format (I've gone through FontFeatures but I don't know if it has solved this for real). A limited or even a complex format could set designers back from being able to see the possibilities.

Transferring: We are entering an era in which fonts are becoming larger and designers work separately on character sets as part of a font family. Features can be written separately and then merged into one file or compiled separately, as the final user might need different character sets. These fonts need to be merged or subsetted, and maybe diagnosed, before shipping. How would this new data model facilitate this?

Tools: How could a developer create a tool that enables them to add rules to glyphs/fonts without concerning themselves with technicalities? How can I check whether a kerning pair or mark anchor with a certain context exists? How can I separate the kerning into different script sets or lookup flags without having to learn a complex library?

Feedback: There is a time gap between when a user makes an adjustment and when s/he will see a result, depending on the complexity of the font. Compile time matters when it comes to creating UI; OpenType features can be sluggish to compile. I'm not sure this is what the format should be concerned with, but it is still something to consider while trying to come up with a data model and compilers.

Sorry if this sounds demanding and I'm not asking anyone to solve these! I'm just sharing what concerns me with making open type features and I believe we need real examples that solve some of these issues before making a huge library.

khaledhosny commented 4 years ago

There are some issues with fea data in context of UFO:

A good exchange format IMHO should

I like feature files, I write them all day long, but it is an authoring syntax, not an exchange format. An exchange format does not need to be writable by hand, just as no one designs glyphs by handwriting .glif files.

I can’t currently do any complex OpenType fonts using UFO without heavy project specific customization and heavily tying it to a single tool, otherwise it can easily become a complete mess.

benkiel commented 4 years ago

I wanted to fill in the context of why the Adobe FEA syntax was chosen 16 years ago.

At the time there wasn't a fully featured, documented feature syntax that had adoption and familiarity among type designers other than the Adobe FEA syntax. I believe @typesupply worked for a bit on something but couldn't come up with something that was as easy to write/use as FEA, so that was what was chosen for the UFO with the understanding that it wasn't perfect.

@khaledhosny enumerates well where it falls down: all 100% valid complaints and some good ground rules for an exchange format (I do like to edit .glif files by hand, but yes, I don't draw outlines by writing XML). In the end, the features.fea file in the UFO is there because it is useful to type designers, especially as they work. Being able to write features to then quickly test is really handy. For production, my feeling is that most ignore it; when I do production, all my feature code is outside the UFO.

I would say that there is a conflict there: what type designers need in their workflow (a place to write features in a syntax most know well and are unafraid to write) and an interchange/production workflow. As this conversation continues, it's likely good to keep that in mind.

simoncozens commented 4 years ago

I would say that there is a conflict there: what type designers need in their workflow (a place to write features in a syntax most know well and are unafraid to write) and an interchange/production workflow.

I still think this is missing the point. The designer's view of their workflow is through their font editor, not through what is stored in the font file. I use Glyphs and I don't really care how features are represented in the .glyphs file; I don't tend to poke around inside it, because I don't need to. For a designer, that's the wrong layer of abstraction.

Maybe the font editor should expose features in FEA syntax, maybe it shouldn't. What we're deciding is what gets stored on the disk; how the designer sees and edits that is a client issue.

typesupply commented 4 years ago

@khaledhosny:

Interaction between fea data and UFO is completely undefined. What happens when a UFO glyph gets renamed or removed? What happens when both UFO and fea define kerning, anchors, or any other OpenType data?

These are great questions and the spec very deliberately doesn't answer them. I realized that defining "if a glyph is removed from the glyph set, features.fea should..." would require parsing and representing .fea. As you stated, this is non-trivial so requiring any feature modification would set up an implementation barrier.

@simoncozens:

The designer's view of their workflow is through their font editor, not through what is stored in the font file.

Yes, but no, but yes, but no, but yes. 😄 I have very complex feelings on features.fea. I wrote the UFO Design Philosophy, so I'm pretty strict about sticking to it. features.fea breaks two of the three rules in ways that drive me bonkers. On the other hand, trying to force a replacement of a ubiquitous format was not pragmatic. I think I probably told Erik and Just "I'm having a real fight between theory and practice on this one. UGH." a million times back then.

Identifiers and some other things in UFO 3 showed me that trying to introduce new/changed behavior into editors through format changes is not always welcome. I followed the "If you build something better, they'll implement it." model and was very disappointed at the time. So, a new format would have to cross that hurdle. I'd like to see explicit rules for going to/from any other major formats (.fea, VOLT, ?). Ideally there would be usable code to do this. Losslessness is going to be a key detail in all of this for designers.

One thing that popped out of my memory late last night: back when I was trying to replace .fea circa 2004-9, I noticed that Adobe had patents on the conversion of high-level abstractions to GSUB/GPOS/GDEF. That made me nervous. I don't know if those are still applicable and I could also be very wrong about the existence of these since it was a loooong time ago. Perhaps a discussion with the friendly Adobe folks would clear that up.

After saying all of that, I want to emphasize that I'm very optimistic about what you are working on. I'd love to see this replace not just features.fea, but also kerning.plist and anything else that defines how glyphs interact with each other. Down with data duplication!


Taking off my spec editor hat and putting on my feature developer hat, I'd like to note some limitations of .fea that I'd love to see addressed:

justvanrossum commented 4 years ago

I think it's unrealistic to aim for a single format that 1. should be suitable for all, and 2. covers all of OTL. Small practical solutions in a subset of the problem space may allow us more progress than waiting for someone to come up with the ultimate grand design.

Like how kerning ("a table") is stored differently and separately from anchors ("belongs to glyph data"), yet both are used to ultimately compile to GPOS data. In what form data is stored is often informed by how authoring applications interact with it: "kerning as a table" is easier to interact with than a blob of .fea code hidden between other feature definitions.

There are different levels of abstractions that each have their place in people's workflows. There is no one size fits all. For many people, the single kerning file in UFO covers all their needs, for others it doesn't, as John Hudson explained so well during the meeting. For some people, .fea is a fine tool, for others it is horrible. Sometimes the best solution is to be as close to the metal as possible and use TTX. Sometimes a glyph naming convention is all you need to define some GSUB features.

Perhaps the problem is that .fea is presented as the way to store OTL data within UFO. It should be merely a way.

Perhaps our stance should be more extreme: I'm not convinced the UFO spec should contain a full definition for OTL-like features at all. UFO should facilitate a variety of workflows and data.

The idea of "mini specifications" could perhaps close this gap. For example, fontmake currently supports the use of MTI files to define features. Fonttools contains a compiler for it. Yet there is no official way for a UFO to say "use this MTI file for features". Likewise, it is undocumented how some tools use anchors to produce GPOS mark features. (With this philosophy, we could decide to demote .fea to a "mini specification": the official *.ufo/features.fea could be replaced with a file in *.ufo/data.)

In short: I would like to encourage people like Simon to focus on the things that .fea is bad at, and not worry about having to design something that can completely replace .fea.

simoncozens commented 4 years ago

The more I think about mini-specifications the less enthusiastic I am. It’s a deliberate fragmenting of the file format. With mini-specifications providing several options of representation, a font editor needing to read an arbitrary UFO needs to support all of the different flavours.

justvanrossum commented 4 years ago

The more people depend on UFO, the less likely we can come to a consensus as to how to move a monolithic format forward. I think the deliberate fragmentation is essential for progress.

A font editor only needs to serve its audience. It is not required to support everything.

To not have mini specifications encourages people to define private data and not document it (this has already happened and is a problem), and therefore to reinvent the wheel. With mini specifications people can build on each other's experience, and community usage will prove which parts are successful and which will die out. There will not be infinite ways to define OTL.

alerque commented 4 years ago

This is kind of the same debate as with Designspace in relation to where it fits into the puzzle. We talked about it in the meeting quite a bit (and see #86) but there doesn't seem to be a clear direction here yet. Indeed it seems to me some aspects of the theorized UFS format are already being put into UFO and the major points of disagreement about what to put in the spec would mostly better fit in the scope of UFS if it existed. How do all the extra attributes (features, kerning, axis interpolation, etc.,) that go into building a font family relate to the base collection of glyphs' outlines? Is UFO attempting to represent all the data that would be in a VCS repository needed to build a font? Or is it just a subset of that data –for example the shapes– meant to be matched with other bits and assembled into a greater whole?

Right now there seem to be aspects of both these approaches that have already made their way into both the spec & usage. I can see merit to multiple sides to the debate on what to do with features, and how you evaluate that debate (i.e. whether to define a single format for including feature data in UFO, to allow several possible formats via mini-specs, to keep feature data outside the UFO entirely and just include some kind of build description saying where the feature data should come from) seems to depend on where UFO is visualized in the bigger picture.

justvanrossum commented 4 years ago

The Grand De-unification of the Unified Font Object... This deserves a broader discussion and shouldn't be hidden in this issue about feature representation. I'd like to see a collection of formats that are designed to work together but are not tightly coupled to each other. "UFO" could be an umbrella for such a collection.

(The .glif subformat shows that this isn't exactly a new idea within UFO.)

simoncozens commented 4 years ago

This deserves a broader discussion and shouldn't be hidden in this issue about feature representation.

Well, true. But it does influence whether feature representation is one thing or many!

One of the major pain points of OpenType is that it is really two font formats pretending to be one font format. I think there is much to be learnt from that.

justvanrossum commented 4 years ago

Well, true. But it does influence whether feature representation is one thing or many!

For many users, .fea isn't going to go away soon as a part of their workflow — just like MTI and VOLT aren't going to go away for others — so whatever you will come up with will be an addition, no matter how officially it becomes part of the spec. If it turns out you've found the ultimate 100% solution, that would be great, and I'm sure it will replace all existing legacy things in an organic way. What UFO is or isn't shouldn't affect your work at all.

One of the major pain points of OpenType is that it is really two font formats pretending to be one font format. I think there is much to be learnt from that.

It's way more than two: SVG, various pixel formats, OTL being separate from outline formats, etc. etc. Sure, together that's an ugly mess, but without the individual parts and how they integrated with existing systems at the time there would be nothing. Things need to be able to grow organically, because you can't predict the future. In hindsight it always seems obvious how things should have been done instead, but hindsight is hindsight.

Let UFO be an environment where people can express ideas, and not be a monolithic specification of how things must be done.

simoncozens commented 4 years ago

Actually, I just had a shower and inspiration struck. :-)

The fact that we're being pulled in two different directions here suggests that there are two problems we need to solve:

  1. The need for an unambiguous representation of features that all editors can read, write and manipulate.
  2. The need to store a preferred representation that an editor "wants" to display to a user - typically fea, but maybe also VOLT, MTI, etc.

Mini-specifications try to solve both of these problems with the same solution, but lead to fragmentation: now all editors need to implement all representations in order to read an arbitrary UFO. But if we solve them separately, we can avoid the fragmentation and avoid pushing onerous requirements on editor implementors. Instead:

  1. The features.plist (or whatever) file stores a neutral, XML-based representation of features. No comments, no friendly lookup names, just the feature data. I'll explain how this is specified later. This is the last resort for font editors presenting the feature data to users for editing. I would also say that it is the first resort for compiling the feature to binary, since all the source parsing and feature building is already done.
  2. The editor may also stash the "source" feature format of their choice. That maintains order, comments, spacing, etc.
  3. If a font editor supports the feature format in a file - let's say FontLab loading a UFO containing Adobe FEA source - then it loads and stores that as well as the XML representation. (Yes, this is duplication, which of course is bad, but I would argue is the lesser evil here.)
  4. If the font editor does not support the feature format - FontLab loads a UFO containing MTI source - then it decompiles the XML representation back into whatever source format it wants to handle. This way it only needs to support two formats (the neutral format and its preferred format), and not whatever other mini-specifications are out there.
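
The loading logic in steps 3 and 4 could be sketched roughly as follows. Everything here is hypothetical: the function name, the idea of passing stashed sources as a dict keyed by format name, and the `decompile` callback that an editor would supply to turn the neutral representation into its preferred syntax.

```python
def choose_feature_source(neutral, stashed_sources, supported_formats, decompile):
    """Pick the text an editor should show to the user.

    neutral           -- the neutral feature representation (opaque to this function)
    stashed_sources   -- dict mapping a format name ("fea", "mti", ...) to source text
    supported_formats -- format names this editor can handle, most preferred first
    decompile         -- callable(neutral, format_name) -> source text
    """
    # Step 3: a stashed source in a format the editor understands wins,
    # because it keeps comments, ordering and spacing.
    for fmt in supported_formats:
        if fmt in stashed_sources:
            return fmt, stashed_sources[fmt]
    # Step 4: otherwise fall back to decompiling the neutral representation
    # into the editor's preferred format.
    preferred = supported_formats[0]
    return preferred, decompile(neutral, preferred)


def fake_decompile(neutral, fmt):
    # Stand-in for a real decompiler, only here to make the sketch runnable
    return "# {} decompiled from {} neutral rules".format(fmt, len(neutral))
```

With this shape an editor only ever needs a reader for the neutral format plus its own preferred syntax, which is the point of step 4.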

justvanrossum commented 4 years ago

I fundamentally disagree that "all editors" should support the superset of anything and everything, really. That has never been the case for UFO, and is so by design.

Your idea sounds clever, and perhaps it's a solution, but your format, whichever way it will turn out, will have to prove itself independently of UFO, and can be developed independently of UFO. Just like .fea has no fundamental ties to UFO.

justvanrossum commented 4 years ago

I will soften my previous comment somewhat, because there are obvious ufo connections that are necessary, such as: how will a new feature representation integrate with kerning tables and glyph anchors?

madig commented 4 years ago

The more I think about mini-specifications the less enthusiastic I am. It’s a deliberate fragmenting of the file format. With mini-specifications providing several options of representation, a font editor needing to read an arbitrary UFO needs to support all of the different flavours.

Keep in mind that that is the state of things today. The mini specs would just formalize it. The UFO format is so bare bones that people come up with their own solutions, which are by definition non-interoperable unless you teach every single other tool about them.

benkiel commented 4 years ago

To build on what @madig said: nothing is lost if a tool doesn't support a mini-spec; the data is preserved. Mini-specs are there to formalize the current informal situation, and to build consensus on things that work well and should be moved up into the full UFO spec. Much better to trial things and see how they work than to make a guess and then have to live with the decision if it turns out to either not work well or not be popular/useful.

madig commented 4 years ago

(To build² on that: everyone using custom data and no other tool knowing about it will also by definition lead to data de-synchronization and potentially loss. The same problem happens if you have UFO 3.1 with some more official lib keys and stuff and some apps not knowing what to do with them. We see this all the time with glyphsLib data that only means something to Glyphs and ends up in the wrong places when someone works on the font in FL5. I don't think anyone can win this one, unless consensus is eventually built and the custom data christened as public.)

benkiel commented 4 years ago

@madig by design, the spec says that things a tool doesn't know about should be left and not touched. Desynchronization is definitely possible, of course, as something that doesn't know about a private key can't update it if things change. So the issue you're having is the tool not doing what it should be doing, as far as I understand.
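
A small sketch of the "leave it and don't touch it" behaviour for a lib-style plist, using Python's standard `plistlib`. The key `com.example.masterID` is a made-up private key for illustration; `public.glyphOrder` is used as the key the tool does understand.

```python
import plistlib


def update_lib(lib_bytes, changes):
    """Apply changes to the keys a tool understands and write everything else back as-is."""
    lib = plistlib.loads(lib_bytes)
    lib.update(changes)  # unknown keys are never inspected or removed
    return plistlib.dumps(lib)


original = plistlib.dumps({
    "public.glyphOrder": ["a", "b"],
    "com.example.masterID": "ABC-123",  # hypothetical private key owned by another tool
})
updated = plistlib.loads(update_lib(original, {"public.glyphOrder": ["a", "b", "f_i"]}))
```

The private key survives the round trip, but nothing here can tell whether its value is still meaningful after the edit, which is the desynchronization risk described above.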

madig commented 4 years ago

The issue is more that every piece of added data that a tool must know about leads to an exponential amount of work on all tools to make them do the right thing. Take e.g. Glyphs master IDs. UFO doesn't know anything about master IDs, so glyphsLib puts them in a UFO's lib. People then copy-paste the UFO in the file explorer with the lib inside to start a new master in another program, leaving the ID the same. Going back to Glyphs will then overwrite one master with the other and generate a support request, which I then have to deal with :) Arguably, glyphsLib needs a good whacking, but these are the kind of subtle issues you have to deal with when using custom data. Having a format that does not need the custom data as a workaround to round-trip perfectly would alleviate the problem, because there is only one official way to do things. If any of you have been wondering, that was indeed the motivation behind a lot of change requests to the UFO format we filed in the name of Dalton Maag.

typesupply commented 4 years ago

Could we move this discussion to the appropriate issue, please? It'll be easier to track that way. 😄

justvanrossum commented 4 years ago

Custom data is not a workaround.

typoman commented 4 years ago

I also believe mini specifications are a great way to move forward. Not every solution is going to be perfect for everyone. I'm making my own feature format and I would like to know how I can make a specification for it that would make it possible to add it to UFO one day as a mini specification. Maybe a new issue on how a mini-specification should be written would be a great start.

benkiel commented 4 years ago

@typesupply is right, this is derailing feature format talk. Mini-spec is here: https://github.com/unified-font-object/ufo-spec/issues/118

mhosken commented 4 years ago

My understanding of the need is that we want a file format for behaviour description that lends itself to the following use cases:

My experience with FEA suggests that it would be possible to extend FEA to meet these needs.

I've done quite a bit of work with FEA for our more complex fonts. We at SIL, like many others, keep our behavioural description out of the UFO for a couple of reasons:

Details of how we have extended FEA can be found here. A simple example is here which simply references a few magic mark and base attachment classes. At the other end of complexity, there is an example here involving complex macros basically to get rid of constants due to sharing a single description across multiple fonts in a family. Of course we could have done it by having font specific include files.

I do not consider what we have done in extending FEA to be sufficient to meet the needs initially listed, but I do think it is a step in the right direction. There are a few additions that a font editor would probably want of FEA:

I can imagine something along the lines of:

if (capability("somecapability")) {
    somefunc(parameters);
} else {
    expanded compilable fea here
};

Which an editor can parse and identify which blocks of the fea go where and come from where. It also enables different tools to handle the fea according to their capabilities. This would require some standardisation of the capabilities and extension functions.

I realise this isn't mini specifications, or perhaps it is via a different route. But I suggest it might just work, with a whole lot of effort.

simoncozens commented 4 years ago

Hi Martin; this is a good idea, but I'm not sure it's in scope for what we're talking about in this issue. We will be having a discussion of alternative feature format syntax soon off the back of the UFO spec meeting - please subscribe to https://github.com/adobe-type-tools/afdko/issues/1202 and I will send out information about the meeting.

What I'm proposing here is a feature representation that is primarily machine manipulated, with the editor mediating that to the user.