TrueType instructions in UFO

moyogo commented 5 years ago

Could we add lib keys to hold fontTools TTX instructions?

This would be like #42 but for TrueType instructions. It would be useful to have a standard way of storing bytecode TrueType instructions in UFOs, especially in the case of extracting UFOs from TTFs and being able to compile that back into TTFs. Or this could be useful when processing UFOs in a standard way with tools that can compile TrueType instructions.

It could use the following structure: In the glyph’s lib:

public.truetype.instructions as a dict:
- formatVersion string "1"
- id string Hash of glyph outlines (similar to public.postscript.hints id)
- instructionList string TTX assembly

In a font’s lib:

public.truetype.fontProgram string TTX assembly,
public.truetype.controlValuesProgram string TTX assembly,
public.truetype.controlValues as an array of integers.
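
To make the proposed layout concrete, here is a minimal Python sketch that builds both lib dictionaries and round-trips them through the standard-library plistlib. The key names are the ones proposed above; the hash, the assembly strings, and the control values are made-up placeholders, not data from a real font.

```python
import plistlib

# Glyph-level entry, as it would sit in the glyph's lib (placeholder values).
glyph_lib = {
    "public.truetype.instructions": {
        "formatVersion": "1",
        "id": "placeholder-outline-hash",  # a real id would be a hash of the outlines
        "instructionList": "PUSHB[ ]\n  0\nMDAP[1]\nIUP[0]\nIUP[1]",
    }
}

# Font-level entries, as they would sit in the font's lib.plist (placeholder values).
font_lib = {
    "public.truetype.fontProgram": "PUSHB[ ]\n  0\nFDEF[ ]\nENDF[ ]",
    "public.truetype.controlValuesProgram": "SVTCA[0]",
    "public.truetype.controlValues": [32, 500, 700],
}


def dump(lib):
    """Serialize a lib dict to an XML property list string."""
    return plistlib.dumps(lib).decode("utf-8")


# Reading the serialized data back gives the same structure.
round_tripped = plistlib.loads(plistlib.dumps(font_lib))
```

Strings, integers, arrays, and dicts all map directly to plist types, so no custom encoding is needed for these keys.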

moyogo commented 5 years ago

The following should also be allowed in the font’s lib:

public.truetype.maxStorage integer,
public.truetype.maxFunctionDefs integer,
public.truetype.maxInstructionDefs integer,
public.truetype.maxStackElements integer,
public.truetype.maxSizeOfInstructions integer.

typemytype commented 5 years ago

This is what RoboFont has supported for a long time:

https://gitlab.com/typemytype/robofont_com/issues/12

The main idea was not to abstract binary data, but to keep the data as close as possible to what fontTools requires. An authoring tool can generate and build this data while compiling a binary font. This is used by all RoboFont hinting tools.

benkiel commented 5 years ago

@moyogo could you write up a PR for this? I think having it spec'ed out clearly would help the discussion, and I agree that having a way to store TT hints in UFO is good. We've been waiting for VTT to be open sourced as a higher level abstraction, but perhaps too long now.

BoldMonday commented 5 years ago

I am wondering if a hash of the glyph order might be handy too? Certain TT instructions can refer to other glyphs in the font, and these references are defined by glyph index.

Therefore it is possible that certain TT instructions will not compile if the original glyph order has been changed.

moyogo commented 5 years ago

@BoldMonday The proposed id-hash uses the outline of components. Would that be enough?

moyogo commented 5 years ago

@benkiel I opened the merge request #94 with the lib.plist instructions-related keys in a single dict.

justvanrossum commented 5 years ago

Can glyph IDs be referenced in controlValueProgram or fontProgram?

moyogo commented 5 years ago

@BoldMonday What instruction do you have in mind?

benkiel commented 5 years ago

@moyogo Thank you for doing this! I have a feeling this will spend a bit of time in back and forth to get it well pinned down, but before it starts feeling like death by a thousand comments: thank you so much for putting this PR together.

@anthrotype @khaledhosny @behdad I'm sure you'll want to comment/review, as will @typesupply

jenskutilek commented 5 years ago

Is using a different format for TT assembly code out of scope for this discussion? I have recently been using htic to compile TT instructions. It uses a slightly different assembly text format, but conversion from FontTools-style assembly is trivial. The big advantage is that htic optimizes the code on compilation very effectively, on par with Visual TrueType's push optimization. An example:

o {
  SVTCA[X]
  MDAP[R] 40
  MDAP[R] 20
  SRP0 40
  MDRP[M>RGr] 0
  SRP0 20
  MDRP[M>RGr] 10
  SRP0 0
  MDRP[M>RGr] 30
  SRP0 10
  MDRP[M>RGr] 41
  SVTCA[Y]
  CALL 10 5 10
  CALL 10 15 6
  SRP0 5
  MIRP[M<RGr] 25 4
  SRP0 15
  MIRP[M<RGr] 35 4
  IUP[Y]
  IUP[X]
}

The original code looks like this:

<assembly>
  PUSHW[ ]  /* 1 value pushed */
  40
  MDAP[1]   /* MoveDirectAbsPt */
  PUSHW[ ]  /* 1 value pushed */
  20
  MDAP[1]   /* MoveDirectAbsPt */
  PUSHW[ ]  /* 1 value pushed */
  40
  SRP0[ ]   /* SetRefPoint0 */
  PUSHW[ ]  /* 1 value pushed */
  0
  MDRP[11100]   /* MoveDirectRelPt */
  PUSHW[ ]  /* 1 value pushed */
  20
  SRP0[ ]   /* SetRefPoint0 */
  PUSHW[ ]  /* 1 value pushed */
  10
  MDRP[11100]   /* MoveDirectRelPt */
  PUSHW[ ]  /* 1 value pushed */
  0
  SRP0[ ]   /* SetRefPoint0 */
  PUSHW[ ]  /* 1 value pushed */
  30
  MDRP[11100]   /* MoveDirectRelPt */
  PUSHW[ ]  /* 1 value pushed */
  10
  SRP0[ ]   /* SetRefPoint0 */
  PUSHW[ ]  /* 1 value pushed */
  41
  MDRP[11100]   /* MoveDirectRelPt */
  SVTCA[0]  /* SetFPVectorToAxis */
  PUSHW[ ]  /* 3 values pushed */
  5 10 10
  CALL[ ]   /* CallFunction */
  PUSHW[ ]  /* 3 values pushed */
  15 6 10
  CALL[ ]   /* CallFunction */
  PUSHW[ ]  /* 1 value pushed */
  5
  SRP0[ ]   /* SetRefPoint0 */
  PUSHW[ ]  /* 2 values pushed */
  25 4
  MIRP[10100]   /* MoveIndirectRelPt */
  PUSHW[ ]  /* 1 value pushed */
  15
  SRP0[ ]   /* SetRefPoint0 */
  PUSHW[ ]  /* 2 values pushed */
  35 4
  MIRP[10100]   /* MoveIndirectRelPt */
  IUP[0]    /* InterpolateUntPts */
  IUP[1]    /* InterpolateUntPts */
</assembly>

While htic's result is this:

<assembly>
  NPUSHB[ ] /* 22 values pushed */
  35 4 15 25 4 5 15 6 10 5 10 10 41 10 30 0 10 20 0 40 20 40
  SVTCA[1]  /* SetFPVectorToAxis */
  MDAP[1]   /* MoveDirectAbsPt */
  MDAP[1]   /* MoveDirectAbsPt */
  SRP0[ ]   /* SetRefPoint0 */
  MDRP[11100]   /* MoveDirectRelPt */
  SRP0[ ]   /* SetRefPoint0 */
  MDRP[11100]   /* MoveDirectRelPt */
  SRP0[ ]   /* SetRefPoint0 */
  MDRP[11100]   /* MoveDirectRelPt */
  SRP0[ ]   /* SetRefPoint0 */
  MDRP[11100]   /* MoveDirectRelPt */
  SVTCA[0]  /* SetFPVectorToAxis */
  CALL[ ]   /* CallFunction */
  CALL[ ]   /* CallFunction */
  SRP0[ ]   /* SetRefPoint0 */
  MIRP[10100]   /* MoveIndirectRelPt */
  SRP0[ ]   /* SetRefPoint0 */
  MIRP[10100]   /* MoveIndirectRelPt */
  IUP[0]    /* InterpolateUntPts */
  IUP[1]    /* InterpolateUntPts */
</assembly>

(For this specific example, you need to tell htic that function 10 leaves no value on the stack, or the push optimization will not optimize across function calls. In the worst case, the result is the same unoptimized code as before, but usually you get at least some optimization.)
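
The effect of the push optimization on this example can be reproduced with a few lines of Python. This is only a sketch of the idea, not htic's actual algorithm, and it assumes that every instruction consumes exactly the values pushed for it and leaves nothing on the stack (which is why function 10 has to be declared that way). The function name `merge_pushes` is made up for illustration.

```python
def merge_pushes(push_groups):
    """Combine per-instruction push groups into a single push.

    The stack is last-in, first-out, so the group consumed first must be
    pushed last: groups are emitted in reverse program order, while the
    values inside each group keep their original order.
    """
    merged = []
    for group in reversed(push_groups):
        merged.extend(group)
    return merged


# Push groups of the unoptimized listing above, in program order.
groups = [
    [40], [20], [40], [0], [20], [10], [0], [30], [10], [41],
    [5, 10, 10], [15, 6, 10], [5], [25, 4], [15], [35, 4],
]
merged = merge_pushes(groups)
```

For these groups, `merged` is the same 22-value sequence as in the NPUSHB of the htic output.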

jenskutilek commented 5 years ago

Here is a more complete htic code example.

I am currently wiring htic up to ufo2ft#234 so FontLab hinting code from UFOs generated by vfb2ufo will compile into TTFs and stay 100% rendering-compatible with fonts generated directly from FontLab 5. The part that translates FL5 code to htic code is not public (yet). I'd be happy to see wider support for this.

justvanrossum commented 5 years ago

I have recently been using htic to compile TT instructions.

I think this is a very interesting proposal. How stable is the htic language?

jenskutilek commented 5 years ago

You mean in terms of changes to the syntax? It seems quite stable. Since I became aware of it (~ 1 year?) there has only been one change, the removal of sub-blocks, which would probably not have affected disassembled code.

typesupply commented 5 years ago

My knowledge on this is extremely limited, so all I can offer are the general guidelines we used in the past for evaluating syntaxes:

Is the syntax well documented? Is the documentation ambiguous in any way?
Can a compiler be implemented easily without relying on a single code base or programming language? Does it rely on "special magic" in the code base that defines the syntax?
If the code base that defines the syntax fails for some unforeseen reason (abandonment, etc.), can the syntax continue to be viable?
Does the syntax support the full range of TT instructions? If not, how easy is it to extend in both practical and political terms?
Is the syntax owned by an entity? Will the license prohibit use of the syntax by anyone (ie can this be used in a commercial, closed source tool)? Will the license of the code prohibit use of the code base by anyone (ie can this be used in a commercial, closed source tool)?

I'm not saying that htic doesn't meet these requirements. In fact, I took a quick look at it and it seems extremely well organized and well documented, but, again, my knowledge on TT instructions is extremely limited. Just offering some evaluation advice...

BoldMonday commented 5 years ago

@moyogo @justvanrossum Recently I had an email exchange with Greg Hitchcock who mentioned some VTT instructions that deal with components. Those instructions use glyph IDs. Not sure if these specifics apply to VTT instructions only or to their native TT instructions as well. I will dig around more next week when I'm back in the studio.

moyogo commented 5 years ago

@BoldMonday VTT code has instructions that do not translate into TrueType instructions. OFFSET, for example, uses glyph indices but isn’t translated into TrueType instructions; instead it changes the composite glyph description.

jenskutilek commented 5 years ago

IIRC only the "pseudo instruction" OFFSET[r|R] in composite glyphs uses glyph IDs.

/* The base glyph ID 61 is at offset 0, 0, no need to set the rounding flag */
OFFSET[r] 61, 0, 0
/* Set the "use my metrics" flag on the base glyph component */
USEMYMETRICS[]
/* A diacritic is added from glyph 542 with offset 128, 256; set the rounding flag */
OFFSET[R] 542, 128, 256

USEMYMETRICS[] is another pseudo instruction that influences the component flags.

Those have no equivalent in native TT instruction code. You need to set the component flags by some other means. I think most of the time a heuristic can be used (The first component with the same width as the composite glyph will get "use my metrics" set, any shifted components will get the "round" flag set, possibly only taking y-shift into account for modern y-hint-only fonts). FontLab sets the "round" flag on all components.
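
A sketch of that heuristic in Python, to show how little is needed. The `Component` class, its field names, and the `widths` mapping are hypothetical stand-ins, not an existing ufo2ft or defcon API; the `y_only` switch models the "only taking y-shift into account" variant.

```python
from dataclasses import dataclass


@dataclass
class Component:
    base: str
    x_offset: int = 0
    y_offset: int = 0
    use_my_metrics: bool = False
    round_to_grid: bool = False


def guess_component_flags(components, composite_width, widths, y_only=False):
    """Set flags heuristically; `widths` maps glyph names to advance widths."""
    metrics_assigned = False
    for comp in components:
        # Any shifted component gets the "round" flag.
        if y_only:
            comp.round_to_grid = comp.y_offset != 0
        else:
            comp.round_to_grid = comp.x_offset != 0 or comp.y_offset != 0
        # The first component as wide as the composite gets "use my metrics".
        if not metrics_assigned and widths.get(comp.base) == composite_width:
            comp.use_my_metrics = True
            metrics_assigned = True
    return components
```

A compiler could fall back to something like this whenever a UFO carries no explicit flags.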

BoldMonday commented 5 years ago

These are the VTT instructions mentioned by Greg Hitchcock that use a glyph index as parameter:

OFFSET[]
SOFFSET[]
ANCHOR[]
SANCHOR[]

But if there are no native TT instructions that rely on glyph indexes then there is probably no necessity to check the integrity of the glyph order.

typesupply commented 5 years ago

Regarding referencing things by index: Could UFO have a slight-fork of the TTX or htic syntax that uses glyph names or identifiers (for points, components, etc.) in place of indexes? I know that the goal is to take an existing storage format and use it, but indexes are very un-UFO. The documentation could state that glyph names/identifiers would be replaced by the index at compile time.
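
To illustrate the substitution step, here is a minimal Python sketch. The `@name` marker is invented purely for this example (neither TTX nor htic defines such a syntax), and the point names are hypothetical.

```python
def resolve_identifiers(tokens, index_by_name):
    """Replace '@name' tokens with the index they stand for."""
    resolved = []
    for token in tokens:
        if token.startswith("@"):
            name = token[1:]
            if name not in index_by_name:
                raise KeyError(f"unknown identifier: {name}")
            resolved.append(str(index_by_name[name]))
        else:
            resolved.append(token)
    return resolved


# Hypothetical point names, listed in outline order.
point_names = ["stem.left", "stem.right", "top"]
index_by_name = {name: i for i, name in enumerate(point_names)}

tokens = ["PUSHB[ ]", "@stem.right", "MDAP[1]"]
resolved = resolve_identifiers(tokens, index_by_name)
```

An unknown name raises an error here, so stale instructions would fail at compile time instead of silently hinting the wrong point.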

jenskutilek commented 5 years ago

htic already supports identifiers for points (and cvts, zones). It expects them to be defined in the htic input file, but a compiler could gather them from the glyph outlines ahead of compilation.

jenskutilek commented 5 years ago

These are the VTT instructions mentioned by Greg Hitchcock that use a glyph index as parameter:
OFFSET[]
SOFFSET[]
ANCHOR[]
SANCHOR[]

These are not even mentioned in the VTT language reference or VTT manual … :-/ It's really hard to get an exhaustive knowledge of this stuff.

jenskutilek commented 5 years ago

If we want to use identifiers for zones and other values from the cvt, we would need a kind of "annotated" cvt format, not just an array of integers.

justvanrossum commented 5 years ago

In the considerations to use htic, how is VTT relevant?

jenskutilek commented 5 years ago

I think it was just to make sure we are missing no occurrence of glyph indices in TT code. Otherwise it is not relevant.

jenskutilek commented 5 years ago

Could we make id (the hash) in the glyph lib optional?

The calculation of the glyph hash in Adobe FDK looks rather complicated. It would make an initial implementation of the instruction processing easier if the id could just be skipped.

If we can use named points for instructions, the instruction code would be fairly stable. If only on-curve points are hinted in a glyph, the instructions could even survive outline conversion from cubic to quadratic.

jenskutilek commented 5 years ago

htic already supports identifiers for points (and cvts, zones). It expects them to be defined in the htic input file, but a compiler could gather them from the glyph outlines ahead of compilation.

Sorry, I was mistaken. htic does not support identifiers for points (only for functions, cvts, and instruction flags). So we do need strict outline checking before compilation.

I have taken a look at the AFDKO hash function. It is complicated and parses the UFO by itself. We should define a hashing function for glyph outlines to be used, preferably one with a simpler implementation than AFDKO.
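
For comparison, a deliberately simple outline hash could look like the Python sketch below. It is an assumption-laden illustration, not a proposal for the spec: it is not compatible with the AFDKO hash, it ignores components and the advance width, and the input format (lists of `(x, y, segment_type)` tuples) is made up for the example.

```python
import hashlib


def outline_hash(contours):
    """Hash a list of contours, each a list of (x, y, segment_type) tuples.

    segment_type is a string such as "line", "curve" or "qcurve",
    or None for an off-curve point.
    """
    parts = []
    for contour in contours:
        parts.append("contour")  # marks where each contour starts
        for x, y, segment_type in contour:
            # float() makes 10 and 10.0 hash identically.
            parts.append(f"{segment_type or 'offcurve'}:{float(x)!r},{float(y)!r}")
    return hashlib.sha256(";".join(parts).encode("utf-8")).hexdigest()
```

Because point order and segment types go into the digest, any change that could shift point indices also changes the hash.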

khaledhosny commented 5 years ago

I have taken a look at the AFDKO hash function. It is complicated and parses the UFO by itself. We should define a hashing function for glyph outlines to be used, preferably one with a simpler implementation than AFDKO.

PSAutoHint has a HashPointPen which was written as a ufoLib replacement of AFDKO’s hashing function and should give the same hash.

jenskutilek commented 5 years ago

I have implemented compilation for htic TrueType code as outlined in #94 in my fork of ufo2ft. The instruction compiler is called automatically in ufo2ft’s post-processing step if you have htic installed.

You can try and compile the attached file with fontmake:

$ fontmake -u "IBM Plex Serif-Text.ufo" -o ttf --keep-overlaps --keep-direction --output-path "IBM Plex Serif-Text.ttf"

IBM Plex Serif-Text.ufo.zip

The only change from @moyogo’s proposal is that the controlValue entry doesn't use a list of integers, but a string of htic code.

A full htic file is additionally saved inside the UFO package as instructions.hti.

benkiel commented 5 years ago

Thinking out loud: would it be useful for UFO to store either the TTX dump or htic, for flexibility? I can see how that could both be useful and a nightmare; @moyogo did you have any thoughts about htic?

jenskutilek commented 5 years ago

I forgot to mention: At the moment you also need my special version of htic for the compilation to work:

https://gitlab.com/jenskutilek/htic

And in the compiled example font there is a bug on the x-height at 18 ppem that I have to find and squash.

jenskutilek commented 5 years ago

Here's an updated version of the demo ufo: IBM Plex Serif-Text.ufo.zip Some function definitions were wrong which caused htic to do push optimization where it was not possible.

BTW by doing the demo implementation already, I'm not trying to push for htic, it's just to show that it could work, and I had done some of the work before this discussion came up.

I would also be happy if the TT assembly representation in UFO matched the FontTools representation. Compiling with optimizations can be done in a separate step if desired. Including both versions in the UFO spec is probably too much trouble.

moyogo commented 5 years ago

Sorry for not replying earlier. I’m fine with htic if it can do bytecode instructions round-tripping without optimization. Having the option to do optimization is nice but may have unwanted effects, so I wouldn’t want it all the time.

jenskutilek commented 5 years ago

The only thing that is currently keeping htic from simple roundtripping is the lack of support for the DELTAC1, DELTAC2, DELTAC3, DELTAP1, DELTAP2, DELTAP3 instructions. Only the "convenience instructions" deltac and deltap are supported. But that is an easy fix.

Do you need to preserve the push data sizes? Similar to the delta instructions, htic only has a generic push instruction that will choose the optimal instruction (PUSHB, PUSHW, NPUSHB, NPUSHW) at compile time.
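
Roughly, choosing the push instruction at compile time comes down to the value range and the count. The Python sketch below shows that decision under the TrueType limits (PUSHB and PUSHW carry at most 8 values, NPUSHB and NPUSHW at most 255, bytes are unsigned, words are signed 16-bit). It is not htic's code, and `choose_push` is a made-up name.

```python
def choose_push(values):
    """Return the smallest push instruction that can hold `values`."""
    if not values:
        raise ValueError("nothing to push")
    if len(values) > 255:
        raise ValueError("more than 255 values must be split into several pushes")
    if all(0 <= v <= 255 for v in values):
        # Unsigned bytes: short form up to 8 values, counted form beyond that.
        return "PUSHB" if len(values) <= 8 else "NPUSHB"
    if all(-32768 <= v <= 32767 for v in values):
        # Signed 16-bit words, same split between short and counted form.
        return "PUSHW" if len(values) <= 8 else "NPUSHW"
    raise ValueError("value does not fit in 16 bits")
```

The original listing above pushes 40 with PUSHW, while this rule would pick PUSHB, so a byte-exact round trip would have to record the original instruction rather than recompute it.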

madig commented 4 years ago

In https://github.com/daltonmaag/vttLib/, I just went away from storing the VTT data dump in the UFO data directory to storing them in external single blobs because I want to support interpolated instances and potentially variable fonts. Storing that data in UFOs makes little sense to me.

This proposal works when you compile a single UFO to a single static TTF in a specific way. Is there a story for instance or variable font generation as well or is that out-of-scope? At least the latter is going to be difficult because a variable font will probably have overlaps (and may use extra instructions), a static font probably won't, and you potentially need different hinting code for both.

moyogo commented 4 years ago

In the issue description I wrote the following:

It would be useful to have a standard way of storing bytecode TrueType instructions in UFOs, especially in the case of extracting UFOs from TTFs and being able to compile that back into TTFs. Or this could be useful when processing UFOs in a standard way with tools that can compile TrueType instructions.

Just as public.postscript.hints, the public.truetype.instructions would be ignored when the hash of glyph outlines doesn’t match anymore.
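
In code, that rule is a simple guard. The Python sketch below assumes the dict layout from the issue description and takes the current outline hash as an argument, since the hashing function itself is still undecided. Treating a missing id as a mismatch is my assumption here, not something the proposal settles.

```python
KEY = "public.truetype.instructions"


def usable_instructions(glyph_lib, current_hash):
    """Return the stored assembly, or None if it must be ignored."""
    entry = glyph_lib.get(KEY)
    if entry is None:
        return None  # glyph has no instructions
    if entry.get("id") != current_hash:
        return None  # outlines changed since the instructions were written
    return entry.get("instructionList")
```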

madig commented 4 years ago

Ok, so my concern is out of scope. :)

benkiel commented 4 years ago

Coming back to this, as it seems to have been a bit derailed by the talk of htic: are the limitations that @jenskutilek mentioned a concern? I think getting something into the format for this would be good.

benkiel commented 4 years ago

@moyogo and @jenskutilek: I'd like to get @moyogo's PR reviewed and merged: did you come to a consensus on whether htic should be used?

behdad commented 4 years ago

Yes please.

benkiel commented 4 years ago

Yes to htic or yes to getting this merged after resolving that?

behdad commented 4 years ago

To be merged. Sorry on phone. Thanks.

benkiel commented 4 years ago

@behdad I'm going to wait a bit to see what @moyogo and @jenskutilek say as it seems that they've been moving this forward a bit in other places.

From my read, the open concerns are:

Can htic do bytecode instructions round-tripping without optimization
Are Glyph IDs used anywhere in the htic or TTX instructions (seems no, but I might be missing a confirmation)

behdad commented 4 years ago

Yes. I just meant I'm supportive of finishing and merging this. :)

jenskutilek commented 4 years ago

Can htic do bytecode instructions round-tripping without optimization

Not in its current state. It would require the addition of specialised PUSHB, PUSHW, NPUSHB, NPUSHW and DELTA[CP][123] instructions instead of the generic push and delta, and an option to not group push and delta instructions. There's already an issue on htic for the instructions. Avoiding the grouping is easy, I think I have added that to my fork. Fixing this shouldn't be too hard. I can investigate this.

Are Glyph IDs used anywhere in the htic or TTX instructions (seems no, but I might be missing a confirmation)

They aren't.

The other thing to consider is @typesupply’s suggestion to allow point identifiers instead of indexes. That would require changes in both TTX and htic. Implementing this in htic may involve more work, because as it is, the htic compiler doesn't know anything about the glyphs (and thus cannot look up any point identifiers). A hint authoring tool would have to provide a mapping of identifier to index to the compiler, similar to what is already possible for cvts:

cvt {
   32 zones_off # CVT 0
  500 x_height # CVT 1
  700 2 # index instead of label
}

Named points sound like a good idea, but I can't judge how useful they really are:

A hinting tool could use a mapping internally and write out point indices to htic code. Though htic code can be used directly, I would expect it is more likely to be used as an intermediate step between a high-level language and pure TT assembly code/bytecode. For roundtripping, it's hard to add point labels because it is not obvious which numbers in the decompiled bytecode correspond to point indexes.

Anyway, let's do this :)

typesupply commented 4 years ago

The other thing to consider is @typesupply’s suggestion to allow point identifiers instead of indexes.

If this is an impediment to implementing, stick with indexes for now. If outline sync becomes an issue, identifiers could be implemented in a version 2. Perhaps store a format version of the instructions should the need to modify the syntax arise.

jenskutilek commented 4 years ago

We may need to add component flags to UFO components (round, useMyMetrics, overlap). Though in most cases those can be set heuristically when compiling the font. Maybe that's also something for a future version? Or those flags could be added to the glyph lib instead of directly to the component.

benkiel commented 4 years ago

@jenskutilek how does glyph.components fall down here?

jenskutilek commented 4 years ago

Glyph component flags are part of the TrueType hinting, I'd argue. At least the "ROUND" flag has influence on the rasterization. To be able to roundtrip the flags, they need to be stored in the UFO somewhere. Currently there is no place to store them.

In ttx, they are stored as hex numbers:

<TTGlyph name="Aacute" xMin="16" yMin="0" xMax="673" yMax="957">
  <component glyphName="A" x="0" y="0" flags="0x200"/>
  <component glyphName="acute.case" x="45" y="0" flags="0x4"/>
</TTGlyph>

In UFO, something like this might be more elegant:

<?xml version='1.0' encoding='UTF-8'?>
<glyph name="Aacute" format="2">
  <advance width="689"/>
  <unicode hex="00C1"/>
  <outline>
    <component base="A" useMyMetrics="true" overlap="false"/>
    <component base="acute.case" xOffset="45" round="true"/>
  </outline>
</glyph>

Absence of the flags could mean that the compiler should deduce them.
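
For reference, the hex values in the ttx snippet decode as shown in this small Python sketch. The bit values and names are the composite glyph flags from the OpenType glyf table; only the three flags discussed here are included, and the function name is made up.

```python
# Subset of the composite glyph flags defined for the glyf table.
COMPONENT_FLAGS = {
    0x0004: "ROUND_XY_TO_GRID",
    0x0200: "USE_MY_METRICS",
    0x0400: "OVERLAP_COMPOUND",
}


def decode_component_flags(flags):
    """Return the names of the known flags set in `flags`."""
    return [name for bit, name in COMPONENT_FLAGS.items() if flags & bit]
```

So `flags="0x200"` on the A component is "use my metrics", and `flags="0x4"` on acute.case is the rounding flag.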

typesupply commented 4 years ago

To be able to roundtrip the flags, they need to be stored in the UFO somewhere.

Hm. This could work well with another idea that I have been thinking about...

There has been some interest in adding a lib attribute at the API level in defcon and fontParts for objects that don't currently have a lib. contour.lib, component.lib, etc. Putting that on objects is no problem, but we'd need to store it in UFO and that is much more difficult. GLIF readers/writers are hardwired to the current GLIF schema and those will need to be updated to handle new elements and attributes. The structure of ufoLib.glifLib makes it pretty easy to do this, but the other readers/writers out there are out of our hands. I've been thinking that we could handle it this way until we feel comfortable making a major format change. In the GLIF <lib> element, establish some new public keys:

public.contour.lib
public.component.lib
public.etc.lib

Each of these keys would have a dict as their value. The dict keys would be the object identifiers and the values would be dicts.

At the ufoLib.glifLib level or even the defcon level, when reading GLIF these parts of the lib would be popped from the dict and set as the lib of the object with the matching identifier.

So, for this use case, the component flags would be located here in the defcon and fontParts API:

component.lib["public.htic"]["useMyMetrics"]
component.lib["public.htic"]["round"]
component.lib["public.htic"]["overlap"]

In the UFO the data would be stored in the GLIF like this:

<lib>
  <dict>
    <key>public.component.lib</key>
    <dict>
      <key>component1</key>
      <dict>
        <key>public.htic</key>
        <dict>
          <key>useMyMetrics</key>
          <true/>
          <key>overlap</key>
          <false/>
          <key>round</key>
          <true/>
        </dict>
      </dict>
    </dict>
  </dict>
</lib>

This could be implemented pretty quickly and be backwards compatible without risk of data loss.
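
The read-side step could be as small as the Python sketch below. Plain dicts stand in for the glyph lib and for component objects; this is not the ufoLib, defcon, or fontParts API, and `distribute_component_libs` is a hypothetical name.

```python
def distribute_component_libs(glyph_lib, components):
    """Pop public.component.lib and attach each entry to its component.

    `components` is a list of dicts that may carry an "identifier" key.
    """
    libs = glyph_lib.pop("public.component.lib", {})
    for comp in components:
        # Components without an identifier, or without an entry, get an empty lib.
        comp["lib"] = libs.get(comp.get("identifier"), {})
    return components
```

Writing would do the reverse: collect the non-empty libs by identifier and store them back under the public key before the GLIF is saved.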

benkiel commented 4 years ago

This makes sense, and I agree it should be added (the .libs).

unified-font-object / ufo-spec

TrueType instructions in UFO #93