skef opened 1 year ago
Note: Although I can't make any promises, I've thought through some of what one would need to say about CFF2 hinting and variable components. It does seem like there could be a viable model here where overall hinting quality could approach that of the current system. ("Decompositing" to CFF (or CFF2) would involve some hinting compromises, but that's already true for CFF2 to CFF because of overlap.)
Thanks skef.
Indeed a separate table was originally how VarComposites were proposed by @justvanrossum et al.
While this might architecturally look cleaner, I don't think in reality it is. I tried implementing this yesterday and found it extremely hard to decouple this table from the underlying glyph outline tables. So, from an implementation point of view, it's not like this can be an extra layer on top like you suggest.
I personally am of the opinion that this would better fit within the glyf / CFF2 tables: in glyf as currently proposed, and in CFF2 as a new CharString operator. Since CFF currently does not have components, just subroutines, it's conceivable that the new operator would NOT invoke another glyph either, just change the variation coefficients for the subsequent blend operators. That might fit more naturally with how CFF works.
Note that when variable fonts were being implemented, it would have been possible to reuse the gvar table for CFF variations. But it was decided to go with inline variations in the CFF2 table. This situation seems analogous to me.
If such a proposed vcmp is to be added, it has to have its own gvar-like TupleVariationStore, since ItemVariationStore will be inefficient for this kind of storage. I find it cleaner to just use the existing gvar for glyf, and blend for CFF2, instead of an extra layer.
My .02 CAD. :)
While this might architecturally look cleaner, I don't think in reality it is. I tried implementing this yesterday and found it extremely hard to decouple this table from the underlying glyph outline tables. So, from an implementation point of view, it's not like this can be an extra layer on top like you suggest.
Can you say more about this? Given that COLR is already supported on most platforms it's not clear to me why "vcmp" (or whatever) would be so different.
And also: there was a request at the TypeCon meeting for composite glyphs to have their own glyph data, and also to allow hinting instructions "overrides" in the composites. Would the need to support either or both of those change your answer?
Note that when variable fonts were being implemented, it would have been possible to reuse the gvar table for CFF variations. But it was decided to go with inline variations in the CFF2 table. This situation seems analogous to me.
I get the parallel, but the two situations seem distinct enough to me that things would weigh either way.
If such a proposed vcmp is to be added, it has to have its own gvar-like TupleVariationStore, since ItemVariationStore will be inefficient for this kind of storage. I find it cleaner to just use the existing gvar for glyf, and blend for CFF2 instead of an extra layer.
Aren't TupleVariationStores per-glyph? Wouldn't that just be a matter of an offset within the "vcmp" per-glyph subtable?
While this might architecturally look cleaner, I don't think in reality it is. I tried implementing this yesterday and found it extremely hard to decouple this table from the underlying glyph outline tables. So, from an implementation point of view, it's not like this can be an extra layer on top like you suggest.
Can you say more about this? Given that COLR is already supported on most platforms it's not clear to me why "vcmp" (or whatever) would be so different.
GSUB / COLR are easy to implement as extra layers. Basically:
Now, if we try to add a "varcomposites" layer before "outline", its output will be a glyph index and a normalized designspace location. So the outline extraction cannot use the existing font; it has to instantiate a new font. Moreover, if we were to take your idea of mixing in the glyph outline from the same glyph index, the "outline" function would have to be replaced with a "low-level outline" function that extracts the outline from glyf/CFF2 only.
We also have a hurdle in HarfBuzz: we cannot modify the font in place for new variations (it breaks thread-safety of the font), nor does our API allow creating a new font at the new variations. So we have to add new internal API that accesses glyf/CFF2 directly with the requested variations without instantiating a new font (also important for performance).
If we were to implement this, I expect that we modify the GLYF/CFF2 tables to directly access and handle the varcomposites table, not the other way around. In short: it doesn't have the nice layering properties of COLR. It's certainly possible. Just doesn't look as clean as the design originally suggests.
And also: there was a request at the TypeCon meeting for composite glyphs to have their own glyph data, and also to allow hinting instructions "overrides" in the composites. Would the need to support either or both of those change your answer?
I understand that the mixing of outlines & composites comes naturally in your design. I considered that in the glyf table. It can be revived and done: a simple glyph has a size that is determined from the flags, and any data at the end can be used for varcomposites & extra hinting data if we choose to pursue this. Obviously it comes naturally with CFF.
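To make that trailing-data idea concrete, here's a hedged sketch (mine, not part of the comment above) of computing where a simple glyf glyph's data ends from its flags, so that any bytes after that point, up to the next loca offset, could be treated as varcomposite or extra hinting data:

```python
import struct

# Simple-glyph flag bits from the existing glyf format.
X_SHORT, Y_SHORT, REPEAT, X_SAME, Y_SAME = 0x02, 0x04, 0x08, 0x10, 0x20

def simple_glyph_size(data):
    # Returns the number of bytes actually used by a simple glyph; anything
    # beyond this (up to the next loca offset, ignoring padding) would be the
    # "extra" data suggested above. Sketch only; no error handling.
    num_contours, = struct.unpack_from(">h", data, 0)
    assert num_contours >= 0, "composite glyphs are handled elsewhere"
    pos = 10  # numberOfContours + bounding box
    end_pts = struct.unpack_from(">%dH" % num_contours, data, pos)
    pos += 2 * num_contours
    num_points = end_pts[-1] + 1 if num_contours else 0
    instruction_length, = struct.unpack_from(">H", data, pos)
    pos += 2 + instruction_length
    flags = []
    while len(flags) < num_points:          # flags array, with REPEAT runs
        flag = data[pos]; pos += 1
        flags.append(flag)
        if flag & REPEAT:
            flags.extend([flag] * data[pos]); pos += 1
    for flag in flags:                      # x coordinates
        pos += 1 if flag & X_SHORT else (0 if flag & X_SAME else 2)
    for flag in flags:                      # y coordinates
        pos += 1 if flag & Y_SHORT else (0 if flag & Y_SAME else 2)
    return pos
```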
Note that when variable fonts were being implemented, it would have been possible to reuse the gvar table for CFF variations. But it was decided to go with inline variations in the CFF2 table. This situation seems analogous to me.
I get the parallel, but the two situations seem distinct enough to me that things would weigh either way.
If such a proposed vcmp is to be added, it has to have its own gvar-like TupleVariationStore, since ItemVariationStore will be inefficient for this kind of storage. I find it cleaner to just use the existing gvar for glyf, and blend for CFF2 instead of an extra layer.
Aren't TupleVariationStores per-glyph? Wouldn't that just be a matter of an offset within the "vcmp" per-glyph subtable?
They mostly are. There's the shared-tuples part that is shared at the table level. If we go down this route we may want to reconsider some gvar decisions as well, like a more compact axis-region storage. If we do that we may want to do the same in a new GVAR table.
OK, I understand the implementation concerns. I would characterize them as "This would violate a number of layer separations in a HarfBuzz/FreeType stack in a way that COLR does not, which might suggest that it would do something similar in other implementation stacks." I would say that would be something to weigh against other factors, with input from those in charge of those other stacks, but isn't definitive on its own.
@nedley @PeterCon Your input is appreciated.
Let's work on a concrete table proposal. I'm willing to implement this and try.
Let's design this.
struct VARC
{
uint16_t majorVersion; // == 1
uint16_t minorVersion; // == 0
Offset32To<ComponentList> componentsListOffset;
Offset32To<gvar> gvarOffset; // may be Null
};
struct ComponentList
{
uint16_t format; // == 1
Offset32To<Coverage> glyphCoverage;
Offset32 dataStartOffset;
uint32_t numData;
Offset32 dataOffsets[numData + 1];
}
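As a reading aid (a sketch of mine, not part of the proposal), this is roughly how a consumer might pull one glyph's record run out of the ComponentList above, assuming the Coverage table maps a gid to its index into dataOffsets and that those offsets are measured from dataStartOffset:

```python
def component_data(varc_bytes, components_list, gid):
    # components_list is assumed to be an already-parsed object exposing the
    # fields of the struct above; both it and the offset origins are
    # assumptions made for illustration.
    index = components_list.glyph_coverage.get(gid)   # coverage index, or None
    if index is None:
        return None                                    # gid has no variable components
    base = components_list.data_start_offset
    start = base + components_list.data_offsets[index]
    end = base + components_list.data_offsets[index + 1]   # why there are numData + 1 offsets
    return varc_bytes[start:end]   # concatenated component records (plus any trailing data)
```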
The per-glyph data will be a concatenation of records similar to the existing proposal:
https://github.com/harfbuzz/boring-expansion-spec/blob/main/glyf1-varComposites.md
Possibly adding a haveMore flag, such that any bytes at the end can be interpreted as TrueType hinting for the full composite. This is similar to how Composite glyphs currently are hinted.
Also, unlike your proposal, I think regular Composite glyphs in the glyf table should also be allowed. They won't refer to this table, just to the glyf table itself.
I think the overhead of this approach, compared to the current proposal, is roughly 6 bytes per glyph. In the CJK font with 45,000 Kanji that I built, that translates to about 5% larger font.
UPDATE: possibly only 4 bytes for larger fonts. This is how I calculated:

- If the font is large, then both loca and gvar offsets are 32-bit each. Assuming that most glyphs are VarComposites, in the VARC table we'll have 8 bytes of offsets (4 in dataOffsets and 4 in gvar), but the loca and gvar offsets drop to 16-bit each. Total difference is 4 bytes per glyph.
- For small fonts, where both loca and gvar offsets are 16-bit each, the VARC would add 6 bytes per glyph. This can be brought down to 4 if we optionally add a flag to encode 16-bit offsets in dataOffsets.
- Up to 8 bytes if there are enough non-VarComposite glyphs to keep loca and gvar offsets 32-bit.
We can slightly optimize the variation storage by using a cvar-style, instead of gvar-style, TupleVariationStore. That is, store scalars instead of x,y variation values. I'm leaning towards keeping it simple and just use gvar though. Maybe can be prototyped and measured.
I really dislike all the duplication here though, of loca/gvar basically.
A middle approach would be to use the VARC table only for CFF2, the way that VORG is CFF-only.
A middle approach would be to use the VARC table only for CFF2, the way that VORG is CFF-only.
Hmm, I really like that we can mix outlines and components. I would prefer a solution that doesn’t distinguish between glyf and cff.
Also, unlike your proposal, I think regular Composite glyphs in the glyf table should also be allowed. They won't refer to this table, just to the glyf table itself.
I was thinking of that as a simplification but I agree that just leaving things as they are might be preferable.
I think the overhead of this approach, compared to the current proposal, is roughly 6 bytes per glyph. In the CJK font with 45,000 Kanji that I built, that translates to about 5% larger font.
UPDATE: possibly only 4 bytes for larger fonts. This is how I calculated:
- If the font is large, then both loca and gvar offsets are 32-bit each. Assuming that most glyphs are VarComposites, in the VARC table we'll have 8 bytes of offsets (4 in dataOffsets and 4 in gvar), but the loca and gvar offsets drop to 16-bit each. Total difference is 4 bytes per glyph.
- For small fonts, where both loca and gvar offsets are 16-bit each, the VARC would add 6 bytes per glyph. This can be brought down to 4 if we optionally add a flag to encode 16-bit offsets in dataOffsets.
- Up to 8 bytes if there are enough non-VarComposite glyphs to keep loca and gvar offsets 32-bit.
Does no numberOfContours save another two bytes?
Possibly adding a haveMore flag, such that any bytes at the end can be interpreted as TrueType hinting for the full composite. This is similar to how Composite glyphs currently are hinted.
What haveMore signifies could easily be different for glyf and CFF, but I wouldn't be surprised if we wound up with one or three versioned subtables for special cases (or perhaps a new versioned "extra stuff subtable" rolling special cases together with their own offsets). In any case I'm not sure it would be necessary to burn the last flag for this purpose -- we could just decide to look for more based on whether the initial record is contiguous with offset[n] and offset[n+1].
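A hedged sketch of that contiguity idea, assuming the parser already knows how many bytes the component records themselves occupy:

```python
def trailing_extra_bytes(data, data_offsets, n, parsed_component_length):
    # If glyph entry n's component records don't fill the whole span between
    # consecutive offsets, the remainder can be read as hinting/"extra" data
    # without spending a flag bit. Names and layout here are illustrative only.
    entry_start, entry_end = data_offsets[n], data_offsets[n + 1]
    extra_start = entry_start + parsed_component_length
    return data[extra_start:entry_end]   # empty when the records are contiguous with offset[n + 1]
```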
We can slightly optimize the variation storage by using a cvar-style, instead of gvar-style, TupleVariationStore. That is, store scalars instead of x,y variation values. I'm leaning towards keeping it simple and just use gvar though. Maybe can be prototyped and measured.
If lowering the file size is the highest priority with this system, as it seems to be, maybe we should put more thought into this. We wouldn't need a sixteen bit tupleVariationCount, would we? Do we need it at all (can the count just be deduced)? Etc.
(I think that part of the last bit of my last comment reflects my lack of detailed understanding of gvar. I'm trying to remedy that now so (with luck) I can be more useful to this thread ... )
I'm leaning towards reusing gvar unmodified, and leaving optimizations to a newly designed version of gvar. The gvar with only scalars instead of x,y has some precedent in the cvar table.
Does no numberOfContours save another two bytes?
True. So the damage is 2 bytes only. That's nice.
I think I'm a bit better on gvar now ...
It appears that for the most part there are two variable data "models" in current use in OpenType:
Either of these models is broadly compatible with our use case, but neither seems like a particularly good match for our expected data. Making a big list of major/minor pair offsets in VARC glyph entries would throw away any locality, so that seems pretty definitively poor. The TVS is closer but how relevant it is depends on the data patterns we're likely to see in practice. Will all data, or all data for a particular glyph, tend to have the same masters? Will every value tend to either be variable or not variable? When things are mixed up you'll pay a higher price encoding the point values.
@behdad et al have the advantage of having developed font data to look at, but even that might be specific to particular cases or methods of encoding and not entirely predictive.
Anyway, it seems to me that:
After thinking about this for a while, I'm trying to sketch out how an IVS-based VARC table might work. Obviously there would be an offset to the IVS itself in the table header. The only engineering question is how the mapping from the VCR values in the array to the major/minor numbers is encoded. The gvar point number mapping already provides some ideas in that area. I'm just going to try to come up with something pretty simple but still somewhat optimized (at least in the sense that it will work pretty well when many subsequent values use the same set of locations, and therefore the same IVD).
Then maybe people with access to more data might consider how it might perform in practice. I'm not sure but if the encoded mappings (value index to tuples in TVS, value index to major/minor number in IVS) wind up similarly sized, you might do a bit better with the IVS when it comes to header data (you can "share" that stuff with TVS but it takes a bit of header info to do it).
Anyway, I'll post here if and when I come up with something.
OK, here is what I'm thinking for an Item Variation Store-based design:
I think it's very likely that the most common difference in variation model among axis values and transformation parameters is that some will be variable and some won't. So let's address that with a bitmap added after the gid field.

Let F be the number of flags set between bits 3 and 11 inclusive in variable component n. Call the field varMap[(F + numAxes + 7) / 8], in units of uint8, with notes "Bitmap of which values are variable."
Given this field, we can assign an entry index, in increasing order starting from 0, to each value in each Variable Component entry in the array for the glyph for which the corresponding varMap bit is set.
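For illustration only, a sketch of that assignment; the bit order within varMap is my assumption, not something pinned down above:

```python
def assign_entry_indices(var_map_bytes, num_values, next_entry=0):
    # num_values = F + numAxes for one variable component; call this once per
    # component in order, threading next_entry through, to number every value
    # whose varMap bit is set.
    entry_for_value = {}
    for value_index in range(num_values):
        byte, bit = divmod(value_index, 8)       # low-bit-first is an assumption
        if var_map_bytes[byte] & (1 << bit):
            entry_for_value[value_index] = next_entry
            next_entry += 1
    return entry_for_value, next_entry
```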
The remaining problem is to map each entry index to an IVS major/minor number pair. How to do this efficiently depends on how we expect those entries to vary in "model" (the set of IVS regions used, corresponding to an Item Variation Data subtable) and to the extent of value duplication within a model.
This is all on the assumption that the most typical case will be that subsequent values are unique and use the same model (that model most often being one region for each master in the glyph). These can be stored as succeeding item entries in one IVD subtable.
Then sometimes one value, or a string of values (perhaps in a particular variable component entry), will use a different model that one needs to switch to temporarily.
And at other times a given value (more often within the same model) will already have an IVD item, but not in the current ordering, so one needs to switch to that and get back to the unique values.
These are just assumptions, of course, but they're not drastically different from the set of assumptions implicit in the Tuple Variation Store architecture. (The TVS is somewhat more flexible when it comes to regions.)
So let's consider a simple byte-length operator-based mapping for this purpose.
Both the initial major and the initial minor value for a glyph start at 0.
This is just for illustration purposes and is probably full of off-by-one errors and such:
entryCount = (total number of varMap bits set)
majorNumber = 0
minorNumber = 0
tmpMajorNumber = None
tmpMinorNumber = None
curEntry = 0
while curEntry < entryCount:
    instr = read(1)
    if instr =~ 0b00000000:
        majorNumber = read(bytesfor(itemVariationDataCount))
        continue
    else if instr =~ 0b11111111:
        tmpMajorNumber = read(bytesfor(itemVariationDataCount))
        continue
    if tmpMajorNumber is not None:
        thisMajorNumber = tmpMajorNumber
    else:
        thisMajorNumber = majorNumber
    if instr =~ 0b000?????:
        count = ?????
    else if instr =~ 0b001?????:
        count = ?????
        tmpMinorNumber = read(bytesfor(itemCount of ivd at itemVariationDataOffsets[thisMajorNumber]))
    else if instr =~ 0b010?????:
        count = ?????
        minorNumber = read(bytesfor(itemCount of ivd at itemVariationDataOffsets[thisMajorNumber]))
    else if instr =~ 0b10??????:
        count = ?????? * read(1)
    else if instr =~ 0b11??????:
        count = ??????
        minorNumber = read(bytesfor(itemCount of ivd at itemVariationDataOffsets[thisMajorNumber]))
        count *= read(1)
    if tmpMinorNumber is not None:
        thisMinorNumber = tmpMinorNumber
    else:
        thisMinorNumber = minorNumber
    for i in range(count):
        itemMap[curEntry++] = (thisMajorNumber, thisMinorNumber++)
    if tmpMajorNumber is not None:
        tmpMajorNumber = None
    if tmpMinorNumber is not None:
        tmpMinorNumber = None
    else:
        minorNumber += count
Major number 0 for all entries, minor number starting at 516 (out of less than 2^16) for 212 entries
Operator Major Minor Mult
0b11100000 0x0204 0x06 (Pick new persistent minor number 516 and use it for 32 * 6 entries)
0b00010100 (Increment current minor number 708 for next 20 entries)
Do the same up to entry 101, which uses major number 4 (out of less than 255) and minor number 28, then continue:
Operator Major Minor Mult
0b11000010 0x0204 0x02 (Pick new persistent minor number 516 and use it for 50 * 2 entries)
0b11111111 0x04 (New temporary major number 4)
0b00100001 0x001C (Use new minor number 28 for one entry)
(Major number back to 0, minor number back to 616)
0b10000010 0x32 (Increment the current minor number for 2 * 50 entries)
0b00001011 (Increment the current minor number for 11 entries)
That was just a sketch, of course. There are probably better mappings and may be better overall mapping strategies.
Anyway, as we consider other TVS-based or TVS-like implementations we can consider the relative advantages of those vs an IVS-based system, with the main criterion presumably being how much space each uses.
That was just a sketch, of course. There are probably better mappings and may be better overall mapping strategies.
What we did in COLRv1 is to store one uint32_t per item (can be a glyph here, or a record) and use consecutive numbers. This, however, would require a DeltaSetIndexMap to map the values most of the time, which itself adds bytes as well.
Anyway, as we consider other TVS-based or TVS-like implementations we can consider the relative advantages of those vs an IVS-based system, with the main criterion presumably being how much space each uses.
To me it's fairly clear that TVS is more optimized and better suited to this problem. Any reason you are pursuing the IVS approach so much? I think your sketch complicated things unnecessarily.
To me it's fairly clear that TVS is more optimized and better suited to this problem. Any reason you are pursuing the IVS approach so much? I think your sketch complicated things unnecessarily.
It's not yet obvious to me that a TVS is more optimized in terms of disk space usage, depending on what patterns of parameters we're likely to see in practice. And based on the earlier discussion we're in this engineering realm where (for example) an average cost of six bytes per glyph could be reason to choose one solution over another, so presumably we want to be careful in our choices.
If I understand the current system correctly, each specified parameter is given an index, which are treated as point numbers in a TVS context. Suppose you have a font where this is the typical pattern (which doesn't seem very unusual to me):
If I'm understanding the current TVS implementation correctly (which I still may not be), one would need to do one of two things to handle the values that aren't variable:
If the former, and the typical number of tuples/master locations is 8 + default, you would add 16 zero bytes per non-variable value you might otherwise avoid.
If the latter --- well, it gets complicated. Packed point numbers appear to be run-length encoded, so one can assume you'll need at least one byte to stop and resume the count (right)? And that will be per-tuple-variation?
The motivating idea of the TVS is that the per-glyph data will be quite uniform, which is a good assumption for outline data. For example you wouldn't expect to see points with all zero deltas frequently mixed in with other points, and run lengths will often correspond to every point in the glyph. This will also be true for composite glyphs in the current system, because transformation data doesn't currently vary, only the phantom point data varies (or can vary).
Therefore, what system is optimal in terms of disk space usage would depend on what data patterns will be typical, which in turn depends on how we expect the variable composite system to be used. It seems to me that there will be cases where a TVS is smaller and cases where an IVS is smaller.
Perhaps I'm just mistaken about all of this? I am pretty new to all the gvar stuff.
If TVS is being revisited then it would be good to improve its handling of shared tuples for intermediate regions. Currently only the tuple’s peak may be shared. This means a font with an intermediate master wastes 4 * axisCount bytes per glyph by recording the same start and end values repeatedly.
A simple refinement, where the Reserved bit 0x1000 in tupleIndex is set to 1 if start and end are also shared, would fix this. Start tuple would be at masked tupleIndex+1 in shared tuples. End tuple would be at masked tupleIndex+2.
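A hedged sketch of how a reader might honor that proposed bit; the flag name is invented here, and the embedded-peak case (flag 0x8000) is left out for brevity:

```python
TUPLE_INDEX_MASK    = 0x0FFF   # existing: index of the shared peak tuple
INTERMEDIATE_REGION = 0x4000   # existing: start/end tuples embedded per glyph
SHARED_INTERMEDIATE = 0x1000   # the proposed use of the reserved bit

def resolve_tuples(tuple_index, shared_tuples, read_embedded_tuple):
    peak_index = tuple_index & TUPLE_INDEX_MASK
    peak = shared_tuples[peak_index]
    if tuple_index & SHARED_INTERMEDIATE:
        # Proposed: start/end live in sharedTuples right after the shared peak.
        start, end = shared_tuples[peak_index + 1], shared_tuples[peak_index + 2]
    elif tuple_index & INTERMEDIATE_REGION:
        start, end = read_embedded_tuple(), read_embedded_tuple()
    else:
        start = end = None   # implied from the peak, as today
    return start, peak, end
```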
In case we need some more flags, which wouldn't surprise me, something like the following could make sense:
- Divide flags into primary and secondary categories. Primary flags are specified as they are now, but the last primary bit becomes "has secondary flags". When that bit is set, there is an extra 16 bits of secondary flags immediately after the primary block.
- Of the existing flags, the two skews become secondary.
- I'm not sure we need flag 2. The default for Scale Y could be Scale X which you could then override with a 1 in cases where you need to.
I think this is complicating it too much. I agree flag 2 is unnecessary and should go. We'll just reserve the top bit for "more flags" and spec that if it's set and the implementation doesn't understand it, then processing should be stopped.
Further thoughts / investigation:
TupleVariationStore has an inherent 64kb limit on each glyph's data. Apparently @justvanrossum et al are already hitting that limit in my current design.
ItemVariationStore can possibly be competitive if we don't have to store the major/minor numbers all the time. The COLRv1 table does that by allocating adjacent minor numbers to items for a Paint. We can do something like that, e.g. by including one VarIdxMap (DeltaSetIndexMap in the spec?!) that maps from glyph number to starting major/minor. Or, we can include the starting major/minor per component. That way we can get more sharing of per-component data. I think I actually like this idea.
I'll prepare something based on it.
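A minimal sketch of the "starting major/minor per component" variant, assuming a hypothetical ivs.get_delta(outer, inner, coords) accessor:

```python
def component_deltas(ivs, base_outer, base_inner, num_variable_values, coords):
    # One (major, minor) pair is stored per component; its variable values take
    # consecutive minor numbers within that ItemVariationData, so no per-value
    # index needs to be stored.
    return [ivs.get_delta(base_outer, base_inner + i, coords)
            for i in range(num_variable_values)]
```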
I don't know how complicated we're interested in getting with an IVS mapping. Clearly the main advantage to be exploited over major/minor pairs for everything is sequential IVS entries (with the same major number). There will be times when you already have an individual entry "somewhere else" and you can do better (in terms of file size) by making an exception for it.
When I was thinking about this before I sketched out a micro-language that I thought would do pretty well in terms of minimizing storage. I'll reproduce it here in case it's useful.
(Note: I've gone back over the top portion of this but haven't reviewed the examples or pseudocode in detail before posting this.)
A basic design for supporting variable composites with an Item Variation Store:
The goal is to associate each parameter index with a major and minor number in the IVS. We do this with the following system:
Starting major number (resets for new glyph): 0
Starting minor number (resets for new glyph): 0
1 byte operators:
00000000: Pick a new persistent major number with the next argument
11111111: Pick a temporary major number with the next argument
000?????: Increment the current minor number for the next ????? entries.
001?????: Pick a new temporary minor number with the next argument and
          increment it for the next ????? entries.
010?????: Pick a new persistent minor number with the next argument and
          increment it for the next ????? entries.
011?????: Reserved
10??????: Multiply ?????? by the next uint8 argument and increment the current
          minor number for that number of entries.
11??????: Pick a new persistent minor number with the next argument, multiply ??????
          by the following uint8_t argument and increment the current minor number
          for that number of entries. (Excludes 11111111)
Argument sizes:
Major number: Enough bytes to pick among Item Variation Data subtables (i.e. 1
if itemVariationDataCount <= 255, 2 if 255 < itemVariationDataCount < 65536, and so on).
Minor number: Enough bytes to pick among the delta sets of the current Item Variation Data
subtable (i.e. 1 if itemCount <= 255, 2 if 255 < itemCount < 65536, and so on).
Scope of temporary major number: The next minor number operator
Scope of temporary minor number: The temporary minor number operator
Constraints:
The number of entries specified for a glyph must be exactly the number of (potentially)
variable parameters across all the component entries for the glyph. (Edited to remove
leftover mask stuff.)
Examples:
Major number 0 for all entries, minor number starting at 516 (out of less than
2^16) for 212 entries
Operator Major Minor Mult
0b11100000 0x0204 0x06 (Pick new persistent minor number 516 and use it for 32 * 6 entries)
0b00010100 (Increment current minor number 708 for next 20 entries)
Do the same up to entry 101, which uses major number 4 (out of less than 255)
and minor number 28, then continue:
Operator Major Minor Mult
0b11000010 0x0204 0x02 (Pick new persistent minor number 516 and use it for 50 * 2 entries)
0b11111111 0x04 (New temporary major number 4)
0b00100001 0x001C (Use new minor number 28 for one entry)
(Major number back to 0, minor number back to 616)
0b10000010 0x32 (Increment the current minor number for 2 * 50 entries)
0b00001011 (Increment the current minor number for 11 entries)
Pseudo-code:
entryCount = (total number of varMap bits set)
majorNumber = 0
minorNumber = 0
tmpMajorNumber = None
tmpMinorNumber = None
curEntry = 0
while curEntry < entryCount:
    instr = read(1)
    if instr =~ 0b00000000:
        majorNumber = read(bytesfor(itemVariationDataCount))
        continue
    else if instr =~ 0b11111111:
        tmpMajorNumber = read(bytesfor(itemVariationDataCount))
        continue
    if tmpMajorNumber is not None:
        thisMajorNumber = tmpMajorNumber
    else:
        thisMajorNumber = majorNumber
    if instr =~ 0b000?????:
        count = ?????
    else if instr =~ 0b001?????:
        count = ?????
        tmpMinorNumber = read(bytesfor(itemCount of ivd at itemVariationDataOffsets[thisMajorNumber]))
    else if instr =~ 0b010?????:
        count = ?????
        minorNumber = read(bytesfor(itemCount of ivd at itemVariationDataOffsets[thisMajorNumber]))
    else if instr =~ 0b10??????:
        count = ?????? * read(1)
    else if instr =~ 0b11??????:
        count = ??????
        minorNumber = read(bytesfor(itemCount of ivd at itemVariationDataOffsets[thisMajorNumber]))
        count *= read(1)
    if tmpMinorNumber is not None:
        thisMinorNumber = tmpMinorNumber
    else:
        thisMinorNumber = minorNumber
    for i in range(count):
        itemMap[curEntry++] = (thisMajorNumber, thisMinorNumber++)
    if tmpMajorNumber is not None:
        tmpMajorNumber = None
    if tmpMinorNumber is not None:
        tmpMinorNumber = None
    else:
        minorNumber += count
There are enough differences between a TVS and an IVS that comparing can be difficult, but one thing about the IVS is that, given that you determine the mapping, you aren't required to have all the data for a given glyph be "adjacent". So the idea I was trying to work out with the write-up above is how to efficiently grab an individual delta-set, or short runs of delta-sets, if they're already defined for another glyph or glyphs, or are duplicated within a glyph. Given the mapping above you can do so at a cost of 3-7 bytes. So if 10% of delta-sets occur in more than one glyph that savings could be significant.
Thanks Skef. I'm not comfortable with that level of bit fiddling. Also, the number of datasets or the number of rows within a dataset of an IVS is its internal affair and should not affect parsing.
Give me a few days to think it more and see whether a new datastructure is justified.
Here's a new data-structure that is optimized for this use-case. It's similar to ItemVariationStore, but stores deltas for multiple values at the same time (with the same major/minor). My proposal is that we add one major/minor for the transform fields of the VarComposite record, and another one for the axes variations. Both will only be available if a flag is set.
The design has a few distinct properties:
- It uses a sparse VariationRegionList, to save space in fonts with a large number of axes. It also allows HOI by repeating the same axis. (This is a change we want to make to ItemVariationStore as well, somehow.)
- It uses a CFF2Index data-structure to efficiently pack variable-sized items for random access.
- It uses the same PackedDeltas encoding from the gvar table, which is a run-length encoding of 0-, 1-, or 2-byte entries. I don't think we need 4-byte entries in this structure.
struct MultiItemVariationStore
{
uint16 format;
Offset32To<SparseVariationRegionList> sparseVariationRegionListOffset;
uint32 multiItemVariationDataCount;
Offset32To<MultiItemVariationData> multiItemVariationDataOffsets[multiItemVariationDataCount];
};
struct SparseVariationRegionList
{
uint32 regionCount;
Offset32To<SparseVariationRegion> sparseVariationRegionOffsets[regionCount];
};
struct SparseVariationRegion
{
uint16 numCoordinates;
SparseRegionAxisCoordinates sparseRegionAxisCoordinates[numCoordinates];
};
struct SparseRegionAxisCoordinates
{
uint16_t axisIndex;
F2DOT14 startCoord;
F2DOT14 peakCoord;
F2DOT14 endCoord;
};
struct MultiItemVariationData
{
uint8_t format; // = 0
uint16 regionIndex;
CFF2IndexOf<MultiItemVariationItem> multiItemVariationItems;
};
struct MultiItemVariationItem
{
PackedDeltas[regionCount] deltas; // regionCount to be fetched using regionIndex
};
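For orientation, a hedged sketch of how a SparseVariationRegion's scalar could be evaluated at a normalized designspace location, following the usual OpenType variation-region math; axes the sparse region doesn't mention simply contribute a factor of 1:

```python
def sparse_region_scalar(region, coords):
    # region: iterable of (axisIndex, startCoord, peakCoord, endCoord) tuples;
    # coords: dict mapping axisIndex -> normalized value (0.0 if absent).
    scalar = 1.0
    for axis_index, start, peak, end in region:
        coord = coords.get(axis_index, 0.0)
        if peak == 0.0:
            continue                    # axis places no constraint on this region
        if coord < start or coord > end:
            return 0.0                  # outside the region's support
        if coord == peak:
            continue
        if coord < peak:
            scalar *= (coord - start) / (peak - start)
        else:
            scalar *= (end - coord) / (end - peak)
    return scalar
```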
Then the design of the VARC table would be:
struct VARC
{
uint16_t major;
uint16_t minor;
Offset32To<Coverage> coverage;
Offset32To<MultiItemVariationStore> varStore;
CFF2IndexOf<VarCompositeGlyphRecord> glyphRecords;
};
struct VarCompositeGlyphRecord
{
VarComponentGlyphRecord[] components;
};
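Assuming CFF2IndexOf refers to the standard CFF2 INDEX layout (uint32 count, uint8 offSize, count + 1 one-based offsets, then the packed items), random access into the glyphRecords index would look roughly like this sketch:

```python
def cff2_index_item(data, pos, i):
    # Sketch only: returns the bytes of item i of an INDEX starting at pos.
    count = int.from_bytes(data[pos:pos + 4], "big")
    if count == 0:
        return b""                      # an empty INDEX is just the count field
    off_size = data[pos + 4]
    offsets_start = pos + 5
    def offset(k):
        p = offsets_start + k * off_size
        return int.from_bytes(data[p:p + off_size], "big")
    items_start = offsets_start + (count + 1) * off_size - 1   # offsets are 1-based
    return data[items_start + offset(i):items_start + offset(i + 1)]
```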
The COLRv1 table does that by allocating adjacent minor numbers to items for a Paint. We can do something like that, e.g. by including one VarIdxMap (DeltaSetIndexMap in the spec?!) that maps from glyph number to starting major/minor. Or, we can include the starting major/minor per component. That way we can get more sharing of per-component data.
Out of curiosity, did you end up exploring this idea? It appears to be the least intrusive.
The problem is, the VarIdxMap ends up having to store major/minor for each item anyway, wasting data unnecessarily. It might result in more sharing, for sure, but my hunch is that it's still more data.
But yeah I'll probably prototype that first.
type | name | notes |
---|---|---|
uint16 | flags | see below |
uint8 | numAxes | Number of axes to follow |
GlyphID16 or GlyphID24 | gid | This is a GlyphID16 if bit 0 of flags is clear, else GlyphID24 |
uint8 or uint16 | axisIndices[numAxes] | This is a uint16 if bit 1 of flags is set, else a uint8 |
F2DOT14 | axisValues[numAxes] | The axis value for each axis |
uint32_t | axisValuesVarIdx | Optional, only present if bit 2 of flags is set |
FWORD | TranslateX | Optional, only present if bit 3 of flags is set |
FWORD | TranslateY | Optional, only present if bit 4 of flags is set |
F4DOT12 | Rotation | Optional, only present if bit 5 of flags is set |
F6DOT10 | ScaleX | Optional, only present if bit 6 of flags is set |
F6DOT10 | ScaleY | Optional, only present if bit 7 of flags is set |
F4DOT12 | SkewX | Optional, only present if bit 8 of flags is set |
F4DOT12 | SkewY | Optional, only present if bit 9 of flags is set |
FWORD | TCenterX | Optional, only present if bit 10 of flags is set |
FWORD | TCenterY | Optional, only present if bit 11 of flags is set |
uint32_t | transformVarIdx | Optional, only present if bit 12 of flags is set |
bit number | meaning |
---|---|
0 | gid is 24 bit |
1 | axis indices are shorts (clear = bytes, set = shorts) |
2 | axis values have variation |
3 | have TranslateX |
4 | have TranslateY |
5 | have Rotation |
6 | have ScaleX |
7 | have ScaleY |
8 | have SkewX |
9 | have SkewY |
10 | have TCenterX |
11 | have TCenterY |
12 | transformation fields have variation |
13 | reset unspecified axes |
14 | Use my metrics |
15 | Reserved. Set to 0 |
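Putting the two tables together, here's a hedged sketch of reading a single record; fixed-point values and variation indices are kept raw, and the helper itself is mine, not part of the proposal:

```python
import struct

def parse_var_component(data, pos):
    flags, num_axes = struct.unpack_from(">HB", data, pos)
    pos += 3
    if flags & 0x0001:                                   # gid is 24 bit
        gid = int.from_bytes(data[pos:pos + 3], "big"); pos += 3
    else:
        gid, = struct.unpack_from(">H", data, pos); pos += 2
    fmt = ">%dH" % num_axes if flags & 0x0002 else ">%dB" % num_axes
    axis_indices = struct.unpack_from(fmt, data, pos); pos += struct.calcsize(fmt)
    axis_values = struct.unpack_from(">%dh" % num_axes, data, pos)   # raw F2DOT14
    pos += 2 * num_axes
    component = {"gid": gid, "axes": dict(zip(axis_indices, axis_values))}
    if flags & 0x0004:                                   # axis values have variation
        component["axisValuesVarIdx"], = struct.unpack_from(">I", data, pos); pos += 4
    for bit, name in [(3, "TranslateX"), (4, "TranslateY"), (5, "Rotation"),
                      (6, "ScaleX"), (7, "ScaleY"), (8, "SkewX"), (9, "SkewY"),
                      (10, "TCenterX"), (11, "TCenterY")]:
        if flags & (1 << bit):                           # each optional field is 2 bytes
            component[name], = struct.unpack_from(">h", data, pos); pos += 2
    if flags & 0x1000:                                   # transform fields have variation
        component["transformVarIdx"], = struct.unpack_from(">I", data, pos); pos += 4
    return component, pos
```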
The transformation data consists of individual optional fields, which can be used to construct a transformation matrix.
Transformation fields:
name | default value |
---|---|
TranslateX | 0 |
TranslateY | 0 |
Rotation | 0 |
ScaleX | 1 |
ScaleY | ScaleX |
SkewX | 0 |
SkewY | 0 |
TCenterX | 0 |
TCenterY | 0 |
The TCenterX and TCenterY values represent the “center of transformation”.
Details of how to build a transformation matrix, as pseudo-Python code:
# Using fontTools.misc.transform.Transform
t = Transform() # Identity
t = t.translate(TranslateX + TCenterX, TranslateY + TCenterY)
t = t.rotate(Rotation * math.pi)
t = t.scale(ScaleX, ScaleY)
t = t.skew(-SkewX * math.pi, SkewY * math.pi)
t = t.translate(-TCenterX, -TCenterY)
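For readers without fontTools at hand, the same composition can be restated by applying the steps to a point directly (with fontTools' Transform, chained operations act on a point in reverse call order). This is my own restatement, keeping the sign conventions of the pseudo-code above; see the later discussion in this thread about the skew sign:

```python
import math

def apply_component_transform(c, x, y):
    # c: dict of transformation fields with the defaults above already filled in.
    x, y = x - c["TCenterX"], y - c["TCenterY"]          # move to the transform center
    sx, sy = -c["SkewX"] * math.pi, c["SkewY"] * math.pi
    x, y = x + math.tan(sx) * y, math.tan(sy) * x + y    # skew
    x, y = x * c["ScaleX"], y * c["ScaleY"]              # scale
    r = c["Rotation"] * math.pi
    x, y = (x * math.cos(r) - y * math.sin(r),
            x * math.sin(r) + y * math.cos(r))           # rotate (counter-clockwise)
    return (x + c["TranslateX"] + c["TCenterX"],         # move back, then translate
            y + c["TranslateY"] + c["TCenterY"])
```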
But yeah I'll probably prototype that first.
Okay, with that, the table will be:
struct VARC
{
uint16_t major;
uint16_t minor;
Offset32To<Coverage> coverage;
Offset32To<DeltaSetIndexMap> indexMap; // May be null
Offset32To<ItemVariationStore> varStore; // May be null
Offset32To<CFF2IndexOf<VarCompositeGlyphRecord>> glyphRecords;
};
struct VarCompositeGlyphRecord
{
VarComponentGlyphRecord[] components;
};
But yeah I'll probably prototype that first.
Okay, with that, the table will be:
struct VARC
{
  uint16_t major;
  uint16_t minor;
  Offset32To<Coverage> coverage;
  Offset32To<DeltaSetIndexMap> indexMap; // May be null
  Offset32To<ItemVariationStore> varStore; // May be null
  Offset32To<CFF2IndexOf<VarCompositeGlyphRecord>> glyphRecords;
};
struct VarCompositeGlyphRecord
{
  VarComponentGlyphRecord[] components;
};
I'm happy to put this in a proposal this week. But I won't have time to test it until later. cc @liamquin
In the call 2023-12-12 there was discussion about recursiveness, and the view of @behdad and @skef (if I understood correctly, and rephrasing) was that there should be the following restrictions:
Since this leads to the possibility of a single gid producing valid yet different results in VARC.glyphCoverage and glyf, can we clarify in the spec what glyf-table glyph is expected in a font when gid is in VARC.glyphCoverage? Is it a COLR-style backup to handle VARC-less systems, often empty to save space when VARC-support is known?
Edited to strikethru incorrect claim.
- A gid in a Variable Component Record ignores VARC.glyphCoverage and always points to the glyf table;
No no. It's the opposite. It recursively tries VARC first.
Ok thanks for clarifying! But the second point about old-style composite glyphs, what happens if a component gid is found in VARC.glyphCoverage?
Ok thanks for clarifying! But the second point about old-style composite glyphs, what happens if a component gid is found in VARC.glyphCoverage?
Old-school components resolve within the glyf table.
In normal use, for glyphs processed in VARC, the fontmaker is free to choose between an empty glyph or a backup glyph in the glyf table, right?
In normal use, for glyphs processed in VARC, the fontmaker is free to choose between an empty glyph or a backup glyph in the glyf table, right?
Correct. Empty glyph saves the glyph header as well, compared to the previous designs.
What we do lose is the glyf extents. But CFF doesn't have extents either...
Preliminary results are promising!
For a Hangul font I'm testing, with the previous gvar approach: 1102048 bytes. With the new VARC design with IVS and VarIdxMap: 1093900 bytes.

These are very early results. My code may be wrong. I don't have a renderer for it, etc. etc.
Okay I did some more work. There's good news and bad news.
Good news, for Hangul:
- gvar: 1.1MiB
- VARC: 0.8MiB
- flat: 5.7MiB
- VARC / gvar: 73%
- gvar / flat: 19%
- VARC / flat: 14%

Bad news, for Hanzi:
- gvar: 15.6MiB
- VARC: 18.9MiB
- flat: 41.2MiB
- VARC / gvar: 122%
- gvar / flat: 38%
- VARC / flat: 46%
There's regression in the Hanzi case.
@justvanrossum
The problem is, the VarIdxMap ends up having to store major/minor for each item anyway, wasting data unnecessarily. It might result in more sharing, for sure, but my hunch is that it's still more data.

The VarIdxMap is taking over 5MB :(.
Wondering what percentage of the "shared" values in the Hanzi case are not actually variable (if any). I would expect this new approach to deal with alternating variable and non-variable parameters less well.
Also wondering about a cut-off between caching values in the ivs vs re-writing them based on the size of the model, but it seems like you're likely already looking at that.
t = t.skew(-SkewX * math.pi, SkewY * math.pi)
Note that fontTools' DecomposedTransform.toTransform() does not change the sign of the first arg to skew. I believe that's the form we decided to keep.
t = t.skew(-SkewX * math.pi, SkewY * math.pi)
Note that fontTools' DecomposedTransform.toTransform() does not change the sign of the first arg to skew. I believe that's the form we decided to keep.
I'm confused. Okay, that affects my implementation. But for the spec I prefer to keep both counter-clockwise, like COLR table. Does that sound right?
Wondering what percentage of the "shared" values in the Hanzi case are not actually variable (if any). I would expect this new approach to deal with alternating variable and non-variable parameters less well.
In the current approach we're getting four bytes in the VarIdxMap for each value. That's killing it.
Also wondering about a cut-off between caching values in the ivs vs re-writing them based on the size of the model, but it seems like you're likely already looking at that.
Not sure what you mean. I also tried reusing the VarIdxMap segment for each component, which saved a bit, but not much.
I'm confused. Okay, that affects my implementation. But for the spec I prefer to keep both counter-clockwise, like COLR table. Does that sound right?
Probably it's me who is confused. Here is a commit that references a comment:
I think this is the case:

- DecomposedTransform.toTransform() does not change the skewX sign

But for the spec I prefer to keep both counter-clockwise, like COLR table. Does that sound right?
It does now, now that I've refreshed my understanding of glyf-1. (I also see it reflected in my fontra-compile experiment.)
I think I understand how we got to the current variable composite proposal. Roughly:
However:
Right now, a system that understands COLR starts by looking in that table for an entry. If it finds one, it pulls path data from either glyf or CFF(2). If it doesn't, it falls back to glyf or CFF(2). All of this happens "below"/subsequent to shaping:
(shaping) -> COLR -> (glyf | CFF(2))
It seems like what "variable compositing" amounts to is an additional, simplified shaping step. Call it "intra-glyph shaping", which occurs here:
(inter-glyph shaping) -> COLR -> (intra-glyph shaping) -> (glyf | CFF2)
The only reason the system doesn't already look like this is that the compositing data is stored in the glyf table.
Set aside the question of other potential changes and just consider the current proposal: If one wanted to have this mechanism for CFF2 also, would it be substantially different? If it had to live inside the CFF2 table it would be formatted differently (with blends instead of a separate tuple variation store, perhaps using floats instead of fixed-point values of different scales, etc.) But would the meaning of the parameters be any different? Would other parameters be needed, or redundant, in the CFF2 case? I don't see how, or why.
So suppose the system worked this way instead:
- […] a hint flags field in the component record (and minus the stuff needed to play nice in the glyf table, like numberOfContours).
- […] glyf composite mechanism when using this separate table.

The main thing to work out with this system would be the details of the hint flags, but those problems are analogous for the two path data sources. Maybe you need different flags for glyf and for CFF2 — which could overlap, because one assumes mixing sources is off the table — but in each case the only thing to be worked out is how to reconcile the hinting data. (We know this because we already have COLR, so we already have implementations that grab data from the bottom-level tables, alter the points according to affine transformations, and render the results.)
This change would have these cons:
And these pros: