Closed PeterConstable closed 4 years ago
I suspect what's happening is that the font was written assuming that GIDs are always uint16, including in BaseGlyphV1 record -- the pitfalls of changing from past conventions!
I think your suspicion is correct... 😅
In the COLRv1 table definition in `fontTools.ttLib.tables.otData` I have indeed reused the `GlyphID` type, which is a typedef of `uint16`:
https://github.com/fonttools/fonttools/blob/26ac716a8d85c832cba2694dda994c03aafe3386/Lib/fontTools/ttLib/tables/otData.py#L1581
So yes... the data and the spec don't match #8
I'm fine using 16-bit glyph indices if we think we don't need future-proofing. But it occurs to me that if we use 32-bit, there will be a pathway to add 32-bit glyph indices to the font while keeping the 16-bit part working with older implementations. It's not clear how useful these kinds of "getting there incrementally" approaches are versus just making an already messy situation even messier.
I'm not at all opposed to using 32-bit GIDs. We have other places in OT where that's already done. It's easy to point out, and it's easy to fix in code.
Btw, my remark about "pitfalls of changing from past conventions" was just observing that people will default to assuming what they have seen in the past, as happened here. For a case like this, it's easy to point it out, and it's easy to fix in code. I'd be more concerned about things that are less obvious and require bigger code changes to fix when done "the old familiar way" (e.g., where array counts are located).
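The mix-up is easy to reproduce. Here's a minimal sketch (byte values are illustrative, not taken from the font) of how the same record bytes diverge when the writer assumed a 16-bit glyph ID but the reader expects 32-bit:

```python
import struct

# A BaseGlyphV1Record as a tool assuming 16-bit glyph IDs would write it:
# uint16 glyphID = 7, then Offset32 layerV1Offset = 0x28 (illustrative values).
data = struct.pack(">HL", 7, 0x28)  # bytes: 00 07 00 00 00 28

# A reader that expects a 32-bit glyph ID misreads the same bytes:
wrong_gid = struct.unpack(">L", data[:4])[0]
right_gid = struct.unpack(">H", data[:2])[0]

assert right_gid == 7
assert wrong_gid == 0x70000  # 458752 -- far beyond any plausible numGlyphs
```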
I adapted my code to the font's use of 16-bit GID. I get farther, but end up with junk when I start trying to read the layer records array for the first base glyph. (I don't think 722 layer records is what was intended.)
My guess is that offsets are using the wrong base. In the first BGR, for gid 7, the layer records offset is 0x28. But that's still within the BGR array! (That assumes using start of COLR as the base for the offsets, which must be the case since the BGRs are records, not tables).
I managed to read the BGR array, but can't make any further sense of what was intended in the data in that font.
@PeterConstable have a look at this commented-out hex dump of a COLRv1 table that I made for unit testing the fonttools implementation: https://github.com/fonttools/fonttools/blob/26ac716a8d85c832cba2694dda994c03aafe3386/Tests/ttLib/tables/C_O_L_R_test.py#L68-L165
> My guess is that offsets are using the wrong base

Well, since it was sort of undefined until now, it may well be.
OK, it looks like the base you have used for the LayerV1 arrays is the start of the BGRV1 array -- that is, the location of the count for the BGRV1 array.
To put that into the existing paradigms in the OT spec, that BGRV1 "array" would be a table.
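To make the base question concrete, here's a minimal sketch (the absolute positions are illustrative, not taken from the font) of how the same `Offset32` resolves to different targets depending on which base is used:

```python
# Hypothetical absolute positions within the font data (illustrative only).
colr_start = 0x1000            # start of the COLR table
base_glyph_v1_list = 0x1020    # start of the BGRV1 array (its count field)
layer_v1_offset = 0x28         # Offset32 read from a BaseGlyphV1Record

# Resolved against the start of COLR, the target can land inside the
# record array itself -- the symptom described above:
wrong_target = colr_start + layer_v1_offset          # 0x1028

# Resolved against the start of the BGRV1 array, as the font was built:
right_target = base_glyph_v1_list + layer_v1_offset  # 0x1048

# The two interpretations differ by exactly the distance between the bases.
assert right_target - wrong_target == base_glyph_v1_list - colr_start
```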
Header
Type | Field name | Description |
---|---|---|
uint16 | version | set to 1 |
uint16 | numBaseGlyphRecords | may be 0 in a version 1 table |
Offset32 | baseGlyphRecordsOffset | offset to baseGlyphRecords array, from start of COLR table |
Offset32 | layerRecordsOffset | offset to layerRecords array, from start of COLR table |
uint16 | numLayerRecords | may be 0 in a version 1 table |
Offset32 | baseGlyphV1ListOffset | offset to baseGlyphV1List table, from start of COLR table |
Offset32 | itemVariationStoreOffset | offset to ItemVariationStore, from start of COLR table |
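For reference, the header above is trivially parseable; a minimal sketch (field names are mine, following the table, big-endian as usual for OT):

```python
import struct

# 22-byte v1 header: uint16, uint16, Offset32, Offset32, uint16, Offset32, Offset32
COLR_V1_HEADER = struct.Struct(">HHLLHLL")

def parse_colr_v1_header(data: bytes) -> dict:
    names = ("version", "numBaseGlyphRecords", "baseGlyphRecordsOffset",
             "layerRecordsOffset", "numLayerRecords",
             "baseGlyphV1ListOffset", "itemVariationStoreOffset")
    return dict(zip(names, COLR_V1_HEADER.unpack_from(data, 0)))

# Example: a v1 header with empty v0 arrays and the BaseGlyphV1List
# immediately following the header (illustrative values).
hdr = struct.pack(">HHLLHLL", 1, 0, 0, 0, 0, 22, 0)
parsed = parse_colr_v1_header(hdr)
assert parsed["version"] == 1
assert parsed["baseGlyphV1ListOffset"] == 22
```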
BaseGlyphV1List
Type | Field name | Description |
---|---|---|
uint32 | numBaseGlyphV1Records | |
BaseGlyphV1Record | baseGlyphV1Records[numBaseGlyphV1Records] | |
BaseGlyphV1Record
Type | Field name | Description |
---|---|---|
uint16 | glyphID | (or could change spec to uint32) |
Offset32 | layerV1Offset | offset to LayersV1 table, from start of BaseGlyphV1List table |
LayersV1
Type | Field name | Description |
---|---|---|
uint32 | numPaintRecords | |
PaintRecord | paintRecords[numPaintRecords] | |
PaintRecord
Type | Field name | Description |
---|---|---|
uint16 | glyphID | (or could change spec to uint32) |
Offset32 | paintOffset | offset to Paint table, from start of LayersV1 table |
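Under this model, a reader resolves each offset from the base named in its description. A minimal Python sketch of walking the BaseGlyphV1List (16-bit glyphID variant; function name and test bytes are mine, for illustration):

```python
import struct

def parse_base_glyph_v1_list(data: bytes, list_pos: int):
    """Return (glyphID, absolute LayersV1 position) pairs."""
    (count,) = struct.unpack_from(">L", data, list_pos)
    records, pos = [], list_pos + 4
    for _ in range(count):
        glyph_id, layers_offset = struct.unpack_from(">HL", data, pos)
        # layerV1Offset is relative to the start of the BaseGlyphV1List
        records.append((glyph_id, list_pos + layers_offset))
        pos += 6  # uint16 + Offset32
    return records

# One record for glyph 7 whose LayersV1 table (numPaintRecords = 2)
# immediately follows the record array (illustrative bytes).
blob = struct.pack(">LHL", 1, 7, 10) + struct.pack(">L", 2)
records = parse_base_glyph_v1_list(blob, 0)
assert records == [(7, 10)]
(num_paint_records,) = struct.unpack_from(">L", blob, records[0][1])
assert num_paint_records == 2
```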
This is a structural model that exists already in OT. It's similar to ScriptList:
```
GSUB -> ScriptList ->> Script --> LangSys
COLR -> BaseGlyphV1List ->> LayersV1 ->> Paint
```
I think that's a reasonable design that fits with the existing spec.
One note, though: The BaseGlyphV1List structure is fairly simple: a count and an array of fixed-size records. That could be incorporated directly into the COLRV1 table rather than adding an offset. In GPOS/GSUB, there are multiple trees like this — ScriptList, FeatureList, LookupList — and IIRC there aren't cases of a table having multiple variable-length arrays; hence those arrays are put in "List" tables referenced by offset. But for this case, either would work and would fit with existing patterns in the OT spec.
We still need to decide on 16-bit or 32-bit for GIDs.
I've created a doc that reflects the structures exemplified in the dump file from the fonttools unit test and that appears to be used in noto-handwriting-colr_1.ttf:
https://github.com/PeterConstable/OT_Drafts/blob/master/COLR_V1/COLRv1formats_rev1.md
I've created a report (in an Excel spreadsheet) from a VB-implemented parse of the COLR V1 table from noto-handwriting-colr_1.ttf, together with rev1 format spec here.
(I'll work on C# and Python parse implementations sometime next week.)
I wasn't handling the Affine2x2 in PaintFormat3 correctly. I've corrected that in my doc describing structures and in my Excel spreadsheet with the report of the parse of noto-handwriting-colr_1.ttf, both available here.
This has since been resolved. Closing.
I'm looking at the noto-handwriting-colr_1.ttf file here, and not making sense of the data I see in the COLR table.
That's not a valid glyph ID.
So, either I'm mis-interpreting the spec, or the data, or both; or else the data and the spec don't match.