googlefonts / colr-gradients-spec

63 stars 8 forks source link

Provide an annotated example of COLRv1 data with corresponding test font #18

Closed PeterConstable closed 4 years ago

PeterConstable commented 4 years ago

I'm looking at the noto-handwriting-colr_1.ttf file here, and not making sense of the data I see in the COLR table.

offset field (type) data
Header
2d60 version (uint16) 0x0001
2d62 numBaseGlyphRecords (uint16) 0x0000
2d64 baseGlyphRecordsOffset (Offset32 0x0000_0000
2d68 layerRecordsOffset (Offset32) 0x0000_0000
2d6c numLayerRecords (uint16) 0x0000
2d6e baseGlyphsV1Offset (Offset32) 0x0000_0016
2d72 varStoreOffset (Offset32) 0x0000_0000
-- -- --
baseGlyphsV1Array
2d76 numBaseGlyphV1Records (uint32) 0x0000_0006
baseGlyphsV1Array[0]
2d7a gID (uint32) 0x0007_0000

That's not a valid glyph ID.

So, either I'm mis-interpreting the spec, or the data, or both; or else the data and the the spec don't match.

(I suspect what's happening is that the font was written assuming that GIDs are always uint16, including in BaseGlyphV1 record -- the pitfalls of changing from past conventions!)

anthrotype commented 4 years ago

I suspect what's happening is that the font was written assuming that GIDs are always uint16, including in BaseGlyphV1 record -- the pitfalls of changing from past conventions!

I think your suspicion is correct... 😅

In the COLRv1 table definition in fontTools.ttLib.tables.otData I have indeed reused the GlyphID type, which is a typedef of uint16: https://github.com/fonttools/fonttools/blob/26ac716a8d85c832cba2694dda994c03aafe3386/Lib/fontTools/ttLib/tables/otData.py#L1581

So yes.. the data and the spec don't match #8

behdad commented 4 years ago

I'm fine using 16bit glyph indices if we think we don't need future-proofing. But it occurs to me that if we use 32bit, there will be a pathway to add 32bit glyph indices to the font while keeping the 16bit part still working with older implementations. Is not clear how useful these kinds of "getting there incrementally" approaches are helpful versus just making an already messy situation even messier.

PeterConstable commented 4 years ago

I'm not at all opposed to using 32-bit GIDs. We have other places in OT where that's already done. It's easy to point out, and it's easy to fix in code.

Btw, my remark about "pitfalls of changing from past conventions" was just observing that people will default to assuming what they have seen in the past, as happened here. For a case like this, it's easy to point it out, and it's easy to fix in code. I'd be more concerned about things that are less obvious and require bigger code changes to fix when done "the old familiar way" (e.g., where array counts are located).

PeterConstable commented 4 years ago

I adapted my code to the font's use of 16-bit GID. I get farther, but end up with junk when I start trying to read the layer records array for the first base glyph. (I don't think 722 layer records is what was intended.)

My guess is that offsets are using the wrong base. In the first BGR, for gid 7, the layer records offset is 0x28. But that's still within the BGR array! (That assumes using start of COLR as the base for the offsets, which must be the case since the BGRs are records, not tables).

I managed to read the BGR array, but can't make any further sense of what was intended in the data in that font.

anthrotype commented 4 years ago

@PeterConstable have a look at this commented-out hex dump of a COLRv1 table that I made for unit testing the fonttools implementation: https://github.com/fonttools/fonttools/blob/26ac716a8d85c832cba2694dda994c03aafe3386/Tests/ttLib/tables/C_O_L_R_test.py#L68-L165

My guess is that offsets are using the wrong base

well, since it was sort of undefined until now, it may well be.

PeterConstable commented 4 years ago

OK, it looks like the base you have used for the LayerV1 arrays is the start of the the BGRV1 array -- that is, the location of the count for the BGRV1 array.

To put that into the existing paradigms in the OT spec, that BGRV1 "array" would be a table.

Header

Type Field name Description
uint16 version set to 1
uint16 numBaseGlyphRecords may be 0 in a version 1 table
Offset32 baseGlyphRecordsOffset offset to baseGlyphRecords array, from start of COLR table
Offset32 layerRecordsOffset offset to layerRecords array, from start of COLR table
uint16 numLayerRecords may be 0 in a version 1 table
Offset32 baseGlyphV1ListOffset offset to baseGlyphV1List table, from start of the CPAL table
Offset32 itemVariationStoreOffset offset to ItemVariationStore, from start of COLR table

BaseGlyphV1List

Type Field name Description
uint32 numBaseGlyphV1Records
BaseGlyphV1Record baseGlyphV1Records[numBaseGlyphV1Records]

BaseGlyphV1Record

Type Field name Description
uint16 glyphID (or could change spec to uint32)
Offset32 layerV1Offset offset to LayersV1 table, from start of BaseGlyphsV1List table

LayersV1

Type Field name Description
uint32 numPaintRecords
PaintRecord paintRecords[numPaintRecords]

PaintRecord

Type Field name Description
uint16 glyphID (or could change spec to uint32)
Offset32 paintOffset offset to Paint table, from start of LayersV1List table


This is a structural model that exists already in OT. It's similar to ScriptList:

GSUB -> ScriptList ->> Script --> LangSys COLR -> BaseGlyphV1List ->> LayersV1 ->> Paint

I think that's a reasonable design that fits with the existing spec.

One note, though: The BaseGlyphV1List structure is fairly simple: a count and an array of fixed-size records. That could be incorporated directly into the COLRV1 table rather than adding an offset. In GPOS/GSUB, there are multiple trees like this — ScriptList, FeatureList, LookupList — and IIRC there aren't cases of a table having multiple variable-length arrays; hence those arrays are put in "List" tables referenced by offset. But for this case, either would work and would fit with existing patterns in the OT spec.

We still need to decide on 16-bit or 32-bit for GIDs.

PeterConstable commented 4 years ago

I've created a doc that reflects the structures exemplified in the dump file from the fonttools unit test and that appears to be used in noto-handwriting-colr_1.ttf:

https://github.com/PeterConstable/OT_Drafts/blob/master/COLR_V1/COLRv1formats_rev1.md

PeterConstable commented 4 years ago

I've created a report (in an Excel spreadsheet) from a VB-implemented parse of the COLR V1 table from noto-handwriting-colr_1.ttf, together with rev1 format spec here.

(I'll work on C# and Python parse implementations sometime next week.)

PeterConstable commented 4 years ago

I wasn't handling the Affine2x2 in PaintFormat3 correctly. I've corrected that in my doc describing structures and in my Excel spreadsheet with the report of the parse of noto-handwriting-colr_1.ttf, both available here.

PeterConstable commented 4 years ago

This has since been resolved. Closing.