googlefonts / fontations

Reading and writing font files
Apache License 2.0

Language agnostic schema #123

Open rsheeter opened 1 year ago

rsheeter commented 1 year ago

Once #84 is done we're getting close to the codegen input being language agnostic. It "just" needs to be a form other than Rust and to only have attributes that make sense beyond Rust. Strawman to incite debate: use https://toml.io/en/, it's simple, widely supported, supports comments, and more than sufficient to capture what we need.

Today

/// [COLR (Color)](https://learn.microsoft.com/en-us/typography/opentype/spec/colr#colr-header) table
table Colr {
    /// Table version number - set to 0 or 1.
    #[version]
    version: u16,
    /// Number of BaseGlyph records; may be 0 in a version 1 table.
    num_base_glyph_records: u16,
    /// Offset to baseGlyphRecords array (may be NULL).
    #[nullable]
    #[read_offset_with($num_base_glyph_records)]
    base_glyph_records_offset: Offset32<[BaseGlyph]>,

Tomorrow

tag = "COLR"
root = "Colr"

[table.Colr]
comment = "[COLR (Color)](https://learn.microsoft.com/en-us/typography/opentype/spec/colr#colr-header) table"

[table.Colr.version]
type = "u16"
attrib = ["version"]
comment = "Table version number - set to 0 or 1."

[table.Colr.num_base_glyph_records]
type = "u16"
comment = "Number of BaseGlyph records; may be 0 in a version 1 table."

[table.Colr.base_glyph_records_offset]
type = "Offset32<BaseGlyph[num_base_glyph_records]>"
attrib = ["nullable"]
comment = "Offset to baseGlyphRecords array (may be NULL)."

rsheeter commented 1 year ago

@dfrg notes it would be helpful if we explicitly captured what can/cannot be constructed (which is currently hidden/internal)

cmyr commented 1 year ago

This is a sketch for the general structure of a schema. The intent here is to figure out a structure capable of representing all of the things that we would like to know about a font table. It is written in a format-agnostic style; we would pick an actual format if we choose to implement this.

This is incomplete. The intention here is to show generally what this would look like, and I can pursue it if there is consensus that this is a useful line of inquiry.

Note: the structure I've chosen here is ad-hoc and infinitely bike-sheddable; it can also be discussed if we decide to proceed.

Type

A Type is a string, which is one of either:

Table

A table object has the following fields:

| field | type | required | notes |
| --- | --- | --- | --- |
| name | string | yes | the name of the table |
| sfnt tag | Tag | no | the sfnt tag for this table, if it is top-level |
| short doc | string | yes | a short description of this table |
| long doc | string | no | additional information about this table |
| doc link | string | yes | a link to online documentation for this table |
| input args | [InputArgument] | no | only if this table requires external data to be parsed |
| formats | [FormatTable] | no | a list of table formats. must not exist if 'fields' exists |
| fields | [Field] | no | a list of fields. must not exist if 'formats' exists |
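
To make the required/optional split and the 'formats' vs. 'fields' rule concrete, a table object could be modelled roughly like this. This is only a sketch: the Python attribute names are hypothetical spellings of the field names above, and the element types of the lists are left loose because they are defined in later sections.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Table:
    # Required fields.
    name: str
    short_doc: str
    doc_link: str
    # Optional fields; None means "not present".
    sfnt_tag: Optional[str] = None
    long_doc: Optional[str] = None
    input_args: Optional[list] = None
    formats: Optional[list] = None
    fields: Optional[list] = None

    def __post_init__(self):
        # 'formats' must not exist if 'fields' exists, and vice versa.
        if self.formats is not None and self.fields is not None:
            raise ValueError(
                f"table {self.name!r}: 'formats' and 'fields' are mutually exclusive"
            )


# A table with plain fields is accepted.
colr = Table(
    name="Colr",
    short_doc="COLR (Color) table",
    doc_link="https://learn.microsoft.com/en-us/typography/opentype/spec/colr",
    sfnt_tag="COLR",
    fields=[],
)
```

The sketch only rejects the case where both lists are present; whether a table with neither should also be rejected is not stated above, so it is left open here.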

InputArgument

An input argument is a name and a type.

| field | type | required | notes |
| --- | --- | --- | --- |
| name | string | yes | the name of this argument, used in the containing table |
| type | Type | yes | the type of the argument |

FormatTable

A single format of a multi-format table.

| field | type | required | notes |
| --- | --- | --- | --- |
| format type | Type | yes | the type of the format value, e.g. uint16 |
| format | int | yes | the format value. Must be valid for 'format type' |
| table | Table | yes | the Table for this format. |
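
The "must be valid for 'format type'" rule could be checked along these lines. This is a sketch under assumptions: `INT_RANGES` is a hypothetical lookup covering only a few unsigned type names (the full set of Type strings is not pinned down above), and the nested table is left untyped.

```python
from dataclasses import dataclass

# Assumption: these type names and ranges are illustrative only.
INT_RANGES = {
    "uint8": (0, 0xFF),
    "uint16": (0, 0xFFFF),
    "uint32": (0, 0xFFFFFFFF),
}


@dataclass
class FormatTable:
    format_type: str
    format: int
    table: object  # the Table for this format

    def __post_init__(self):
        if self.format_type not in INT_RANGES:
            raise ValueError(f"unsupported format type {self.format_type!r}")
        lo, hi = INT_RANGES[self.format_type]
        if not lo <= self.format <= hi:
            raise ValueError(
                f"format {self.format} does not fit in {self.format_type}"
            )


# Format 1 stored as a uint16 is fine; 70000 would be rejected.
fmt1 = FormatTable(format_type="uint16", format=1, table=None)
```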

Field

A field is a named value at a given position in a Table or Record.

| field | type | required | notes |
| --- | --- | --- | --- |
| name | String | yes | the name of the field |
| type | Type | yes | the type of the field |
| doc | string | yes | a short description of this field |
| offset | OffsetInfo | no | required if this field is an offset |
| count | CountInfo | no | required if this field is an array or sequence |

OffsetInfo

TK

CountInfo

CountInfo is additional information for computing the length of a sequence or array.

This has two parts. The first is the source of the count value, which is generally either the name of a sibling field or a literal. The second part identifies a possible transformation applied to this value.

| field | type | required | notes |
| --- | --- | --- | --- |
| value | CountValue | yes | indicates the input value for computing the count |
| transform | CountTransform | no | a token identifying a computation on the input value |

CountValue

CountValue represents the source for the base input value used to compute the count.

| field | type | required | notes |
| --- | --- | --- | --- |
| field | String | no | the name of a field or 'input arg' |
| literal | int | no | a literal integer |
| all | () | no | a flag indicating that the sequence consumes the rest of the table's data |

Exactly one of these fields must be present.
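
The one-of constraint is easy to get subtly wrong (a literal of 0 is still "present"), so here is a small sketch of how a reader might enforce it. The class shape is hypothetical; `all` is modelled as a boolean flag since it carries no value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CountValue:
    field: Optional[str] = None    # name of a field or input arg
    literal: Optional[int] = None  # a literal integer
    all: bool = False              # sequence consumes the rest of the data

    def __post_init__(self):
        # Compare against None rather than truthiness so literal=0 counts.
        present = [self.field is not None, self.literal is not None, self.all]
        if sum(present) != 1:
            raise ValueError(
                "exactly one of 'field', 'literal', 'all' must be present"
            )


from_field = CountValue(field="num_base_glyph_records")
zero = CountValue(literal=0)
rest = CountValue(all=True)
```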

CountTransform

The count transform is an enum, serialized as an integer, with the following defined values:

| name | value | function |
| --- | --- | --- |
| MINUS_ONE | 1 | subtract 1 from the input |
| DIVIDE_BY_TWO | 2 | divide the input value by 2 |
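
As a sketch of how a consumer might apply these, the enum maps directly onto an integer enum. `apply_transform` is a hypothetical helper, and treating DIVIDE_BY_TWO as integer (floor) division is an assumption, since the rounding behaviour is not specified above.

```python
from enum import IntEnum
from typing import Optional


class CountTransform(IntEnum):
    MINUS_ONE = 1
    DIVIDE_BY_TWO = 2


def apply_transform(value: int, transform: Optional[int] = None) -> int:
    """Compute the final count from the input value and optional transform."""
    if transform is None:
        return value
    # Accepts either the enum member or its serialized integer; an
    # undefined integer raises ValueError here.
    transform = CountTransform(transform)
    if transform is CountTransform.MINUS_ONE:
        return value - 1
    # Assumption: counts are integers, so use floor division.
    return value // 2


print(apply_transform(5))                                # 5
print(apply_transform(5, CountTransform.MINUS_ONE))      # 4
print(apply_transform(8, CountTransform.DIVIDE_BY_TWO))  # 4
```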

unhandled: Device table delta values

Record

FlagSet

Enum

rsheeter commented 1 year ago

Awesome, ty. I like it, think it is valuable to pursue, and with my own biases fully intact think this would transform magnificently to something like toml :) I really want to try making a python reader off such a generic schema, I think that would be a very interesting exercise that might surface interesting things.

EDIT: at mild risk of overthinking things, maybe we could have an abnf. My immediate thought is a narrowing of https://github.com/toml-lang/toml/blob/main/toml.abnf.