googlefonts / oxidize

Notes on moving tools and libraries to Rust.
Apache License 2.0

the care/correctness -> perf/ergonomics spectrum #5

Closed: cmyr closed this 2 years ago

cmyr commented 2 years ago

I'm thinking about reading and representing the basic data types.

We have some sequence of bytes.

For many of these types (the integers), every possible byte sequence is valid.

For some of them, this is not true. For instance, a tag may only contain printable ASCII bytes (0x20 to 0x7E), so bytes in 0x00..0x20 are invalid. Version16dot16 also has some unusual constraints: its minor version is packed into the high nibble of the low 16 bits, so version 0.5 is 0x00005000. Arguably, the various offset types cannot be zero (though if we represent the NULL offset as None, this is debatable).
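To make the offset and version cases concrete, here is a minimal sketch. `Offset16` and `Version16Dot16` are hypothetical types written for illustration, not taken from any particular crate. A nullable offset is stored as `Option<NonZeroU16>`, so NULL becomes `None` and a present offset can never be zero. The version decoding assumes the packed layout described above (major in the high 16 bits, minor in the high nibble of the low 16 bits).

```rust
use std::num::NonZeroU16;

/// Hypothetical nullable 16-bit offset: zero (NULL) is represented as `None`,
/// so a present offset is guaranteed by the type to be non-zero.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Offset16(Option<NonZeroU16>);

impl Offset16 {
    /// Reads a big-endian u16; every bit pattern is accepted.
    pub fn read(bytes: [u8; 2]) -> Offset16 {
        Offset16(NonZeroU16::new(u16::from_be_bytes(bytes)))
    }

    /// Returns `None` for the NULL offset, otherwise the raw offset value.
    pub fn non_null(self) -> Option<u16> {
        self.0.map(NonZeroU16::get)
    }
}

/// Hypothetical packed version: major version in the high 16 bits,
/// minor version in the high nibble of the low 16 bits (assumed layout).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Version16Dot16(u32);

impl Version16Dot16 {
    /// Reads a big-endian u32; every bit pattern is accepted.
    pub fn read(bytes: [u8; 4]) -> Version16Dot16 {
        Version16Dot16(u32::from_be_bytes(bytes))
    }

    /// Decodes to (major, minor), e.g. 0x00005000 -> (0, 5).
    pub fn major_minor(self) -> (u16, u16) {
        let major = (self.0 >> 16) as u16;
        let minor = ((self.0 >> 12) & 0xF) as u16;
        (major, minor)
    }
}

fn main() {
    // A zero offset reads as NULL rather than as an error.
    assert_eq!(Offset16::read([0, 0]).non_null(), None);
    assert_eq!(Offset16::read([0x01, 0x00]).non_null(), Some(256));

    let v = Version16Dot16::read([0x00, 0x00, 0x50, 0x00]);
    println!("{:?}", v.major_minor()); // (0, 5)
}
```

Note that both readers accept any input; whether to additionally flag "impossible" values is exactly the question raised in this issue.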

I think it's most useful to focus on tags, since they are used frequently and have clear restrictions.

So: do we want to allow code to read a tag that contains invalid characters, and be defensive about that possibility (for instance, not assuming a tag is valid UTF-8 when printing it), or do we want to check validity when reading and return an error?

This is a simple example, but I think it will end up being a larger philosophical question. It's probably worth writing some code that illustrates both possibilities so we can see what they actually look and feel like, so I'll work on that.
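A minimal sketch of the two options side by side. This is an illustration only, not the code that came out of this discussion: `Tag`, `Tag::read`, `Tag::read_checked`, and `InvalidTag` are hypothetical names. The lenient path accepts any four bytes and makes printing safe by escaping invalid bytes instead of assuming UTF-8; the strict path rejects anything outside printable ASCII (0x20 to 0x7E) at read time.

```rust
use std::fmt;

/// Hypothetical 4-byte tag that stores raw bytes, valid or not.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Tag([u8; 4]);

/// Hypothetical error returned by the strict reading path.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct InvalidTag {
    pub bytes: [u8; 4],
}

/// True for printable ASCII, the only bytes a well-formed tag may contain.
fn is_tag_byte(b: u8) -> bool {
    matches!(b, 0x20..=0x7E)
}

impl Tag {
    /// Lenient: any four bytes are accepted.
    pub fn read(bytes: [u8; 4]) -> Tag {
        Tag(bytes)
    }

    /// Strict: returns an error if any byte is outside 0x20..=0x7E.
    pub fn read_checked(bytes: [u8; 4]) -> Result<Tag, InvalidTag> {
        if bytes.iter().all(|&b| is_tag_byte(b)) {
            Ok(Tag(bytes))
        } else {
            Err(InvalidTag { bytes })
        }
    }

    /// Lets lenient callers ask about validity only when they care.
    pub fn is_valid(&self) -> bool {
        self.0.iter().all(|&b| is_tag_byte(b))
    }
}

impl fmt::Display for Tag {
    /// Prints valid bytes as characters and escapes the rest as \xNN,
    /// so printing never assumes the bytes are valid UTF-8.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for &b in &self.0 {
            if is_tag_byte(b) {
                write!(f, "{}", b as char)?;
            } else {
                write!(f, "\\x{:02X}", b)?;
            }
        }
        Ok(())
    }
}

fn main() {
    let raw = [b'c', b'm', 0x00, b'p'];
    // Lenient path: reading succeeds, and printing is still safe.
    let tag = Tag::read(raw);
    println!("{}", tag); // cm\x00p
    // Strict path: the same bytes produce an error.
    assert!(Tag::read_checked(raw).is_err());
}
```

The lenient version pushes the validity question to the call sites that care; the strict version answers it once, at read time, at the cost of refusing some inputs.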

madig commented 2 years ago

Hm. What would a shaper need if it must expect to make contact with hostile bytes?

rsheeter commented 2 years ago

IMO look to HarfBuzz for this type of thing except where @behdad advises otherwise

cmyr commented 2 years ago

In the case of a tag or a version, it would likely just mean that various comparisons would fail.
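A small sketch of what "comparisons would fail" can look like in practice. `TableRecord` and `find_table` are hypothetical, standing in for a table-directory lookup: a record with a malformed tag is not an error, it just never matches a well-formed tag the caller asks for.

```rust
/// Hypothetical table-directory entry: a raw 4-byte tag plus an offset.
#[derive(Debug, Clone, Copy)]
struct TableRecord {
    tag: [u8; 4],
    offset: u32,
}

/// Finds a table by tag. Malformed tags are not rejected; they simply
/// never compare equal to the well-formed tag being searched for.
fn find_table(records: &[TableRecord], wanted: [u8; 4]) -> Option<u32> {
    records.iter().find(|r| r.tag == wanted).map(|r| r.offset)
}

fn main() {
    let records = [
        TableRecord { tag: *b"cmap", offset: 0x100 },
        TableRecord { tag: [b'g', 0x00, b'y', b'f'], offset: 0x200 }, // malformed
        TableRecord { tag: *b"head", offset: 0x300 },
    ];
    assert_eq!(find_table(&records, *b"head"), Some(0x300));
    // The malformed record is skipped, not treated as an error.
    assert_eq!(find_table(&records, *b"glyf"), None);
}
```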

In any case, I think I have a solution that works for this and is still zero-cost on the happy path.

behdad commented 2 years ago

In general in HarfBuzz we only reject things when structurally we have to.

Let me use the tag example you gave. Imagine you have a font file with an invalid tag in it, but you can't even open it with your library to fix the font, because your library rejects the font file. That's what we don't want.
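That workflow can be sketched as follows. `Tag`, `Font`, `invalid_tags`, and `set_tag` are hypothetical names for illustration, not any real library's API: the font loads regardless of tag validity, problems are reported rather than fatal, and a tool can repair the bad tag in place.

```rust
/// Hypothetical raw tag, stored exactly as read from the file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Tag([u8; 4]);

impl Tag {
    /// True if every byte is printable ASCII (0x20..=0x7E).
    fn is_valid(self) -> bool {
        self.0.iter().all(|&b| matches!(b, 0x20..=0x7E))
    }
}

/// Hypothetical in-memory font: just a list of (tag, table bytes).
struct Font {
    tables: Vec<(Tag, Vec<u8>)>,
}

impl Font {
    /// Reports the indices of tables with invalid tags, without refusing
    /// to load the font.
    fn invalid_tags(&self) -> Vec<usize> {
        self.tables
            .iter()
            .enumerate()
            .filter_map(|(i, (tag, _))| (!tag.is_valid()).then_some(i))
            .collect()
    }

    /// Lets a tool repair a tag in place.
    fn set_tag(&mut self, index: usize, tag: Tag) {
        self.tables[index].0 = tag;
    }
}

fn main() {
    let mut font = Font {
        tables: vec![
            (Tag(*b"head"), vec![0; 54]),
            (Tag([b'n', b'a', b'm', 0x00]), vec![]),
        ],
    };
    // The font opens fine; the bad tag is merely reported.
    assert_eq!(font.invalid_tags(), vec![1]);
    // ...and can then be fixed.
    font.set_tag(1, Tag(*b"name"));
    assert!(font.invalid_tags().is_empty());
}
```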

behdad commented 2 years ago

In other words, the library should be nonjudgemental.