cmyr closed 2 years ago
Hm. What would a shaper need if it must expect to make contact with hostile bytes?
IMO look to HarfBuzz for this type of thing except where @behdad advises otherwise
In the case of a tag or a version, it would likely just mean that various comparisons would fail.
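To make "comparisons would fail" concrete, here is a minimal sketch of a table lookup that never validates tags. `TableRecord` and `find_table` are hypothetical names made up for illustration, not HarfBuzz's or any crate's API. A record with a malformed tag is simply never matched by a well-formed query:

```rust
/// Hypothetical table-directory entry: a raw 4-byte tag plus a byte offset.
/// The tag is stored exactly as read, with no validation.
pub struct TableRecord {
    pub tag: [u8; 4],
    pub offset: u32,
}

/// Look up a table by tag using plain byte equality.
/// A record whose tag holds bytes such as 0x00 or 0xFF is not an error here;
/// it just never compares equal to a tag like b"GSUB".
pub fn find_table(records: &[TableRecord], wanted: &[u8; 4]) -> Option<u32> {
    records
        .iter()
        .find(|r| &r.tag == wanted)
        .map(|r| r.offset)
}
```

Under this approach the font still opens, and a tool that wants to repair the bad record can still reach it by its raw bytes.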
In any case I think I have a solution that works for this, and is still zero-cost on the happy path.
In general in HarfBuzz we only reject things when structurally we have to.
Let me use the example of the tag you provided. Imagine you have a font file with an invalid tag in it, but you can't even open it with your library to fix the font, because your library rejects the file. That's what we don't want.
In other words, the library should be nonjudgemental.
I'm thinking about reading and representing the basic data types.
We have some sequence of bytes.
For many of these types (the integers) all possible byte sequences are valid.
For some of them, this is not true. For instance, a tag cannot contain the bytes `0x00..0x20`. Version16dot16 also has some funny constraints. Arguably the different 'offset' types cannot be zero (if we are representing the NULL offset as `None`, this is debatable).

I think it's most useful to think about tag, since that is frequently used and has clear restrictions.
So: do we want to allow code to read a tag that contains invalid characters and be defensive about that possibility (not assuming a tag is valid UTF-8 when printing, for instance), or do we want to check validity when reading and return an error?
This is a simple example, but I think it will end up being a larger philosophical question. It's probably worth writing some code that illustrates both possibilities, so we can see what they actually look and feel like, so I'll work on that.
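As a rough starting point, the two possibilities might look something like the sketch below. Everything here is hypothetical (`Tag`, `InvalidTag`, `from_raw`, `try_from_raw`) and is not the API of any existing crate. The validity check assumes the OpenType rule that each tag byte lies in the printable ASCII range 0x20 to 0x7E, which is slightly stricter than only excluding `0x00..0x20`.

```rust
use std::fmt;

/// Hypothetical 4-byte tag type, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Tag([u8; 4]);

/// Hypothetical error: which byte was invalid, and where.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidTag {
    pub byte: u8,
    pub index: usize,
}

impl Tag {
    /// Option 1: accept any four bytes; users of the tag must be defensive.
    pub fn from_raw(bytes: [u8; 4]) -> Self {
        Tag(bytes)
    }

    /// Option 2: check validity when reading and return an error.
    /// Assumes valid bytes are printable ASCII, 0x20..=0x7E.
    pub fn try_from_raw(bytes: [u8; 4]) -> Result<Self, InvalidTag> {
        for (index, &byte) in bytes.iter().enumerate() {
            if !(0x20..=0x7E).contains(&byte) {
                return Err(InvalidTag { byte, index });
            }
        }
        Ok(Tag(bytes))
    }
}

/// Defensive printing: never assumes the bytes are valid UTF-8.
/// Printable ASCII is written as-is; anything else is hex-escaped.
impl fmt::Display for Tag {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for &b in &self.0 {
            if (0x20..=0x7E).contains(&b) {
                write!(f, "{}", b as char)?;
            } else {
                write!(f, "\\x{:02X}", b)?;
            }
        }
        Ok(())
    }
}
```

With option 1 the cost moves to every place a tag is printed or interpreted; with option 2 the cost is paid once at read time, but a file with one bad tag produces an error that the caller has to decide how to handle.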