googlefonts / oxidize

Notes on moving tools and libraries to Rust.
Apache License 2.0

Font table types #3

Closed · cmyr closed this 2 years ago

cmyr commented 2 years ago

Rendered

This is a proposal for a low-level crate to define read-only and mutable versions of all the structures found in the various font tables. There was an earlier draft of this shared with @rsheeter, but I've almost completely rewritten it since then, and I think I now have a much better sense of the problem space.

Let me know if anything in here is unclear, and of course share any other feedback. Thanks to @simoncozens and @dfrg; this write-up is mostly just me smushing together and explaining existing work from pinot and fonttools-rs.

madig commented 2 years ago

I think using builders would be fine. fontTools has a builder module at https://github.com/fonttools/fonttools/blob/main/Lib/fontTools/fontBuilder.py and some tables like glyf and loca have to be generated, as noted, in tandem. ufo2ft can be roughly seen as a collection of builders, too (see https://github.com/googlefonts/ufo2ft/blob/32fa83c379402c8ac97f34a47f2d13edf11f87ab/Lib/ufo2ft/outlineCompiler.py#L118-L162). Plain old data structures may be nicer for spot-manipulation (insert or change something in the name table, add or remove a feature from the feature list in GPOS), but maybe rebuilding the whole table is ok, too. Maybe I can rummage through our scripts to see what kind of manipulation we often do. That'd help with making prototypes. I mean, you can also have Rust data types and builders.

I'd love to see the performance diff (if any!) when compiling a large bunch of UFOs like in https://github.com/madig/noto-amalgamated. I keep hearing the words "data-oriented programming" in my head but wouldn't know what it would look like in this case. Maybe it's not even relevant if 65k glyphs is the most you'll have to deal with!

behdad commented 2 years ago

Reading this discussion brings a lot of déjà vu, given that I've thought about this for a good 15+ years. I volunteered to Rod to give a presentation on how HarfBuzz handles these issues, where it fails, what pitfalls there are, etc. I can do it next week. I think a two-hour slot would be adequate. Your list of questions at the end makes for a nice agenda.

cmyr commented 2 years ago

@behdad that would be invaluable.

dfrg commented 2 years ago

Plain old data structures may be nicer for spot-manipulation (insert or change something in the name table, add or remove a feature from the feature list in GPOS), but maybe rebuilding the whole table is ok, too. Maybe I can rummage through our scripts to see what kind of manipulation we often do.

I'd love to see actual examples of this. My suspicion is that patching of this kind is generally not done in bulk or at high frequency like compilation and subsetting, and so is less sensitive to performance. I would also lean toward the builder model if this holds true.

behdad commented 2 years ago

Plain old data structures may be nicer for spot-manipulation (insert or change something in the name table, add or remove a feature from the feature list in GPOS), but maybe rebuilding the whole table is ok, too. Maybe I can rummage through our scripts to see what kind of manipulation we often do.

POD means you then have to compile the whole table, which runs into the NP-hard offset-overflow problem. You lose everything as soon as you hit that problem. You can't "simply" move back and forth between the two representations.
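To illustrate what "offset overflow" means here (the names in this sketch are hypothetical): most links between OpenType subtables are 16-bit offsets, so however the compiled subtable graph is packed, every parent-to-child distance must fit in 64 KiB, and finding an ordering or duplication scheme that satisfies all offsets at once is the NP-hard part.

/// Hypothetical error type for this sketch.
#[derive(Debug)]
pub struct OffsetOverflow {
    pub delta: usize,
}

/// An Offset16 field stores the byte distance from the start of the parent
/// table to the start of the child subtable; offsets point forward, so the
/// child is assumed to be packed after its parent. If the chosen layout puts
/// the child more than 0xFFFF bytes away, the offset cannot be represented
/// and the whole pack must be redone with a different ordering.
pub fn write_offset16(parent_start: usize, child_start: usize) -> Result<u16, OffsetOverflow> {
    let delta = child_start
        .checked_sub(parent_start)
        .expect("child packed after parent");
    u16::try_from(delta).map_err(|_| OffsetOverflow { delta })
}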

behdad commented 2 years ago

BTW, I feel like allsorts is not being mentioned / studied enough here. https://github.com/yeslogic/allsorts https://github.com/yeslogic/fathom/blob/main/fathom/examples/formats/opentype.txt

behdad commented 2 years ago

Also RustyBuzz and ttf-parser: https://github.com/RazrFalcon/rustybuzz https://github.com/RazrFalcon/ttf-parser/blob/master/src/parser.rs

rsheeter commented 2 years ago

BTW, I feel like allsorts is not being mentioned / studied enough here.

+1, it's one of the most unique attacks on the problem from the little I know of it.

I think maybe there is a missing step 0: review existing related implementations and write some notes about each.

simoncozens commented 2 years ago

Agree, but with the proviso that implementations which only parse and do not generate TTF are limited in what they can tell us.

rsheeter commented 2 years ago

Allsorts claims to subset, which implies a degree of generation capability?

madig commented 2 years ago

Plain old data structures may be nicer for spot-manipulation (insert or change something in the name table, add or remove a feature from the feature list in GPOS), but maybe rebuilding the whole table is ok, too. Maybe I can rummage through our scripts to see what kind of manipulation we often do.

I'd love to see actual examples of this. My suspicion is that patching of this kind is generally not done in bulk or at high frequency like compilation and subsetting, and so is less sensitive to performance. I would also lean toward the builder model if this holds true.

Honestly, the scripts for direct binary font manipulation are things that run quickly enough in Python so far. Even if every single change in a Rust library would recompute the entire table, it would probably still be faster. Think adding or changing entries in the name table, taking out the kern feature from the DFLT script for a Deva font to stop Adobe apps from applying kerning twice, maybe adding the overlap bit to stuff in the glyf table, messing with vertical metrics in OS/2 and hhea, adding in a STAT table (and modifying the name table in the process), compressing to WOFF(2), that kind of stuff. For subsetting, we still use pyftsubset; one day I want to switch over to HarfBuzz.

POD means you then have to compile the whole table, which runs into the NP-hard offset-overflow problem. You lose everything as soon as you hit that problem. You can't "simply" move back and forth between the two representations.

Mh, yes. I suppose what I want to say is: it would be nice if you could take e.g. the name table, easily add something to it or change it, and then write it back. Maybe with a builder pattern where you could start with given data and then go NameTableBuilder::from(data).addName(...) or something, but maybe that can easily be built on top of a more low-level read/write API without POD stuff.
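As a rough illustration of that idea, a builder seeded from existing bytes might look like the following sketch; every type and method name here is hypothetical, not the API of any crate discussed in this thread:

/// Hypothetical record type for this sketch.
pub struct NameRecord {
    pub platform_id: u16,
    pub encoding_id: u16,
    pub language_id: u16,
    pub name_id: u16,
    pub value: String,
}

/// Hypothetical parse error for this sketch.
pub struct ParseError;

pub struct NameTableBuilder {
    records: Vec<NameRecord>,
}

impl NameTableBuilder {
    /// Decompile existing name table bytes into an editable form.
    pub fn from_bytes(_data: &[u8]) -> Result<Self, ParseError> {
        todo!("parse the header and name records")
    }

    /// Add a record, returning self so edits can be chained.
    pub fn add_name(mut self, record: NameRecord) -> Self {
        self.records.push(record);
        self
    }

    /// Recompute string storage and offsets, then serialize the whole table.
    pub fn build(self) -> Vec<u8> {
        todo!("lay out string storage and write the records")
    }
}

The caller never touches offsets or string storage; the builder rebuilds the entire table in build(), which sidesteps the in-place-mutation issues raised above.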

dfrg commented 2 years ago

BTW, I feel like allsorts is not being mentioned / studied enough here.

+1, it's one of the most unique attacks on the problem from the little I know of it.

I think maybe there is a missing step 0: review existing related implementations and write some notes about each.

+1 for more comprehensive notes but I can give a quick rundown based on some familiarity with these projects.

Internally, ttf-parser is very similar to pinot with the exception that it favors sequential reads over using a lot of magic numbers as offsets. The public API is higher level and more similar to swash. It doesn’t provide direct access to parsed table data and doesn’t do any writing.

Allsorts does writing but, notably, does not appear to subset the layout tables, nor does it seem capable of writing them aside from copying the blobs. The parsing is less ad hoc than ttf-parser and pinot, but it is not zero copy. This project is really interesting as a shaper, but less so IMO as a reference for parsing/compiling.

rsheeter commented 2 years ago

For subsetting, we still use pyftsubset; one day I want to switch over to HarfBuzz.

Would recommend. Our experience is that it's (more than) two orders of magnitude faster for (less than) an order of magnitude fewer resources.

simoncozens commented 2 years ago

Oh: and even those projects which subset, like allsorts, last I checked did not do a thing about variable fonts. Some of the ItemVariationStore and TupleVariationStore subtables are extremely gnarly. (fonttools-rs both reads and writes them.)

dfrg commented 2 years ago

Thanks for the examples, Nikolaus. It seems like the smaller, more targeted transforms like flipping bits in the glyf table and disabling features (without removing them) would be good candidates for a supplemental library and probably shouldn’t influence overall design.

madig commented 2 years ago

For subsetting, we still use pyftsubset; one day I want to switch over to HarfBuzz.

Would recommend. Our experience is that it's (more than) two orders of magnitude faster for (less than) an order of magnitude fewer resources.

Yes, but I ran into crashers and CLI issues whenever I tried it a while back (all reported IIRC). Haven't checked recently, though.

supplemental library

Yes, I think that's fine and fontTools will continue to exist anyway :) I'm more interested in compiling my fonts on all my 16 cores simultaneously in under a second and inside the L1 cache.

cmyr commented 2 years ago

Plain old data structures may be nicer for spot-manipulation (insert or change something in the name table, add or remove a feature from the feature list in GPOS), but maybe rebuilding the whole table is ok, too. Maybe I can rummage through our scripts to see what kind of manipulation we often do.

POD means you then have to compile the whole table, which runs into the NP-hard offset-overflow problem. You lose everything as soon as you hit that problem. You can't "simply" move back and forth between the two representations.

This sounds like a very important point. There are various other similar possible optimizations we won't be able to do if we're going the POD route (doing tricky things like using a single coverage table for multiple subtables?)

rsheeter commented 2 years ago

Yes, but I ran into crashers and CLI issues whenever I tried it a while back

If that is still true, particularly the crashes, @garretrieger will likely be interested :)

simoncozens commented 2 years ago

There are various other similar possible optimizations we won't be able to do if we're going the POD route (doing tricky things like using a single coverage table for multiple subtables?)

The opposite, I think. Consider the case where you're adding a glyph to the coverage of a positioning table. You'd better decompile and recompile the whole table, because otherwise, if that coverage table was previously shared with another lookup, you just changed the effect of that lookup too. And if it wasn't shared before, it may be sharable now.
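A toy sketch of that hazard, with hypothetical structures and sharing modeled as indices into a pool:

pub struct CompiledFont {
    /// Coverage subtables, shared between lookups by index.
    pub coverages: Vec<Vec<u16>>,
    pub lookup_a_coverage: usize,
    pub lookup_b_coverage: usize,
}

/// "Add a glyph to lookup A's coverage" by mutating the shared pool entry.
pub fn add_glyph_in_place(font: &mut CompiledFont, glyph: u16) {
    let cov = &mut font.coverages[font.lookup_a_coverage];
    cov.push(glyph);
    cov.sort_unstable();
    // If lookup_b_coverage == lookup_a_coverage, lookup B now matches the
    // new glyph too: the in-place edit silently changed another lookup.
    // Avoiding that (or re-establishing sharing afterward) is exactly why
    // you decompile and recompile the whole table.
}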

garretrieger commented 2 years ago

Yes, but I ran into crashers and CLI issues whenever I tried it a while back

If that is still true, particularly the crashes, @garretrieger will likely be interested :)

The crashes you reported have all been fixed, and the overflow resolution has been significantly improved since then as well. As an added bonus, the updated resolver can now pack complex GSUB/GPOS tables with lots of extension lookups smaller than fonttools can.

dfrg commented 2 years ago

I’d like to propose that we make the read functionality core only, feature gate the writing and only take on the alloc dependency when that feature is enabled. This gives us a statically checked constraint that read is zero copy, smaller code size for consumer only scenarios, and simplifies compilation to WASM. I thought I’d bring it up early because trying to retrofit this can be painful.
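A sketch of the crate-level gating this implies; the write feature name (declared in Cargo.toml as write = []) and the module contents are illustrative only:

// lib.rs: the crate is always no_std, so the read path is statically
// constrained to `core`; `alloc` is only linked when writing is enabled.
#![no_std]

#[cfg(feature = "write")]
extern crate alloc;

/// Zero-copy parsing, compiled against core alone.
pub mod read {
    pub struct FontRef<'a>(pub &'a [u8]);
}

/// Owned, mutable table types; only built with the `write` feature.
#[cfg(feature = "write")]
pub mod write {
    use alloc::vec::Vec;
    pub struct TableData(pub Vec<u8>);
}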

rsheeter commented 2 years ago

read functionality core only, feature gate the writing

+1, though there is a limit to how much pain this is worth. Hopefully, as long as it's introduced early, not too much.

simplifies compilation to WASM

Is there a specific reason the writing side seems likely to kerplode in WASM? Wanting to write, maybe to subset for PDF, in WASM seems pretty reasonable.

rsheeter commented 2 years ago

@cmyr I think we could merge this whenever you feel ready. Comments are more refinement than massive change, and we can continue to take PRs to update as we have new ideas or learn more.

dfrg commented 2 years ago

Is there a specific reason the writing side seems likely to kerplode in WASM? Wanting to write, maybe to subset for PDF, in WASM seems pretty reasonable.

Oh, I was unclear. Writing is perfectly fine, and I think it would be quite valuable if available as WASM. I'm trying to specifically avoid the std dependency, which is what carries the potential complications. The story here is getting better all the time, so it may not be an issue.

cmyr commented 2 years ago

I’d like to propose that we make the read functionality core only, feature gate the writing and only take on the alloc dependency when that feature is enabled. This gives us a statically checked constraint that read is zero copy, smaller code size for consumer only scenarios, and simplifies compilation to WASM. I thought I’d bring it up early because trying to retrofit this can be painful.

I think this makes sense, and I think trying to support the parse-only case as efficiently as possible is a good goal.

Also: I'm personally finding that the term 'zero-copy' doesn't exactly sit right for me, for this project, and I want to make sure I'm not misunderstanding anything. To me, to be zero-copy, all access to data in the font would have to be the equivalent of pointer reads (indexing into a Rust slice being functionally equivalent), but because of endianness we necessarily need to copy individual bytes in order to generate the scalars. So we do zero allocation, and we support random access (you can grab an element out of the middle of an array), but we do always need to copy at the leaves of the tree. Does this sound right to other people?
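To make the "copy at the leaves" point concrete, here is a minimal sketch (the wrapper name is illustrative): the raw big-endian scalar can live directly in the font bytes, giving in-place random access with zero allocation, but producing a native value always copies and swaps two bytes.

/// A u16 exactly as it sits in the font file (big-endian). An array of
/// these can be overlaid on the raw data and indexed in place.
#[derive(Clone, Copy)]
#[repr(transparent)]
pub struct RawU16([u8; 2]);

impl RawU16 {
    /// The unavoidable leaf copy: read two bytes and byte-swap them
    /// into a native-endian scalar.
    pub fn get(self) -> u16 {
        u16::from_be_bytes(self.0)
    }
}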

@cmyr I think we could merge this whenever you feel ready. Comments are more refininement than massive change and we can continue to take PRs to update as we have new ideas or learn more.

Sounds good, I'll do one more pass to incorporate any remaining feedback tomorrow but then I think this is a reasonable checkpoint.

dfrg commented 2 years ago

Also: I'm personally finding that the term 'zero-copy' doesn't exactly sit right for me, for this project, and I want to make sure I'm not misunderstanding anything.

In principle, I agree with you. I believe the term originates in avoiding copying buffers between user and kernel space which is pretty far removed from what we're talking about. For our purposes, I think it's reasonable to define zero copy as "all scalar value loads generate a mov+bswap (AArch64: LDR+REV) sequence" which is the best we can do, but I'm happy with using more precise terminology if that's the general sentiment here.

By this definition, pinot (and swash) are not zero copy as they check bounds for almost every field access. We can definitely do better.

cmyr commented 2 years ago

That's a good goal, and we should try and think of ways we might achieve it without sacrificing safety. At the very least we could have unsafe _unchecked variants of various accessor functions, and require the consumer to do bounds checking up front. More ambitiously, I wonder if we might be able to use the type system to encode the validity of bounds in some cases? I'm thinking of some sort of token type that would be required to make certain method calls, and acquiring that token would involve checking some upper bound. This feels like it might be beyond the capacity of Rust's type system at the moment.

Certainly wherever possible we should encourage the use of iteration-based APIs, since these can trivially perform a single bounds check up front that is valid for all the individual accesses.
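A sketch of how those two ideas might combine (all names hypothetical): a witness type whose construction performs the one up-front bounds check, plus an iterator that amortizes that check across every element.

/// Hypothetical witness type: it can only be constructed by a successful
/// bounds check, so holding one is proof that the window exists.
pub struct Array16<'a> {
    data: &'a [u8], // length is a multiple of 2, guaranteed at construction
}

impl<'a> Array16<'a> {
    /// The single up-front check: count big-endian u16s starting at offset.
    pub fn new(data: &'a [u8], offset: usize, count: usize) -> Option<Self> {
        let end = offset.checked_add(count.checked_mul(2)?)?;
        Some(Array16 { data: data.get(offset..end)? })
    }

    /// Random access needs only the cheap index comparison; the slice
    /// itself was validated once, at construction.
    pub fn get(&self, index: usize) -> Option<u16> {
        let start = index.checked_mul(2)?;
        let bytes = self.data.get(start..start.checked_add(2)?)?;
        Some(u16::from_be_bytes([bytes[0], bytes[1]]))
    }

    /// Iteration has no per-element branching at all: the construction-time
    /// check covers every access.
    pub fn iter(&self) -> impl Iterator<Item = u16> + '_ {
        self.data
            .chunks_exact(2)
            .map(|c| u16::from_be_bytes([c[0], c[1]]))
    }
}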

rsheeter commented 2 years ago

I think it's reasonable to define zero copy as "all scalar value loads generate a mov+bswap (AArch64: LDR+REV) sequence" which is the best we can do

+1 to crisply defining what we mean by zerocopy. @dfrg I propose you PR a definitions section, even if for now it's just zerocopy, to either the root README or a new md.

behdad commented 2 years ago

To me zerocopy simply means that if a font object is n bytes, we consume o(n) bytes to read from it. That is, we don't consume O(n) or more to process it.

behdad commented 2 years ago

To me zerocopy simply means that if a font object is n bytes, we consume o(n) bytes to read from it. That is, we don't consume O(n) or more to process it.

With that definition, neither endianness conversion nor bounds-checking violate zerocopyness.

dfrg commented 2 years ago

That's a good goal, and we should try and think of ways we might achieve it without sacrificing safety.

I gave up on the HarfBuzz style design when originally working on swash due to the safety issues, but I have some ideas about how to maintain safety now...

At the very least we could have unsafe _unchecked variants of various accessor functions, and require the consumer to do bounds checking up front. More ambitiously, I wonder if we might be able to use the type system to encode the validity of bounds in some cases? I'm thinking of some sort of token type that would be required to make certain method calls, and acquiring that token would involve checking some upper bound. This feels like it might be beyond the capacity of Rust's type system at the moment.

It's probably not completely representable in the type system, but I do see this pattern as being the solution. We'll need to carry some sort of context for dealing with offsets when walking a graph anyway and I believe this can also act as a safety token in some cases. For the most part, we should only need to bounds check when traversing an edge.

Certainly wherever possible we should encourage the use of iteration-based APIs, since these can trivially perform a single bounds check up front that is valid for all the individual accesses.

Agree completely on this.

dfrg commented 2 years ago

+1 to crisply defining what we mean by zerocopy. @dfrg I propose you PR a definitions section, even if for now it's just zerocopy, to either the root README or a new md.

Will do.

To me zerocopy simply means that if a font object is n bytes, we consume o(n) bytes to read from it. That is, we don't consume O(n) or more to process it.

I like this as a more general definition that gives us some breathing room with respect to bounds checking. I think this makes sense as the hard constraint, with the load+swap case being desirable when possible.

cmyr commented 2 years ago

Okay, I'll merge this shortly.

Some last notes:

There are various other similar possible optimizations we won't be able to do if we're going the POD route (doing tricky things like using a single coverage table for multiple subtables?)

The opposite, I think. Consider the case where you're adding a glyph to the coverage of a positioning table. You'd better decompile and recompile the whole table, because otherwise, if that coverage table was previously shared with another lookup, you just changed the effect of that lookup too. And if it wasn't shared before, it may be sharable now.

I've been thinking about this a bunch, and I think these are both good points. In the first case, actual manual access to everything seems important, and then the second case feels more like a traditional compilation problem. Doing only the second approach means the first is impossible, but doing the first doesn't preclude also supporting the second.

also @dfrg just as a fun aside, the following two functions generate the same asm on x86 & aarch64:

pub unsafe fn manual_read_u16(buf: &[u8], offset: usize) -> u16 {
    // Elsewhere: assemble the big-endian value one byte at a time.
    #[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
    return (*buf.get_unchecked(offset) as u16) << 8 | *buf.get_unchecked(offset + 1) as u16;

    // x86: a single load followed by a byte swap (read_unaligned, since
    // `offset` need not be 2-byte aligned).
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    (buf.as_ptr().add(offset) as *const u16).read_unaligned().swap_bytes()
}

pub unsafe fn boring_read_u16(buf: &[u8], offset: usize) -> u16 {
    // The compiler sees through from_be_bytes and emits the same mov+bswap.
    u16::from_be_bytes(buf.get_unchecked(offset..offset + 2).try_into().unwrap())
}
behdad commented 2 years ago

also @dfrg just as a fun aside, the following two functions generate the same asm on x86 & aarch64:

pub unsafe fn manual_read_u16(buf: &[u8], offset: usize) -> u16 {
    // Elsewhere: assemble the big-endian value one byte at a time.
    #[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
    return (*buf.get_unchecked(offset) as u16) << 8 | *buf.get_unchecked(offset + 1) as u16;

    // x86: a single load followed by a byte swap (read_unaligned, since
    // `offset` need not be 2-byte aligned).
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    (buf.as_ptr().add(offset) as *const u16).read_unaligned().swap_bytes()
}

pub unsafe fn boring_read_u16(buf: &[u8], offset: usize) -> u16 {
    // The compiler sees through from_be_bytes and emits the same mov+bswap.
    u16::from_be_bytes(buf.get_unchecked(offset..offset + 2).try_into().unwrap())
}

Yeah, we also found that compilers are smart enough to take care of such optimizations.