Closed — rsheeter closed this 1 year ago
OK, some thoughts:
There is one place (`compileInterpolatableTTFs`) where we have all the UFOs at once as input and we expect a bunch of TTFs as output, so that's the spot to target. But each of the UFOs is fed individually to the outline compiler, which produces an entire independent TTF. (Currently with features, but we can split that out to a separate stage with my PR.)

The Rust code takes n `.glif` files and produces a `glyf` table entry and a `gvar` table entry. There is really nowhere simple to plug this into the existing ufo2ft code. Either you have a single UFO and you produce an entire `glyf` table for that UFO as part of the process of producing a TTF, or you have a bunch of UFOs and you produce a bunch of TTFs. At no point do we have the granularity of working on n glyphs in parallel, under the current architecture.

One option would be to have `compileInterpolatableTTFs` call out to some Rust code which precomputes all the `glyf` information for all of the UFOs at once, and then uses that `glyf` information to instantiate a mixed Rust/Python `BaseOutlineCompiler` subclass, which in turn uses it to set up all the other tables which use the metrics etc. A later step would be computing the `gvar` information in Rust and sending that back instead of doing the merge step.

I left some comments in line. Overall I'm strongly in favour of this, and of doing this incrementally and in such a way that it benefits existing consumers of fontmake.
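A rough sketch of what that hybrid compiler could look like. ufo2ft really does have a `BaseOutlineCompiler` class, but everything named below (`precompute_glyf_in_rust`, `PrecomputedOutlineCompiler`, the data shapes) is a hypothetical stand-in, not real ufo2ft or PyO3 API:

```python
def precompute_glyf_in_rust(ufos):
    """Stand-in for a single PyO3 call that compiles every glyph of
    every UFO at once; returns {glyph_name: (compiled_bytes, bounds)}."""
    return {name: (b"\x00", (0, 0, 600, 700)) for ufo in ufos for name in ufo}

class PrecomputedOutlineCompiler:
    """Would subclass ufo2ft's BaseOutlineCompiler in a real version:
    glyph outlines arrive precompiled from Rust, while the bounds still
    feed the metrics-dependent tables (head, hmtx sidebearings) in
    Python."""

    def __init__(self, glyf_data):
        self.glyf_data = glyf_data

    def compile_glyf(self):
        # No per-glyph Python compilation: just assemble the Rust output.
        return {name: data for name, (data, _bounds) in self.glyf_data.items()}

    def glyph_bounds(self):
        # Other tables need these, which is why Rust must report them back.
        return {name: bounds for name, (_data, bounds) in self.glyf_data.items()}

compiler = PrecomputedOutlineCompiler(precompute_glyf_in_rust([["A", "B"]]))
```

The key design point this illustrates is that Rust cannot just hand back opaque bytes: the Python side still needs per-glyph metadata (bounds, metrics) to build the rest of the font.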
> There isn't such a thing as "the glyph loop" per se

Yes, we have several `for` loops that involve glyphs and that we'd like to break up/parallelize. One that isn't mentioned here directly is all of ufo2ft's "filters" that pre-process UFO glyphs (e.g. decompose components, convert cubic to quadratic curves, plus custom user-defined filters) before they get translated to TrueType or CFF glyphs. Filters are also run serially at the moment: not just one filter at a time, with the result of the previous filter being the input to the next, but also, within each filter, one glyph at a time.
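As a rough illustration of what breaking up that per-glyph loop could look like, here is a hedged sketch: the `decompose` and `cubic_to_quadratic` functions are hypothetical stand-ins for real ufo2ft filters (which are classes with font-wide context), and glyphs are plain dicts. The filters stay sequential, as the pipeline requires, but each stage fans out over glyphs:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-glyph filter functions, standing in for ufo2ft filters.
def decompose(glyph):
    return glyph | {"decomposed": True}

def cubic_to_quadratic(glyph):
    return glyph | {"quadratic": True}

def run_filters(glyphs, filters, max_workers=4):
    """Run each filter over all glyphs in parallel, but keep the filters
    themselves in order: the output of one stage feeds the next."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for flt in filters:
            glyphs = list(pool.map(flt, glyphs))
    return glyphs

glyphs = [{"name": "A"}, {"name": "B"}]
out = run_filters(glyphs, [decompose, cubic_to_quadratic])
```

Note the caveat: threads buy little for CPU-bound pure-Python filters because of the GIL, which is part of the motivation for moving this work into Rust in the first place.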
It's true that the current code is at once tightly coupled and spread across different projects (ufo2ft for master UFOs => `glyf` compilation, fontTools varLib for master `glyf` tables => `gvar`), which makes integrating this proposed "oxidized glyph loop" more complicated than it appears.
One way to tackle this at first could be to keep the current serial Python pipeline, but have it produce only a scaffolding of the VF with all glyphs blanked out (a ufo2ft filter could even literally nuke all the contours/components before proceeding as usual). Then (or even at the same time) have this new Rust tool or method take a designspace with a bunch of UFO masters and produce whole `glyf` and `gvar` tables (internally parallelizing per glyph), and finally use fontTools to stick those into the VF at the end.
Or alternatively (which I think is what Simon suggested earlier), we could modify ufo2ft's `compileInterpolatableTTFsFromDS` method to call out into Rust to build the master TTFs' `glyf` tables, while keeping the rest of the code that depends on them (e.g. for the bounding boxes and sidebearings); then, when ufo2ft goes on to call the `fontTools.varLib.build` method, pass the `exclude=["gvar"]` parameter and subsequently call some other Rust API to build the latter and insert it into the final VF.
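A sketch of that two-stage flow. The `exclude` parameter of `fontTools.varLib.build` is real, but everything else below is a stand-in: plain dicts instead of a `TTFont` (which is itself dict-like, so the final splice reads the same), and a hypothetical `build_gvar` in place of the Rust API:

```python
def varlib_build_without_gvar(masters):
    # Stand-in for: vf, model, master_ttfs = varLib.build(ds, exclude=["gvar"])
    # i.e. build the whole variable font except the gvar table.
    return {"glyf": {m: f"outlines:{m}" for m in masters}}

def build_gvar(masters):
    # Hypothetical Rust API that computes per-glyph variation deltas
    # in parallel and returns a ready-made gvar table.
    return {"deltas": {m: f"deltas:{m}" for m in masters}}

masters = ["Regular", "Bold"]
vf = varlib_build_without_gvar(masters)
vf["gvar"] = build_gvar(masters)  # splice the Rust-built table into the VF
```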
I implemented what I'm calling "step one" of this (replacing Python/C `glyf` table generation with Rust, not `gvar` yet) using this PyO3 module and this patch to ufo2ft.
I found a medium-sized speedup, around 10-15%. Which is not bad! Making code 10-15% faster is actually a pretty good optimization... it's just not what we were expecting.
Some observations:
- ufo2ft adds a `.notdef` to fonts which don't have one, meaning that the on-disk UFO is not the same as the in-memory UFO. There are a few ways around this: generate the `.notdef` in Rust (I have code for this!), or pass the Python-generated `.notdef` down to Rust. Whatever we do about `.notdef` here, it reminds us of the filters issue - that Python may manipulate the UFOs before compiling them, and we should probably have a better plan for that.
- The Rust code returns each glyph as a `Glyph` object rather than returning the whole `glyf` table as a big binary dump. This is because ufo2ft does further processing based on the contents of the `glyf` table, and so if it was a big binary dump it would have to de-compile it to Python objects anyway.
- The same is not true of the `gvar` table. This isn't something that Python does further processing on, so it could just be jammed right into the font. However, since this happens deep inside of `fontTools.varLib.merge` instead of ufo2ft, it's not immediately clear how to turn off specifically the `gvar` processing in the merger and do it ourselves.

But I think the biggest observation is that we still don't really have a sense of how long each sub-task within font compilation takes. We're prodding at bits we think are going to be bottlenecks and optimizing them, but I don't feel like we know where the bottlenecks really are. I know @madig did a lot of profiling a year or so ago, but rather than function-level profiling, I think it is more useful to mark the start and end of discrete operations ("load in the UFOs", "convert the curves", etc.) and see what proportion of the build time they take.
10-15% total, correct? Can you readily compare the specific operation that was ported to Rust vs. its Python equivalent?
Holding in PR for a few days to facilitate collection of feedback.