googlefonts / oxidize

Notes on moving tools and libraries to Rust.
Apache License 2.0

Gain a better sense of the sub-operations involved in font compilation and their timing #34

Open simoncozens opened 2 years ago

simoncozens commented 2 years ago

Issue #25 gave us an overall picture of the distribution of compilation times, but in order to identify the most profitable spots for optimisation, we need a profile of font compilation based on the distinct operations involved.

We've had attempts in the past to profile fontmake, but this has generally happened on the function level rather than on the macro "operation" level. ("Just throw a profiler at it".) This is relatively difficult to interpret as information gets lost in the weeds. I want a high level report which looks more like this:

Glyphs to UFO conversion: 30s (6%)
...

There is some timing code in fontmake (but not in ufo2ft), so it's a matter of expanding that, adding semantic information ("What operation is this code performing?"), and then running it on our Noto test rig and collating the data.
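As a sketch of what that semantic timing instrumentation could look like (a hypothetical helper, not fontmake's actual timing code):

```python
import time
from contextlib import contextmanager

# Hypothetical helper, not fontmake's actual timing code: tag spans of work
# with a semantic operation name rather than relying on a function profiler.
TIMINGS: dict[str, float] = {}

@contextmanager
def timed(operation: str):
    """Accumulate wall-clock time under a named high-level operation."""
    start = time.perf_counter()
    try:
        yield
    finally:
        TIMINGS[operation] = TIMINGS.get(operation, 0.0) + time.perf_counter() - start

def report():
    """Print lines like 'Glyphs to UFO conversion: 30.0s (6%)'."""
    total = sum(TIMINGS.values())
    for op, secs in sorted(TIMINGS.items(), key=lambda kv: -kv[1]):
        print(f"{op}: {secs:.1f}s ({100 * secs / total:.0f}%)")

with timed("Glyphs to UFO conversion"):
    time.sleep(0.01)  # stand-in for the real work
report()
```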

simoncozens commented 2 years ago

Here's some data, for building variable fonts: https://gist.github.com/simoncozens/3b841c88d6cb7de813b7530759e25e44

I'm finding it hard to visualise; here are the eight Noto fonts with the longest build times:

[Rplot03: stacked bar charts of per-operation build times for the eight slowest Noto variable font builds]

library(ggplot2)
library(jsonlite)
library(dplyr)
library(tidyr)  # needed for unnest()
library(RColorBrewer)
library(stringr)

df <- jsonlite::fromJSON("noto-variable.json")
df <- df %>% unnest(timings) %>%
  mutate(message = timings[, 1], time = as.numeric(timings[, 2])) %>%
  select(-timings)
df2 <- df %>% group_by(name) %>% mutate(total_time = sum(time)) %>%
  filter(total_time > 40 & total_time < 200)

# Timings as absolute seconds
ggplot(df2, aes(y = time, x = name, color = format,
                label = if_else(time > 1, str_wrap(paste0(message, " (", signif(time, 3), "s)"), 30), NULL))) +
  geom_bar(stat = "identity", fill = "transparent", size = 0.1, color = "black") +
  geom_text(size = 2.5, position = position_stack(vjust = 0.5)) +
  facet_wrap(~format, scales = "free") +
  theme(axis.text.x = element_text(angle = 45, vjust = 0.5), legend.position = "none")

# Timings as percentages of each font's total build time
ggplot(df2, aes(y = time, x = name, color = format,
                label = if_else(time > 1, str_wrap(paste0(message, " (", signif(time / total_time * 100, 3), "%)"), 30), NULL))) +
  geom_bar(stat = "identity", fill = "transparent", size = 0.1, color = "black") +
  geom_text(size = 2.5, position = position_stack(vjust = 0.5)) +
  facet_wrap(~format, scales = "free") +
  theme(axis.text.x = element_text(angle = 45, vjust = 0.5), legend.position = "none")
simoncozens commented 2 years ago

Here's the equivalent data for static instance generation: https://gist.github.com/simoncozens/9050865f138ae080bee599c9176b61db

[Rplot04: stacked bar charts of per-operation timings for static instance generation]

A big chunk of instance generation is the UFO instantiation. I have a project to do this in Rust (triangulate) which would make that step incredibly quick; the rest of the font generation is embarrassingly parallel per instance, as they're completely independent font builds. This is something we could use Ninja to orchestrate (in gftools-builder-ninja we already do). I can easily see order of magnitude speedups for static instance generation.
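The orchestration pattern described above could be sketched like this (names are illustrative; build_instance stands in for a full static-font build, and a thread pool is used only for brevity; a real CPU-bound build would use processes or a Ninja graph as gftools-builder does):

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of the orchestration only. build_instance stands in for a complete
# static-font build (instantiate the UFO for this instance, run the compiler,
# write the binary).
def build_instance(instance_name: str) -> str:
    # ... instantiate and compile this instance ...
    return instance_name + ".ttf"

def build_all(instance_names):
    # Each instance is a completely independent build, so they can all run
    # in parallel with no coordination between them.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(build_instance, instance_names))

outputs = build_all(["NotoSans-Regular", "NotoSans-Bold"])
```

The same fan-out maps directly onto a Ninja build graph, with one edge per instance.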

simoncozens commented 2 years ago

Why do feature writers take a long time? One of the things they do is compile a GSUB table. They use this to trace substitutions and allocate properties to different glyphs. (i.e. we deduce that lam-ar.init is an Arabic glyph even though it is unencoded, because it is produced by a substitution from lam-ar which is encoded and has the Arabic script property.)

Once they've done that, they throw away the binary GSUB table, and it gets built again later...
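The closure idea can be illustrated with a toy version (this is not feaLib's real code, just the propagation logic described above):

```python
# Toy version of the glyph-closure idea: propagate a known script property
# through GSUB-style substitution rules, so that unencoded glyphs produced
# by substitution inherit the script of the glyphs they are derived from.
def closure(script_of: dict[str, str], substitutions: list[tuple[str, str]]) -> dict[str, str]:
    """substitutions is a list of (input_glyph, output_glyph) pairs."""
    result = dict(script_of)
    changed = True
    while changed:
        changed = False
        for src, dst in substitutions:
            if src in result and dst not in result:
                result[dst] = result[src]
                changed = True
    return result

# lam-ar is encoded and known to be Arabic; lam-ar.init is unencoded and
# only reachable via substitution, so it inherits the Arab script property.
scripts = closure({"lam-ar": "Arab"}, [("lam-ar", "lam-ar.init")])
```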

anthrotype commented 2 years ago

That's right... although that temporary GSUB is not serialized (so no overflow resolution triggers), only "built" in the sense that the features.fea is parsed and a GSUB table object is generated by feaLib.builder. Also bear in mind there could be GSUB feature writers (there aren't any in the built-in writers yet, but a user can define their own writers and plug them into the build, like they do with filters), which I believe are run before all the GPOS-based feature writers.

We could in theory reuse that GSUB table we build (to do the closure to classify glyphs by Unicode properties for the subsequent GPOS writers) and keep it as is in the final font, instead of having feaLib redo the work. Worth trying. I probably thought of that and gave up for some reason that I've now forgotten.

simoncozens commented 2 years ago

Looking at the variable font builds, can we drop the "save UFO sources" step? Currently the conversion from Glyphs to UFO gets a designspace/UFO object, writes out all the files and passes a path to run_from_designspace which then loads all the UFOs from disk again. Obviously in an incremental setup, having the UFOs on disk means that you can avoid the conversion next time, but is there any reason that run_from_designspace can't just optionally take a designSpaceLib object and we skip the save/load and keep everything in memory?
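A sketch of the proposed API change, with stand-in names rather than fontmake's or designspaceLib's real signatures:

```python
from pathlib import Path

# Illustrative stand-in for designspaceLib's document class.
class DesignSpaceDocument:
    def __init__(self, path=None):
        self.path = path

def load_designspace(path):
    # Stands in for parsing a .designspace file from disk.
    return DesignSpaceDocument(path=str(path))

def run_from_designspace(designspace):
    """Accept either a file path (current behaviour) or an in-memory
    document (proposed), skipping the save/load round trip."""
    if isinstance(designspace, (str, Path)):
        designspace = load_designspace(designspace)
    # ... continue the build with the in-memory document ...
    return designspace
```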

anthrotype commented 2 years ago

can we drop the "save UFO sources" step?

MutatorMath had its own parser that wanted to load from disk again, but we can now finally ditch that since fontmake has its own instantiator that works with in-memory designspace/UFO masters.

I think the other issue I'm working on right now -- broken include statements in features.fea when exporting .glyphs => .ufo to a different directory than the input file -- is somehow related to the current saving of UFO masters to disk. glyphsLib returns an in-memory designspace object populated with ufoLib2.Font objects that have no .path attribute, because they haven't been loaded from disk but generated by code. If their features.fea contains include statements, these must be resolved relative to the UFO's path (which doesn't exist until you save the UFO to disk), and lacking that, feaLib can only use the current working directory to resolve includes, which rarely makes sense... so fontmake saves them to disk and things (kinda) work. I'll experiment with not saving the UFO masters to disk (but I'm sure while I'm writing this Simon has already done the work LOL)
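The include-resolution problem can be illustrated with a toy resolver that takes an explicit base directory instead of relying on a UFO .path (the regex and function are illustrative, not feaLib's real parser):

```python
import os
import re

# Toy illustration: find include(...) statements in a features.fea string and
# resolve them against an explicit base directory, rather than against the
# UFO's .path (which an in-memory, never-saved UFO does not have).
INCLUDE_RE = re.compile(r"include\(([^)]+)\)")

def resolve_includes(fea_text: str, base_dir: str) -> list[str]:
    """Return resolved paths for every include(...) in a feature string."""
    return [os.path.normpath(os.path.join(base_dir, m.group(1).strip()))
            for m in INCLUDE_RE.finditer(fea_text)]
```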

simoncozens commented 2 years ago

No, for once I thought I’d ask first if anyone’s tried it already… also I have run out of brain for today. Will try again tomorrow.

anthrotype commented 2 years ago

Get some well deserved rest, you did amazing work!

rsheeter commented 2 years ago

Great work! QQ, is the high level reporting stuff integrated into, or on a path to be integrated into, fontmake?

rsheeter commented 2 years ago

Noob question: what's the difference between "build OpenType features" and "run feature writers"? They sound mildly like the same thing.

behdad commented 2 years ago

The latter writes .fea files from glyph anchors and other data. The former compiles .fea files, I think.

simoncozens commented 2 years ago

As Behdad says. One makes binary tables, the other decides what should go in them.

We have some (duelling) PRs for integrating the timing reporting into fontmake/ufo2ft and will try to sort them out today.

simoncozens commented 2 years ago

I suppose another thing to draw from this is that the "write/compile features once instead of per master" PR will make quite an impact.
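A minimal sketch of that idea: if the masters share identical feature text, key the compiled result by a content hash and reuse it (illustrative only; compile_fn stands in for the real feature compilation, and the actual change lives in fontmake/ufo2ft):

```python
import hashlib

# Sketch of "compile features once instead of per master": cache compiled
# feature binaries keyed by a hash of the feature text, so identical feature
# files across masters are only compiled once.
_cache: dict[str, bytes] = {}

def compile_features(fea_text: str, compile_fn) -> bytes:
    key = hashlib.sha256(fea_text.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = compile_fn(fea_text)
    return _cache[key]
```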

But I can't shake the feeling I had a year ago: we can tinker with various optimisations and maybe get 20% or 30%, but font compilation intrinsically means doing a lot of work, and no order-of-magnitude gain is going to come from tweaking.

rsheeter commented 2 years ago

The discussion - and the tweaks made along the way - is helpful to bring some of us (...maybe just me?) up to speed on the wonders of fontmake. I'm very confident that in time we'll have a compiler that is 1-2 orders of magnitude faster, and we can look back on how hilariously slow it used to be over a beer.

Zooming out for a moment, if we ignore our current implementation, which parts of font compilation truly are the majority of the work?

IIUC our prior is that the long poles should be 1) processing glyphs (embarrassingly parallel) and 2) processing layout (...I'm less clear on how parallel this can be). There is other work, but none of it is tremendously expensive. Is that fair? Are we missing things that are non-trivial amounts of work?

simoncozens commented 2 years ago

Well, fonticulus is 60x faster than fontmake. I'll do some profiling and see what its hotspots are; it would be interesting to compare.