Considerations for "desiccated" initial font files

skef commented 2 months ago

Apologies if this is handled somewhere in the spec. I don't remember it being discussed explicitly but maybe it's all implicit.

I know we've discussed the possibility of effectively removing a round-trip by loading the initial font file in CSS, so that it functions analogously to the unicode mapping for unicode-range. I believe this could be done with a Base64-encoded data URL, and maybe it could be done in other ways.
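To make the Base64 cost concrete, here is a minimal sketch of embedding an initial font as a data URL in an `@font-face` rule. The helper names (`font_face_rule`, `base64_size`), the default MIME type, and the use of `tech(incremental)` to opt in to IFT are assumptions for illustration, not something settled in this thread.

```python
import base64


def font_face_rule(font_bytes: bytes, family: str, mime: str = "font/ttf") -> str:
    """Build an @font-face rule whose src is a Base64 data URL.

    The tech(incremental) keyword is assumed to be how the page opts in to
    incremental font transfer; adjust to whatever the spec finally requires.
    """
    payload = base64.b64encode(font_bytes).decode("ascii")
    return (
        "@font-face {\n"
        f'  font-family: "{family}";\n'
        f'  src: url("data:{mime};base64,{payload}") tech(incremental);\n'
        "}\n"
    )


def base64_size(n: int) -> int:
    """Length in characters of padded Base64 output for n input bytes."""
    return 4 * ((n + 2) // 3)
```

Since Base64 emits 4 characters for every 3 input bytes, the embedded font costs roughly a third more than the raw file before transport compression, which is the penalty that motivates shrinking the initial file.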

Suppose (because you're paying a penalty for the Base64) you want to minimize the size of the initial file, downloading most of the "universal" data via an initial patch. So let's assume we're doing some combination of per-table patches and glyph-keyed patches (one or the other or both).

What is the minimum initial table set assuming per-table patches? My previous understanding is that it would be cmap, IFT, and IFTX (when present) but maybe things have expanded since then? Are there certain tables that should be required in order to solve chicken-and-egg problems on the client side -- perhaps "head" or "post"?
For glyph-keyed patches the normal presumption is that the glyf/loca/gvar or CFF/CFF2 tables will be present. Maybe this is fine, they should compress pretty well when "empty". If we want that not to be the case, though, and to download those tables in the initial per-table patch, can we have some semantic for doing that without an extra round-trip? (That is, to start downloading the initial per-table patch and the glyph-keyed patches at the same time, but guarantee that the per-table patch is applied first.)
Should we add language to the effect of "an initial font not yet checked for some set of code points cannot be judged for completeness or OpenType/OFF spec compliance."?

garretrieger commented 2 months ago

> What is the minimum initial table set assuming per-table patches? My previous understanding is that it would be cmap, IFT, and IFTX (when present) but maybe things have expanded since then? Are there certain tables that should be required in order to solve chicken-and-egg problems on the client side -- perhaps "head" or "post"?

Format 1 patch maps have a dependency on cmap and maxp. Format 2 patch maps do not require any other tables. As you stated, glyph-keyed patches will additionally depend on a loca + glyf/CFF/CFF2 table being present. I don't believe head or post would be needed so long as the IFT processing happens before the client attempts to use the font in any way.
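The dependencies described above can be written down as a small lookup. This is only a sketch of the rules as stated in this comment; the function name `required_tables`, its parameters, and the table tag spellings (taken from the thread, without the trailing space that some tags carry in a real font) are illustrative assumptions.

```python
def required_tables(patch_map_format: int,
                    uses_glyph_keyed: bool,
                    outline_format: str = "glyf") -> set[str]:
    """Tables the initial font needs before IFT processing can start."""
    if patch_map_format == 1:
        needed = {"cmap", "maxp"}  # format 1 maps depend on these
    elif patch_map_format == 2:
        needed = set()             # format 2 maps are self-contained
    else:
        raise ValueError(f"unknown patch map format: {patch_map_format}")

    if uses_glyph_keyed:
        if outline_format == "glyf":
            needed |= {"glyf", "loca"}
        elif outline_format in ("CFF", "CFF2"):
            needed.add(outline_format)
        else:
            raise ValueError(f"unknown outline format: {outline_format}")
    return needed
```

An encoder could subtract the tables actually present from this set to see what a bootstrapping patch would still have to supply.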

> For glyph-keyed patches the normal presumption is that the glyf/loca/gvar or CFF/CFF2 tables will be present. Maybe this is fine, they should compress pretty well when "empty". If we want that not to be the case, though, and to download those tables in the initial per-table patch, can we have some semantic for doing that without an extra round-trip? (That is, to start downloading the initial per-table patch and the glyph-keyed patches at the same time, but guarantee that the per-table patch is applied first.)

So I think you could set up a very minimal initial font which uses glyph-keyed patches like this:

The initial font starts with only a format 2 IFT and IFTX table (since format 2 doesn't depend on other tables).
The IFTX table contains a single partially invalidating per-table Brotli patch which matches all codepoints, all features, and all design space. This patch would bootstrap the font by adding all of the tables required to be present (e.g. glyf, head, loca, CFF, CFF2, maxp, etc.) and any initial data that's desired.
The IFT table would contain the listing of all of the glyph-keyed patches as normal.

A client processing the file would be required to apply the IFTX bootstrapping patch first, since partially invalidating patches have priority over non-invalidating ones. Because the bootstrapping patch is only partially invalidating, though, the glyph-keyed patches listed in the IFT table remain valid, so an optimized client could download the ones it needs in parallel.
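As a rough sketch of that client behaviour: every fetch starts at once, but patches are applied in invalidation-priority order, so the bootstrapping patch always lands first. The `Patch` class, the `fetch` stub, and `load_and_apply` are hypothetical names, and a real client would also re-read the patch maps after applying an invalidating patch.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Lower number = applied earlier. Glyph-keyed patches are non-invalidating.
PRIORITY = {"full": 0, "partial": 1, "none": 2}


@dataclass(frozen=True)
class Patch:
    url: str
    invalidation: str  # "full", "partial", or "none"


def fetch(patch: Patch) -> bytes:
    """Stand-in for a network request; returns placeholder bytes."""
    return patch.url.encode("utf-8")


def load_and_apply(patches, fetch_fn=fetch):
    """Fetch all patches concurrently, then apply them in priority order."""
    applied = []
    with ThreadPoolExecutor(max_workers=8) as pool:
        # Start every download up front so none waits on another.
        futures = {p: pool.submit(fetch_fn, p) for p in patches}
        # sorted() is stable, so patches of equal priority keep their order.
        for p in sorted(patches, key=lambda p: PRIORITY[p.invalidation]):
            futures[p].result()    # block until this patch has arrived
            applied.append(p.url)  # a real client would apply the bytes here
    return applied
```

With one partially invalidating bootstrap patch and several glyph-keyed patches, the bootstrap patch is applied first regardless of which download finishes first, and no extra round-trip is spent waiting for it before requesting the others.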

I'd like to test this out and see if it works in practice. Assuming this all works I think it could be good to talk about how to do this in the encoder guidance.

> Should we add language to the effect of "an initial font not yet checked for some set of code points cannot be judged for completeness or OpenType/OFF spec compliance."?

Yes, I think something like this could be good, but it would need a bit more detail. There will be cases where an initial font is valid to use and contains some initial data which can be used immediately while waiting for the first patch to arrive. So we probably want to phrase it along these lines: if the initial font is missing required tables (e.g. cmap) it shouldn't be used until patches are applied, but if those tables are present it's OK to use before the first patches arrive. It might even be worth adding a flag bit to the patch map that indicates whether the initial font is usable or not.
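One way a client might apply the "missing required tables" test is to read the table directory of the initial font and compare it against a required set. This sketch assumes an uncompressed sfnt file (not WOFF2) and uses the tables OpenType lists as required for every font; whether the IFT spec would pick that exact set is an assumption, as are the function names.

```python
import struct

# Tables OpenType requires in every font.
REQUIRED = {"cmap", "head", "hhea", "hmtx", "maxp", "name", "OS/2", "post"}


def table_tags(font: bytes) -> set[str]:
    """Return the table tags listed in an sfnt table directory."""
    # Header: sfntVersion (4 bytes), numTables (uint16), then 6 more bytes.
    _version, num_tables = struct.unpack_from(">4sH", font, 0)
    tags = set()
    for i in range(num_tables):
        # Each 16-byte record starts with the 4-byte tag.
        (tag,) = struct.unpack_from(">4s", font, 12 + 16 * i)
        tags.add(tag.decode("latin-1"))
    return tags


def usable_before_patching(font: bytes) -> bool:
    """True if every required table is already present."""
    return REQUIRED <= table_tags(font)
```

A desiccated font holding only the IFT and IFTX tables fails this check, so under the proposed wording it would not be used until the bootstrapping patch has been applied.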

skef commented 2 months ago

Ah, right, maxp!

> I don't believe head or post would be needed so long as the IFT processing happens before the client attempts to use the font in any way.

That does seem likely. I was wondering about post because sometimes the PostScript name of the font is relevant to "discovery" (broadly speaking), like with a TTC. But maybe there's nothing like that in a web context absent a TTC.

I can imagine non-web contexts in which you might have a set of initial fonts in some directory and might want to know something about them that can't be gleaned from the filename. But in that case you're probably not trying to absolutely minimize the initial size, and anyway an encoder can include whatever tables it wants to for an unusual use case.

garretrieger commented 1 week ago

I wrote up some thoughts on how to detect codepoint readiness in a separate issue, since that is a more general problem that also comes up outside of the initial font load: https://github.com/w3c/IFT/issues/223

w3c / IFT

Considerations for "desiccated" initial font files #201