Open raphlinus opened 5 years ago
@gw3583 What key does WebRender use for the glyph cache?
@pcwalton WR mostly leaves the management of font IDs to Gecko. Gecko creates font resources via the WR API and they have an arbitrary ID:
Font instance structures then reference that resource, with per-instance information:
Glyph caches are per-font-instance, with keys inside each of those caches being a glyph index.
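To make that layout concrete, here is a minimal Rust sketch of a two-level cache: one inner map per font instance, keyed by glyph index. All names here (`FontKey`, `FontInstanceKey`, `FontInstance`, `CachedGlyph`, `GlyphCaches`) and the `size_px` field are illustrative assumptions and are not meant to mirror WebRender's actual definitions.

```rust
use std::collections::HashMap;

/// Arbitrary ID for a font resource, chosen by the embedder (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct FontKey(u32);

/// ID for a font instance that references a font resource (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct FontInstanceKey(u32);

/// Per-instance information layered on top of a font resource.
#[derive(Clone, Debug)]
struct FontInstance {
    font_key: FontKey,
    size_px: u32, // illustrative per-instance setting
}

type GlyphIndex = u32;

/// Placeholder for whatever a real cache would store per glyph.
#[derive(Clone, Debug, PartialEq)]
struct CachedGlyph {
    width: u32,
    height: u32,
}

#[derive(Default)]
struct GlyphCaches {
    instances: HashMap<FontInstanceKey, FontInstance>,
    // One glyph cache per font instance; inner keys are glyph indices.
    per_instance: HashMap<FontInstanceKey, HashMap<GlyphIndex, CachedGlyph>>,
}

impl GlyphCaches {
    fn add_instance(&mut self, key: FontInstanceKey, instance: FontInstance) {
        self.instances.insert(key, instance);
        self.per_instance.entry(key).or_default();
    }

    fn insert(&mut self, key: FontInstanceKey, glyph: GlyphIndex, entry: CachedGlyph) {
        self.per_instance.entry(key).or_default().insert(glyph, entry);
    }

    fn get(&self, key: FontInstanceKey, glyph: GlyphIndex) -> Option<&CachedGlyph> {
        self.per_instance.get(&key)?.get(&glyph)
    }
}
```

Two instances of the same font resource (say, at different sizes) get separate inner maps, so the same glyph index can hold different entries in each.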
Doing a little more research, I see that as of Windows 10 you can test whether two fonts are equal. This is not the same as a globally unique ID, but I can see using this method to manage a cache. In particular, there can be a cache of fallback fonts that uses the PostScript name as the hash key and the equality test to verify a match. That doesn't really solve the problem of having an ID for caching downstream resources, though (glyphs, table data, etc.).
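A rough sketch of that cache idea in Rust: bucket fonts by PostScript name, then use an equality test to pick the matching entry within the bucket. The `NativeFont` trait, its `same_font` method (standing in for the platform equality test), `FallbackCache`, and `MockFont` are all hypothetical names invented for this example.

```rust
use std::collections::HashMap;

/// Hypothetical abstraction over a platform font handle.
trait NativeFont {
    fn postscript_name(&self) -> String;
    /// Stand-in for the platform's "are these two fonts equal?" test.
    fn same_font(&self, other: &Self) -> bool;
}

struct FallbackCache<F: NativeFont> {
    // PostScript name -> fonts sharing that name, each with its cache slot.
    buckets: HashMap<String, Vec<(F, usize)>>,
    next_slot: usize,
}

impl<F: NativeFont> FallbackCache<F> {
    fn new() -> Self {
        FallbackCache { buckets: HashMap::new(), next_slot: 0 }
    }

    /// Returns the slot for `font`, reusing an existing slot when an equal
    /// font is already cached under the same PostScript name.
    fn intern(&mut self, font: F) -> usize {
        let bucket = self.buckets.entry(font.postscript_name()).or_default();
        if let Some((_, slot)) = bucket.iter().find(|(cached, _)| cached.same_font(&font)) {
            return *slot;
        }
        let slot = self.next_slot;
        self.next_slot += 1;
        bucket.push((font, slot));
        slot
    }
}

/// Toy font used only to exercise the cache.
struct MockFont {
    name: &'static str,
    version: u32,
}

impl NativeFont for MockFont {
    fn postscript_name(&self) -> String {
        self.name.to_string()
    }

    fn same_font(&self, other: &Self) -> bool {
        self.name == other.name && self.version == other.version
    }
}
```

Note that the slot numbers are only meaningful within one cache, which is exactly the limitation mentioned above: they do not give downstream caches a stable ID.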
A fairly simple enhancement to this, I think, would be to get the `head` table of each font and compare those for equality; given the presence of the checksum and timestamp fields in this table, which are normally set automatically by font-creation tools, it's highly unlikely that the `head` tables of different fonts/versions will be equal.
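As a sketch of what that comparison could look like in Rust, assuming the raw `head` table bytes have already been fetched from each font: the offsets below follow the OpenType `head` table layout (checksum adjustment at byte 8, modified timestamp at byte 28, 54 bytes total), while the function and type names are made up for this example.

```rust
/// Byte offsets within the OpenType `head` table.
const CHECKSUM_ADJUSTMENT_OFFSET: usize = 8; // uint32, big-endian
const MODIFIED_OFFSET: usize = 28; // 64-bit timestamp, big-endian
const HEAD_TABLE_LEN: usize = 54;

/// The two fields that make accidental equality unlikely (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
struct HeadFingerprint {
    checksum_adjustment: u32,
    modified: i64,
}

/// Extracts the checksum and modification timestamp, or `None` if the
/// slice is too short to be a complete `head` table.
fn head_fingerprint(head: &[u8]) -> Option<HeadFingerprint> {
    if head.len() < HEAD_TABLE_LEN {
        return None;
    }
    let mut checksum = [0u8; 4];
    checksum.copy_from_slice(&head[CHECKSUM_ADJUSTMENT_OFFSET..CHECKSUM_ADJUSTMENT_OFFSET + 4]);
    let mut modified = [0u8; 8];
    modified.copy_from_slice(&head[MODIFIED_OFFSET..MODIFIED_OFFSET + 8]);
    Some(HeadFingerprint {
        checksum_adjustment: u32::from_be_bytes(checksum),
        modified: i64::from_be_bytes(modified),
    })
}

/// Treats two fonts as the same when their complete `head` tables are
/// byte-for-byte identical.
fn same_head_table(a: &[u8], b: &[u8]) -> bool {
    a.len() >= HEAD_TABLE_LEN && a == b
}
```

Comparing the whole table is about as cheap as comparing the two fields, so `head_fingerprint` is mainly useful for debugging why two fonts were judged different.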
I propose replacing the usages of the PostScript name in font-kit with `PostScriptName-HASH`, where `HASH` is a truncated hash of the `head` table. For example, Lucida Grande might be something like `Lucida-Grande-6789abcd` (arbitrarily chosen hash). This will allow the unique identifiers to be human-readable while remaining globally unique.
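A minimal sketch of building such an identifier in Rust. The proposal does not name a hash function at this point, so this example assumes 64-bit FNV-1a truncated to its low 32 bits purely for illustration; `fnv1a_64` and `unique_name` are hypothetical helper names.

```rust
/// 64-bit FNV-1a, used here only as a simple stand-in hash.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Builds a `PostScriptName-HASH` identifier, where HASH is the low
/// 32 bits of the hash of the raw `head` table, as 8 hex digits.
fn unique_name(postscript_name: &str, head_table: &[u8]) -> String {
    let truncated = fnv1a_64(head_table) as u32;
    format!("{}-{:08x}", postscript_name, truncated)
}
```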
I've started a branch: https://github.com/pcwalton/font-kit/tree/unique-id
The scheme I'm going with is `NAME/REVISION/HASH`, where:
- `NAME` is the PostScript name if available and the full name if not. (A PostScript name won't be available for legacy PCF fonts, for example.)
- `REVISION` is the revision number in `major.minor` form, where `major` is the high 16 bits of the revision field and `minor` is the low 16 bits. The minor version and the dot are omitted if zero. (Even though this is supposed to be a fixed-point number, in practice fonts are inconsistent with using it this way, so I don't print it out like this.)
- `HASH` is the CRC32-C hash of the `head` table.
There are also some flags that specify whether `NAME` is actually the PostScript name and whether the revision and hash were actually taken from the `head` table. These flags are present to make lookups of fonts by font-kit ID via the system APIs as efficient as possible.
Examples: `Wingdings-Regular/5/bcae651e`, `HelveticaNeue-CondensedBold/1/4ba8c656`, `ComicSansMS/5/ff0352dd`.
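The `NAME/REVISION/HASH` scheme described above can be sketched in Rust as follows. This is not the code from the branch: the function names are hypothetical, the CRC32-C is a plain bitwise implementation (Castagnoli polynomial, reflected form `0x82F63B78`), and it assumes that `major` and `minor` are printed in decimal and that the hash covers the entire raw `head` table. The flags are left out.

```rust
/// Bitwise CRC32-C (Castagnoli). Slow but dependency-free.
fn crc32c(data: &[u8]) -> u32 {
    let mut crc: u32 = !0;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // All ones if the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

/// Formats the 32-bit revision field as `major` or `major.minor`,
/// omitting the dot and the minor part when the minor part is zero.
fn format_revision(revision: u32) -> String {
    let major = revision >> 16;
    let minor = revision & 0xFFFF;
    if minor == 0 {
        major.to_string()
    } else {
        format!("{}.{}", major, minor)
    }
}

/// Builds `NAME/REVISION/HASH` from a name and the raw `head` table.
/// The revision field sits at byte offset 4 of `head`, big-endian.
fn font_id(name: &str, head: &[u8]) -> Option<String> {
    if head.len() < 8 {
        return None;
    }
    let revision = u32::from_be_bytes([head[4], head[5], head[6], head[7]]);
    Some(format!("{}/{}/{:08x}", name, format_revision(revision), crc32c(head)))
}
```

For instance, a revision field of `0x00050000` is printed as `5`, which matches the shape of the examples above.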
The font loader should have an "Id" type that efficiently implements `PartialEq` and `Hash`, and that is guaranteed (with reasonably high confidence) to be equal when the fonts are equal. Such an Id is necessary for caching any resource associated with a font (rendered glyphs, HarfBuzz font and face objects, table data, coverage bitmaps). Currently font-kit tends to use the PostScript name for this purpose, but it's easy to see how this might fail uniqueness (for example, when a web font stack contains a different version of a font already present in the system). The Id should also be cheap to compute, because we will need it for every typeface that comes back from a fallback query.
Skia's SkTypeface has a `uniqueID` method. Its implementation largely rests on using a cache: when going from a native font object (for example, a DirectWrite font) to an SkTypeface, there's a comparison procedure to compare equality with existing fonts in the cache. An example for DirectWrite shows a fairly complex set of heuristics: it starts with pointer equality and other checks based on the way the font is loaded, then compares a bunch of name strings.
I'm not at all convinced font-kit should be based on a cache the way Skia does it, but it might be useful as a reference.
In any case, we should export the type; implementations can then refine the tactics to compare equality efficiently and reliably without breaking client code. For example, an initial implementation can be based on PostScript name. This is one reason I agree with @pcwalton that it should be a type and not a small integer. Another reason is so the Debug formatting is informative.
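A minimal sketch of what such an exported type might look like in Rust. Nothing here is font-kit's actual API: `FontId`, its constructor names, the optional `fingerprint` field, and the `ResourceCache` alias are assumptions made for illustration. The fields are private so the comparison strategy can change later without breaking clients.

```rust
use std::collections::HashMap;
use std::fmt;
use std::sync::Arc;

/// Opaque font identifier (hypothetical). Equality and hashing are derived
/// from the private fields, so clients can use it directly as a map key.
#[derive(Clone, PartialEq, Eq, Hash)]
pub struct FontId {
    name: Arc<str>,
    // Absent in the initial, PostScript-name-only implementation.
    fingerprint: Option<u32>,
}

impl FontId {
    /// Initial implementation: identity is based on the PostScript name alone.
    pub fn from_postscript_name(name: &str) -> Self {
        FontId { name: Arc::from(name), fingerprint: None }
    }

    /// Later refinement: add a content-derived value to tell apart
    /// different fonts that share a name.
    pub fn with_fingerprint(mut self, fingerprint: u32) -> Self {
        self.fingerprint = Some(fingerprint);
        self
    }
}

impl fmt::Debug for FontId {
    // Human-readable output, unlike a bare small integer.
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self.fingerprint {
            Some(fp) => write!(f, "FontId({}/{:08x})", self.name, fp),
            None => write!(f, "FontId({})", self.name),
        }
    }
}

/// Any per-font resource cache can then be keyed by the ID.
pub type ResourceCache<T> = HashMap<FontId, T>;
```

Because the name is held in an `Arc<str>`, cloning an ID to use it as a key in several caches does not copy the string.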