Unique font id - Githubissues

raphlinus commented 5 years ago

The font loader should have an "Id" type that efficiently implements PartialEq and Hash and that is guaranteed (with reasonably high confidence) to be equal when the fonts are equal. Such an Id is necessary for caching any resource associated with a font (rendered glyphs, HarfBuzz font and face objects, table data, coverage bitmaps). Currently font-kit tends to use the PostScript name for this purpose, but it's easy to see how this might fail uniqueness (for example, when a web font stack contains a different version of a font already present in the system). It should also be cheap to compute, because we will need it for every typeface that comes back from a fallback query.

Skia's SkTypeface has a uniqueID method. Its implementation largely rests on using a cache: when going from a native font object (for example, a DirectWrite font) to an SkTypeface, a comparison procedure checks it for equality against the fonts already in the cache. An example for DirectWrite shows a fairly complex set of heuristics - it starts with pointer equality and other checks based on the way the font is loaded, then compares a bunch of name strings.

I'm not at all convinced font-kit should be based on a cache the way Skia does it, but it might be useful as a reference.

In any case, we should export the type, so that implementations can refine the tactics for comparing equality efficiently and reliably without breaking client code. For example, an initial implementation can be based on the PostScript name. This is one reason I agree with @pcwalton that it should be a type and not a small integer. Another reason is so that the Debug formatting is informative.
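As a rough illustration of the exported-type idea, here is a minimal sketch. The names `FontId` and `from_postscript_name` are hypothetical and are not font-kit's actual API. The field is private, so the comparison tactic could later change (for example, to include a table hash) without touching client code.

```rust
use std::fmt;
use std::sync::Arc;

/// Hypothetical opaque font identifier (not font-kit's real type).
/// Equality and hashing are derived from the private field, so the
/// strategy can be refined later without breaking callers.
#[derive(Clone, PartialEq, Eq, Hash)]
pub struct FontId {
    // Initial tactic: key on the PostScript name only.
    // Arc<str> keeps clones cheap.
    postscript_name: Arc<str>,
}

impl FontId {
    /// Builds an id from a PostScript name (illustrative constructor).
    pub fn from_postscript_name(name: &str) -> FontId {
        FontId { postscript_name: Arc::from(name) }
    }
}

impl fmt::Debug for FontId {
    // Informative Debug output, which a bare integer id would not give.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "FontId({})", self.postscript_name)
    }
}
```

Such a type can be used directly as a `HashMap` key for any per-font resource cache.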

pcwalton commented 5 years ago

@gw3583 What key does WebRender use for the glyph cache?

gw3583 commented 5 years ago

@pcwalton WR mostly leaves the management of font IDs to Gecko. Gecko creates font resources via the WR API and they have an arbitrary ID:

https://github.com/servo/webrender/blob/e2dd5aa76af2775f1da110f70c6a05159bd725d5/webrender_api/src/font.rs#L78

Font instance structures then reference that resource, with per-instance information:

https://github.com/servo/webrender/blob/e2dd5aa76af2775f1da110f70c6a05159bd725d5/webrender/src/glyph_rasterizer/mod.rs#L174

https://github.com/servo/webrender/blob/e2dd5aa76af2775f1da110f70c6a05159bd725d5/webrender/src/glyph_rasterizer/mod.rs#L208

Glyph caches are per-font-instance, with keys inside each of those caches being a glyph index.
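The layout described above can be sketched roughly as follows. This is a simplified illustration, not WebRender's actual code: the type names, the single `size_64ths` field, and the `Vec<u8>` bitmap are all assumptions made for the example.

```rust
use std::collections::HashMap;

/// Arbitrary id chosen by the embedder (hypothetical simplification).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct FontKey(pub u32);

/// A font resource plus per-instance information (only a size here).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct FontInstance {
    pub font_key: FontKey,
    /// Size in 1/64 pixel units, so the key stays hashable.
    pub size_64ths: u32,
}

pub type GlyphIndex = u32;

/// One glyph cache per font instance; each cache is keyed by glyph index.
#[derive(Default)]
pub struct GlyphCaches {
    caches: HashMap<FontInstance, HashMap<GlyphIndex, Vec<u8>>>,
}

impl GlyphCaches {
    /// Stores a rasterized glyph in the cache belonging to `instance`.
    pub fn insert(&mut self, instance: FontInstance, glyph: GlyphIndex, bitmap: Vec<u8>) {
        self.caches.entry(instance).or_default().insert(glyph, bitmap);
    }

    /// Looks up a glyph; returns None if the instance or glyph is not cached.
    pub fn get(&self, instance: &FontInstance, glyph: GlyphIndex) -> Option<&Vec<u8>> {
        self.caches.get(instance)?.get(&glyph)
    }
}
```

Note that the outer key only needs to distinguish font resources that the embedder has already registered, which is why an arbitrary id suffices in this design.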

raphlinus commented 5 years ago

Doing a little more research, I see that as of Windows 10, you can test whether two fonts are equal. This is not the same as a globally unique id, but I can see using this method to manage a cache. In particular, there can be a cache of fallback fonts where we basically use the PostScript name as the hash and then the equality test to verify. That doesn't really solve the problem of using an id for caching downstream resources, though (glyphs, table data, etc.).
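A minimal sketch of that "hash by name, verify by equality" cache, under stated assumptions: `NativeFont`, `same_font`, `FallbackCache`, and `FakeFont` are hypothetical names, and `same_font` merely stands in for whatever equality test the platform provides.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a platform font handle.
pub trait NativeFont {
    fn postscript_name(&self) -> String;
    /// Stand-in for the platform's font-equality test.
    fn same_font(&self, other: &Self) -> bool;
}

/// Buckets fonts by PostScript name, then uses the equality test to
/// tell apart distinct fonts that happen to share a name.
pub struct FallbackCache<F: NativeFont> {
    buckets: HashMap<String, Vec<F>>,
}

impl<F: NativeFont> FallbackCache<F> {
    pub fn new() -> Self {
        FallbackCache { buckets: HashMap::new() }
    }

    /// Returns (name, index within the name's bucket), inserting the
    /// font if no equal font is cached yet.
    pub fn intern(&mut self, font: F) -> (String, usize) {
        let name = font.postscript_name();
        let bucket = self.buckets.entry(name.clone()).or_default();
        let found = bucket.iter().position(|f| f.same_font(&font));
        let index = match found {
            Some(i) => i,
            None => {
                bucket.push(font);
                bucket.len() - 1
            }
        };
        (name, index)
    }
}

/// Toy font used only to exercise the cache: same name, different version.
pub struct FakeFont {
    pub name: &'static str,
    pub version: u32,
}

impl NativeFont for FakeFont {
    fn postscript_name(&self) -> String {
        self.name.to_string()
    }
    fn same_font(&self, other: &Self) -> bool {
        self.name == other.name && self.version == other.version
    }
}
```

The returned pair is only meaningful within one cache instance, which matches the caveat above: it is a cache key, not a globally unique id.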

jfkthame commented 5 years ago

> An example for DirectWrite shows a fairly complex set of heuristics - it starts with pointer equality and other checks based on the way the font is loaded, then compares a bunch of name strings.

A fairly simple enhancement to this, I think, would be to get the head table of each font and compare those for equality; given the presence of the checksum and timestamp fields in this table, which are normally set automatically by font-creation tools, it's highly unlikely that the head tables of different fonts/versions will be equal.
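To make the head-table comparison concrete, here is a sketch that locates a table through the sfnt table directory and compares the raw bytes. It assumes a single-font TrueType/OpenType file held in memory (font collections are not handled), and the function names `find_table` and `same_head_table` are made up for this example.

```rust
/// Returns the raw bytes of the table with the given tag, or None if the
/// data is truncated or the table is missing. Assumes a plain sfnt file:
/// a 12-byte header followed by 16-byte table records.
pub fn find_table<'a>(font: &'a [u8], tag: &[u8; 4]) -> Option<&'a [u8]> {
    // numTables is a big-endian u16 at byte offset 4.
    let num_tables = u16::from_be_bytes([*font.get(4)?, *font.get(5)?]) as usize;
    for i in 0..num_tables {
        // Record layout: tag (4), checksum (4), offset (4), length (4).
        let rec = font.get(12 + 16 * i..12 + 16 * (i + 1))?;
        if rec[0..4] == tag[..] {
            let offset = u32::from_be_bytes([rec[8], rec[9], rec[10], rec[11]]) as usize;
            let length = u32::from_be_bytes([rec[12], rec[13], rec[14], rec[15]]) as usize;
            return font.get(offset..offset.checked_add(length)?);
        }
    }
    None
}

/// True only if both fonts have a head table and the bytes are identical.
pub fn same_head_table(a: &[u8], b: &[u8]) -> bool {
    match (find_table(a, b"head"), find_table(b, b"head")) {
        (Some(x), Some(y)) => x == y,
        _ => false,
    }
}
```

Treating a missing head table as "not equal" is a conservative choice for this sketch; a real implementation would need a fallback for font formats without one.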

pcwalton commented 5 years ago

I propose replacing the usages of the PostScript name in font-kit with PostScriptName-HASH where HASH is a truncated hash of the head table. For example, Lucida Grande might be something like Lucida-Grande-6789abcd (arbitrarily chosen hash). This will allow the unique identifiers to be human-readable while remaining globally unique.

pcwalton commented 4 years ago

I've started a branch: https://github.com/pcwalton/font-kit/tree/unique-id

The scheme I'm going with is NAME/REVISION/HASH, where:

NAME is the PostScript name if available and the full name if not. (A PostScript name won't be available for legacy PCF fonts, for example.)
REVISION is the revision number in major.minor form, where major is the high 16 bits of the revision field and minor is the low 16 bits. The dot and the minor version are omitted if the minor version is zero. (Even though this field is supposed to be a fixed-point number, in practice fonts are inconsistent about using it that way, so I don't print it as one.)
HASH is the CRC32-C hash of the head table.

There are also some flags that specify whether NAME is actually the PostScript name and whether the revision and hash are actually derived from the head table. These flags are present to make lookups of fonts by font-kit ID via the system APIs as efficient as possible.

Examples: Wingdings-Regular/5/bcae651e, HelveticaNeue-CondensedBold/1/4ba8c656, ComicSansMS/5/ff0352dd.
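The NAME/REVISION/HASH scheme can be sketched as below. This is an illustration and not the code from the branch: the function names are invented, the flags are not modeled, and printing the minor version in decimal and the hash as eight lowercase hex digits are assumptions based on the examples above. CRC-32C is computed bit by bit for clarity rather than speed.

```rust
/// CRC-32C (Castagnoli), bitwise, using the reflected polynomial 0x82F63B78.
pub fn crc32c(data: &[u8]) -> u32 {
    let mut crc = !0u32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // mask is all ones when the low bit is set, else zero.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

/// Formats a 32-bit revision as "major" or "major.minor", where major is
/// the high 16 bits and minor the low 16 bits (decimal is an assumption).
pub fn format_revision(revision: u32) -> String {
    let (major, minor) = (revision >> 16, revision & 0xffff);
    if minor == 0 {
        major.to_string()
    } else {
        format!("{}.{}", major, minor)
    }
}

/// Joins the three parts into NAME/REVISION/HASH.
pub fn font_id_string(name: &str, revision: u32, head_table: &[u8]) -> String {
    format!("{}/{}/{:08x}", name, format_revision(revision), crc32c(head_table))
}
```

For instance, `format_revision(0x0005_0000)` yields `5`, which matches the shape of the `Wingdings-Regular/5/...` example.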

servo / font-kit

Unique font id #40