servo / font-kit

A cross-platform font loading library written in Rust
Apache License 2.0

Unique font id #40

Open raphlinus opened 5 years ago

raphlinus commented 5 years ago

The font loader should have an "Id" type that efficiently implements PartialEq and Hash, and that is guaranteed (with reasonably high confidence) to be equal exactly when the fonts are equal. Such an Id is necessary for caching any resource associated with a font (rendered glyphs, HarfBuzz font and face objects, table data, coverage bitmaps). Currently font-kit tends to use the PostScript name for this purpose, but it's easy to see how this might fail uniqueness (for example, when a web font stack contains a different version of a font already present in the system). The Id should also be cheap to compute, because we will need it for every typeface that comes back from a fallback query.

Skia's SkTypeface has a uniqueID method. Its implementation largely rests on using a cache - when going from a native font object (for example, a DirectWrite font) to an SkTypeface, there's a comparison procedure to compare equality with existing fonts in the cache. An example for DirectWrite shows a fairly complex set of heuristics - it starts with pointer equality and other checks based on the way the font is loaded, then compares a bunch of name strings.

I'm not at all convinced font-kit should be based on a cache the way Skia does it, but it might be useful as a reference.

In any case, we should export the type, then implementations can refine the tactics to compare equality efficiently and reliably, without breaking client code - for example, an initial implementation can be based on PostScript name. This is one reason I agree with @pcwalton it should be a type and not a small integer. Another reason is so the Debug formatting is informative.
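As a rough illustration of the "exported type" idea, here is a minimal sketch. The names `FontId`, `from_postscript_name`, and `AdvanceCache` are hypothetical and are not part of font-kit's API; the point is only that the field is private, so the comparison tactic (here, PostScript name alone) could be refined later without touching client code, and that Debug output stays readable.

```rust
use std::collections::HashMap;
use std::fmt;

/// Hypothetical opaque identifier (not font-kit's actual API).
/// The field is private so the equality tactic can change later
/// without breaking client code.
#[derive(Clone, PartialEq, Eq, Hash)]
pub struct FontId {
    // Initial tactic: PostScript name only. This is known to be
    // non-unique, e.g. when a web font shadows a system font.
    postscript_name: String,
}

impl FontId {
    pub fn from_postscript_name(name: &str) -> FontId {
        FontId { postscript_name: name.to_owned() }
    }
}

impl fmt::Debug for FontId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Informative output, unlike a bare small integer.
        write!(f, "FontId({})", self.postscript_name)
    }
}

/// Illustrative downstream cache keyed by (font, glyph index).
pub struct AdvanceCache {
    entries: HashMap<(FontId, u32), f32>,
}

impl AdvanceCache {
    pub fn new() -> AdvanceCache {
        AdvanceCache { entries: HashMap::new() }
    }

    /// Returns the cached advance, calling `compute` only on a miss.
    pub fn get_or_insert_with(
        &mut self,
        font: &FontId,
        glyph: u32,
        compute: impl FnOnce() -> f32,
    ) -> f32 {
        *self.entries.entry((font.clone(), glyph)).or_insert_with(compute)
    }
}
```

Client code only ever hashes, compares, and prints the `FontId`, so swapping the private field for something stronger would not change the cache.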

pcwalton commented 5 years ago

@gw3583 What key does WebRender use for the glyph cache?

gw3583 commented 5 years ago

@pcwalton WR mostly leaves the management of font IDs to Gecko. Gecko creates font resources via the WR API and they have an arbitrary ID:

https://github.com/servo/webrender/blob/e2dd5aa76af2775f1da110f70c6a05159bd725d5/webrender_api/src/font.rs#L78

Font instance structures then reference that resource, with per-instance information:

https://github.com/servo/webrender/blob/e2dd5aa76af2775f1da110f70c6a05159bd725d5/webrender/src/glyph_rasterizer/mod.rs#L174

https://github.com/servo/webrender/blob/e2dd5aa76af2775f1da110f70c6a05159bd725d5/webrender/src/glyph_rasterizer/mod.rs#L208

Glyph caches are per-font-instance, with keys inside each of those caches being a glyph index.
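To make the shape of that arrangement concrete, here is a simplified sketch. These are not WebRender's actual definitions (see the links above for those); `FontKey`, `FontInstance`, `size_au`, `CachedGlyph`, and `GlyphCaches` are illustrative stand-ins showing an arbitrary font ID, an instance that references it, and one glyph cache per instance keyed by glyph index.

```rust
use std::collections::HashMap;

/// Arbitrary ID chosen by the embedder (illustrative stand-in).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct FontKey(pub u32);

/// A font resource reference plus per-instance information.
/// `size_au` is an assumed example of such information.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct FontInstance {
    pub font_key: FontKey,
    pub size_au: i32,
}

pub type GlyphIndex = u32;

/// Placeholder for whatever a rasterized glyph entry holds.
#[derive(Clone, Debug, PartialEq)]
pub struct CachedGlyph {
    pub width: u32,
    pub height: u32,
}

/// One inner cache per font instance; inner keys are glyph indices.
#[derive(Default)]
pub struct GlyphCaches {
    per_instance: HashMap<FontInstance, HashMap<GlyphIndex, CachedGlyph>>,
}

impl GlyphCaches {
    pub fn insert(&mut self, instance: FontInstance, glyph: GlyphIndex, data: CachedGlyph) {
        self.per_instance.entry(instance).or_default().insert(glyph, data);
    }

    pub fn get(&self, instance: &FontInstance, glyph: GlyphIndex) -> Option<&CachedGlyph> {
        self.per_instance.get(instance)?.get(&glyph)
    }
}
```

Two instances of the same font at different sizes get separate caches, so the same glyph index can map to different rasterizations.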

raphlinus commented 5 years ago

Doing a little more research, I see that as of Windows 10, you can test whether two fonts are equal. This is not the same as a globally unique id, but I can see using this method to manage a cache. In particular, there can be a cache of fallback fonts where we use the PostScript name as the hash key, then the equality test to verify a match. That doesn't really solve the problem of using an id for caching downstream resources though (glyphs, table data, etc).
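A sketch of that hash-then-verify cache might look like the following. `FallbackCache` and `intern` are hypothetical names, and the `same_font` closure stands in for whatever platform equality test is available; no particular Windows API is assumed here.

```rust
use std::collections::HashMap;

/// Fonts bucketed by PostScript name. The name only narrows the
/// search; a caller-supplied equality test confirms a real match.
pub struct FallbackCache<F> {
    buckets: HashMap<String, Vec<F>>,
}

impl<F> FallbackCache<F> {
    pub fn new() -> Self {
        FallbackCache { buckets: HashMap::new() }
    }

    /// Looks for `font` among fonts sharing `postscript_name`.
    /// Returns the slot index within that bucket and whether the
    /// font was newly inserted.
    pub fn intern(
        &mut self,
        postscript_name: &str,
        font: F,
        same_font: impl Fn(&F, &F) -> bool,
    ) -> (usize, bool) {
        let bucket = self.buckets.entry(postscript_name.to_owned()).or_default();
        if let Some(i) = bucket.iter().position(|existing| same_font(existing, &font)) {
            return (i, false); // already cached
        }
        bucket.push(font);
        (bucket.len() - 1, true)
    }
}
```

Two different versions of a font with the same PostScript name land in the same bucket but in different slots. As noted above, the slot index is only meaningful inside this cache and is not a globally unique id.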

jfkthame commented 5 years ago

> An example for DirectWrite shows a fairly complex set of heuristics - it starts with pointer equality and other checks based on the way the font is loaded, then compares a bunch of name strings.

A fairly simple enhancement to this, I think, would be to get the head table of each font and compare those for equality; given the presence of the checksum and timestamp fields in this table, which are normally set automatically by font-creation tools, it's highly unlikely that the head tables of different fonts/versions will be equal.
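For illustration, here is a small sketch that pulls the checksum and timestamp fields out of raw head table bytes so they can be compared. The offsets follow the OpenType head table layout (checkSumAdjustment at byte 8, magicNumber at 12, created at 20, modified at 28, 54 bytes total); `HeadFingerprint` and `head_fingerprint` are made-up names, and obtaining the table bytes from the platform is left out.

```rust
/// Length in bytes of the OpenType `head` table.
pub const HEAD_TABLE_LEN: usize = 54;

/// The head table fields most likely to differ between fonts/versions.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct HeadFingerprint {
    pub checksum_adjustment: u32, // checkSumAdjustment, offset 8
    pub created: i64,             // LONGDATETIME, offset 20
    pub modified: i64,            // LONGDATETIME, offset 28
}

/// Parses the fields from raw big-endian `head` table bytes.
/// Returns None if the data is too short or the magic number is wrong.
pub fn head_fingerprint(head: &[u8]) -> Option<HeadFingerprint> {
    if head.len() < HEAD_TABLE_LEN {
        return None;
    }
    // magicNumber at offset 12 is always 0x5F0F3CF5.
    if read_u32(head, 12) != 0x5F0F_3CF5 {
        return None;
    }
    Some(HeadFingerprint {
        checksum_adjustment: read_u32(head, 8),
        created: read_i64(head, 20),
        modified: read_i64(head, 28),
    })
}

fn read_u32(data: &[u8], offset: usize) -> u32 {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(&data[offset..offset + 4]);
    u32::from_be_bytes(buf)
}

fn read_i64(data: &[u8], offset: usize) -> i64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&data[offset..offset + 8]);
    i64::from_be_bytes(buf)
}
```

Comparing the whole 54-byte table byte-for-byte would work just as well; extracting the fields mainly gives a small value that is cheap to hash and readable in Debug output.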

pcwalton commented 4 years ago

I propose replacing the usages of the PostScript name in font-kit with PostScriptName-HASH where HASH is a truncated hash of the head table. For example, Lucida Grande might be something like Lucida-Grande-6789abcd (arbitrarily chosen hash). This will allow the unique identifiers to be human-readable while remaining globally unique.
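A minimal sketch of how such an identifier could be built is below. The hash function is not specified in the proposal; 32-bit FNV-1a is used here purely as an assumption because it is tiny and stable across runs. `fnv1a_32` and `unique_name` are hypothetical helpers, not font-kit functions.

```rust
/// 32-bit FNV-1a. An assumed choice of hash for illustration only;
/// the proposal above does not name a hash function.
pub fn fnv1a_32(bytes: &[u8]) -> u32 {
    let mut hash: u32 = 0x811c_9dc5; // FNV offset basis
    for &b in bytes {
        hash ^= u32::from(b);
        hash = hash.wrapping_mul(0x0100_0193); // FNV prime
    }
    hash
}

/// Builds a "PostScriptName-HASH" string, where HASH is eight hex
/// digits derived from the raw bytes of the font's head table.
pub fn unique_name(postscript_name: &str, head_table: &[u8]) -> String {
    format!("{}-{:08x}", postscript_name, fnv1a_32(head_table))
}
```

Because the hash is truncated to 32 bits, collisions are possible in principle; the name prefix means a collision only matters between fonts that also share a PostScript name.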

pcwalton commented 4 years ago

I've started a branch: https://github.com/pcwalton/font-kit/tree/unique-id

The scheme I'm going with is NAME/REVISION/HASH, where:

There are also some flags that specify whether NAME is actually the PostScript name and whether the revision and hash are actually hashes of the head table. These flags are present to make lookups of fonts by font-kit ID via the system APIs as efficient as possible.

Examples: Wingdings-Regular/5/bcae651e, HelveticaNeue-CondensedBold/1/4ba8c656, ComicSansMS/5/ff0352dd.
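The following sketch parses and re-prints strings shaped like the examples above. It is not the code from the branch: `ParsedId` and `parse_id` are hypothetical, the revision is kept as an uninterpreted string, the hash is assumed to be exactly eight hex digits as in the examples, and the flags mentioned above are not modeled because the examples do not show how they are encoded.

```rust
use std::fmt;

/// Parsed form of a NAME/REVISION/HASH string (illustrative only).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct ParsedId {
    pub name: String,
    pub revision: String, // kept opaque; its exact meaning is not assumed
    pub hash: u32,
}

/// Splits from the right so that only the last two '/' separate fields.
pub fn parse_id(s: &str) -> Option<ParsedId> {
    let mut parts = s.rsplitn(3, '/');
    let hash = parts.next()?;
    let revision = parts.next()?;
    let name = parts.next()?;
    if name.is_empty() || revision.is_empty() {
        return None;
    }
    // Assumed format: exactly eight hexadecimal digits.
    if hash.len() != 8 || !hash.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    Some(ParsedId {
        name: name.to_owned(),
        revision: revision.to_owned(),
        hash: u32::from_str_radix(hash, 16).ok()?,
    })
}

impl fmt::Display for ParsedId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}/{}/{:08x}", self.name, self.revision, self.hash)
    }
}
```

Round-tripping through `parse_id` and `to_string` preserves well-formed lowercase ids, which keeps them usable both as human-readable labels and as hash map keys.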