hmeine closed this issue 1 year ago
It's an official term: https://fonts.google.com/knowledge/glossary/subsetting
so it is still not clear to me why that would be a feature of a font shaping library
It's usually not, but harfbuzz does provide it. And you cannot implement subsetting without shaping, so they are usually pretty tightly connected.
Not sure about "cannot implement subsetting without shaping" (see here). Maybe more that subsetting the glyphs doesn't make much sense without having shaped to glyphs beforehand (because you wouldn't know what to keep).
@laurmaedje Well, how do you know which glyphs are not needed in the first place? 😄
As for the subsetter crate, it seems very simplistic for now compared to HarfBuzz and Allsorts, which can subset far more tables (most of them, in HarfBuzz's case). That requires a complete TrueType parser and writer, which is an enormous amount of work and code. In HarfBuzz, subsetting alone is about 30 KLOC.
Yeah, sure, subsetting by glyphs only really makes sense in combination with shaping. Just wanted to clarify that the process itself is independent of shaping. It's also true that subsetter is quite simple. From what I've seen, allsorts doesn't do that much more, though. HarfBuzz, of course, is a totally different story, as it allows you to subset layout tables. But that's not really necessary if you have already done shaping as you'd drop them anyway. So then you would actually subset by character set instead of glyph set and do it independently from shaping. :)
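To make the "shape first, then subset by glyph set" idea concrete, here is a minimal sketch on a toy model. Everything in it is an assumption for illustration: `TOY_FONT`, `shape` and `subset_by_glyphs` are hypothetical names, the "font" is a plain dictionary, and none of this is the API of rustybuzz, HarfBuzz, allsorts or the subsetter crate.

```python
# Toy model (hypothetical): a "font" is a character map, a ligature table
# and a set of named glyphs. Real fonts store these in binary tables.
TOY_FONT = {
    "cmap": {"f": 1, "i": 2, "n": 3, "a": 4},   # character -> glyph id
    "ligatures": {(1, 2): 5},                   # f + i -> "fi" ligature glyph
    "glyphs": {0: ".notdef", 1: "f", 2: "i", 3: "n", 4: "a", 5: "fi"},
}


def shape(font, text):
    """Very simplified shaping: map characters to glyph ids,
    then apply pairwise ligatures from left to right."""
    ids = [font["cmap"].get(ch, 0) for ch in text]
    out = []
    i = 0
    while i < len(ids):
        pair = tuple(ids[i:i + 2])
        if pair in font["ligatures"]:
            out.append(font["ligatures"][pair])
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def subset_by_glyphs(font, glyph_ids):
    """Keep only the glyphs that shaping actually produced (plus .notdef).
    A real subsetter would also rewrite the affected font tables."""
    keep = set(glyph_ids) | {0}
    return {gid: name for gid, name in font["glyphs"].items() if gid in keep}


shaped = shape(TOY_FONT, "fin")
subset = subset_by_glyphs(TOY_FONT, shaped)
print(shaped)   # glyph ids after shaping
print(subset)   # glyphs that survive subsetting
```

Note that the plain "f" and "i" glyphs are dropped here, because shaping replaced them with the ligature. That is why you need the shaped glyph list, not just the input characters, to know what to keep in this mode.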
First of all, thanks for the quick replies. The google fonts link you gave was also the first one I found (well, using Google…) and found helpful – yet, it sounds as if all use cases resulted in a font (which I read as ".ttf file on disk, for example"). However, the README also states "No Arabic fallback shaper, since it requires subsetting." which sounds as if the same helper functions would also be used to combine multiple fonts. The latter indeed seems to be more important in the context of a text engine, whereas I would not see the point of including code for producing and saving stripped down versions of fonts when the goal is to render Unicode text.
But that's not really necessary if you have already done shaping as you'd drop them anyway.
This is a very PDF-specific case 😉 For the web, a subsetter must preserve layout tables as well.
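For the web case, where layout tables are kept and shaping happens later in the browser, the subsetter cannot rely on a shaped glyph list. A common approach is to start from a character set and keep every glyph that substitutions could still produce from it. The following is a rough sketch of that idea on a toy model; `glyph_closure`, `CMAP` and `SUBSTITUTIONS` are hypothetical names, and the dictionary of substitution rules is only a stand-in for real layout tables.

```python
def glyph_closure(cmap, substitutions, chars):
    """Return the glyph ids reachable from `chars`: the glyphs mapped
    directly by the character map, plus the output of every substitution
    whose input glyphs are all already kept. Repeats until nothing new
    is added."""
    keep = {0} | {cmap[ch] for ch in chars if ch in cmap}  # 0 is .notdef
    changed = True
    while changed:
        changed = False
        for inputs, output in substitutions.items():
            if output not in keep and all(g in keep for g in inputs):
                keep.add(output)
                changed = True
    return keep


# Hypothetical toy data: character map and substitution rules.
CMAP = {"f": 1, "i": 2, "n": 3, "a": 4}
SUBSTITUTIONS = {
    (1, 2): 5,   # f + i -> "fi" ligature
    (5,): 6,     # "fi" ligature -> stylistic alternate
    (4, 3): 7,   # a + n -> "an" ligature
}

closure = glyph_closure(CMAP, SUBSTITUTIONS, "fi")
print(sorted(closure))
```

Unlike subsetting after shaping, the base glyphs for "f" and "i" stay in the result, since the retained layout rules still need them as inputs. The "an" ligature is dropped because neither "a" nor "n" is in the requested character set.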
"No Arabic fallback shaper, since it requires subsetting." which sounds as if the same helper functions would also be used to combine multiple fonts
If I remember correctly, HarfBuzz generates "default" shaping tables for some malformed Arabic fonts and then passes them to the shaper. The README wording is a bit vague, but the point is that HarfBuzz generates a TrueType table using its own subsetting implementation. But we don't need the whole subsetter here.
combine multiple fonts
Subsetting, generally, doesn't combine multiple fonts. It strips the provided one.
Honestly, the real problem here is that there is almost no information about modern text engines. If you know, you know. This library, and its readme, is for people who already know what they need it for. You should not use it directly, but use a library built on top of it. And there are no such libraries for now. There are attempts, but no more.
Hi, I just watched the whole talk by Chris Chapman on the unicode text engine stack, but the term "subsetting" that appears in the rustybuzz README many times was still new to me. It sounded as if it was about settings, but searching the web made me believe that it's from "subsets" of fonts. Yet, the explanations I found were about fonts that contain subsets of glyphs, so it is still not clear to me why that would be a feature of a font shaping library.
If you give me an explanation, or point me to one, I can suggest a tiny(!) change to the README in a PR.