harfbuzz / rustybuzz

A complete port of HarfBuzz's shaping algorithm to Rust
MIT License

README does not explain "subsetting" #58

Closed hmeine closed 1 year ago

hmeine commented 2 years ago

Hi, I just watched the whole talk by Chris Chapman on the Unicode text engine stack, but the term "subsetting", which appears many times in the rustybuzz README, was still new to me. It sounded as if it was about settings, but searching the web made me believe that it comes from "subsets" of fonts. Yet the explanations I found were about fonts that contain subsets of glyphs, so it is still not clear to me why that would be a feature of a font shaping library.

If you give me an explanation, or point me to one, I can suggest a tiny(!) change to the README in a PR.

RazrFalcon commented 2 years ago

It's an official term: https://fonts.google.com/knowledge/glossary/subsetting

> so it is still not clear to me why that would be a feature of a font shaping library

It's usually not, but harfbuzz does provide it. And you cannot implement subsetting without shaping, so they are usually pretty tightly connected.

laurmaedje commented 2 years ago

Not sure about "cannot implement subsetting without shaping" (see here). Maybe more that subsetting the glyphs doesn't make much sense without having shaped to glyphs beforehand (because you wouldn't know what to keep).

RazrFalcon commented 2 years ago

@laurmaedje Well, how do you know which glyphs are not needed in the first place? 😄

As for the subsetter crate, it seems to be very simplistic for now compared to HarfBuzz and Allsorts, which can subset way more tables (most of them, in the case of HarfBuzz). That requires a complete TrueType parser and writer, which is an enormous amount of work and code. In HarfBuzz, subsetting alone is like 30 KLOC.

laurmaedje commented 2 years ago

Yeah, sure, subsetting by glyphs only really makes sense in combination with shaping. Just wanted to clarify that the process itself is independent of shaping. It's also true that subsetter is quite simple. From what I've seen, Allsorts doesn't do that much more though. HarfBuzz of course is a totally different story, as it allows you to subset layout tables. But that's not really necessary if you have already done shaping, as you'd drop them anyway. Otherwise, you would actually subset by character set instead of glyph set and do it independently of shaping. :)
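To make the glyph-set versus character-set distinction concrete, here is a minimal sketch using only the Rust standard library. Everything in it is a hypothetical stand-in: `Cmap`, `glyphs_from_shaped_run`, `glyphs_from_charset`, and `compact_ids` are illustrative names, not part of rustybuzz, subsetter, Allsorts, or HarfBuzz, and the glyph IDs are made up. It only shows how the set of glyphs to keep is chosen, not how font tables are rewritten.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical stand-in for a font's character map: character -> glyph ID.
type Cmap = HashMap<char, u16>;

/// Glyph-set subsetting: keep exactly the glyph IDs a shaper emitted.
/// `shaped` stands for shaper output; glyph 0 (.notdef) is always kept.
fn glyphs_from_shaped_run(shaped: &[u16]) -> BTreeSet<u16> {
    let mut keep: BTreeSet<u16> = shaped.iter().copied().collect();
    keep.insert(0);
    keep
}

/// Character-set subsetting: keep the glyphs the cmap maps the characters to.
/// No shaping is involved, so glyphs that only appear after substitution
/// (ligatures, for example) are not found by this simplified version.
fn glyphs_from_charset(cmap: &Cmap, chars: &str) -> BTreeSet<u16> {
    let mut keep: BTreeSet<u16> = chars
        .chars()
        .filter_map(|c| cmap.get(&c).copied())
        .collect();
    keep.insert(0);
    keep
}

/// Assign compact new glyph IDs (0, 1, 2, ...) to the kept glyphs,
/// in ascending order of their old IDs.
fn compact_ids(keep: &BTreeSet<u16>) -> HashMap<u16, u16> {
    keep.iter()
        .enumerate()
        .map(|(new_id, &old_id)| (old_id, new_id as u16))
        .collect()
}
```

Assuming a font where 'f' and 'i' map to glyphs 10 and 11 and the shaper turns "fi" into a single ligature glyph 42, the first function keeps glyphs 0 and 42 while the second keeps 0, 10, and 11. That gap is why a character-set subsetter has to follow or preserve the layout tables, while a glyph-set subsetter working from already-shaped text can drop them.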

hmeine commented 2 years ago

First of all, thanks for the quick replies. The Google Fonts link you gave was also the first one I found (well, using Google…) and found helpful; yet it sounds as if all use cases resulted in a font (which I read as ".ttf file on disk, for example"). However, the README also states "No Arabic fallback shaper, since it requires subsetting.", which sounds as if the same helper functions would also be used to combine multiple fonts. The latter indeed seems to be more important in the context of a text engine, whereas I would not see the point of including code for producing and saving stripped-down versions of fonts when the goal is to render Unicode text.

RazrFalcon commented 2 years ago

> But that's not really necessary if you have already done shaping as you'd drop them anyway.

This is a very PDF-specific case 😉 For the web, a subsetter must preserve layout tables as well.

RazrFalcon commented 2 years ago

> "No Arabic fallback shaper, since it requires subsetting." which sounds as if the same helper functions would also be used to combine multiple fonts

If I remember correctly, HarfBuzz generates "default" shaping tables for some malformed Arabic fonts and then passes them to the shaper. The README wording is a bit vague, but the point is that HarfBuzz generates a TrueType table using its own subsetting implementation. But we don't need the whole subsetter here.

> combine multiple fonts

Subsetting, generally, doesn't combine multiple fonts. It strips the provided one.

Honestly, the real problem here is that there is almost no information about modern text engines. If you know, you know. This library, and its README, is for people who already know what they need it for. You should not use it directly, but rather use a library built on top of it. And there are no such libraries for now. There are attempts, but nothing more.