Closed bjrn closed 3 months ago
Good idea. Should be fairly straightforward.
I also wanted to look into whether it'd be possible to only include ligatures that might actually be exercised by the text provided. E.g. there's no reason to preserve an "ff" ligature if the text is "foof". Maybe it'd make sense to tackle those two ideas together.
do you know if there's a standard-ish way of doing the opposite? I'm suspecting there might be a few weird exceptions to take into account?
Hmm, yeah, the U+4?? syntax looks like fun: https://developer.mozilla.org/en-US/docs/Web/CSS/@font-face/unicode-range
In terms of subsetting I guess it's fine to just expand that to all the possible values, whether or not those codepoints actually exist in the font (or in the Unicode repertoire). The subsetting code should just ignore the codepoints that don't exist in the original font.
This module looks like it's up to the task: https://github.com/Japont/unicode-range
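For illustration, the "expand to all possible values" idea can be sketched in a few lines. This is a hypothetical helper, not the API of @japont/unicode-range or subset-font; the names `parseRangeToken` and `expandUnicodeRange` are made up here, and the sketch assumes the three token forms described on the MDN page (single codepoint, interval, trailing `?` wildcards).

```javascript
// Largest valid Unicode code point.
const MAX_CODE_POINT = 0x10ffff;

// Turn one token ("U+26", "U+0025-00FF" or "U+4??") into an inclusive
// [start, end] pair of code points.
function parseRangeToken(token) {
  const match = /^u\+([0-9a-f?]{1,6})(?:-([0-9a-f]{1,6}))?$/i.exec(token.trim());
  if (!match) {
    throw new Error(`Invalid unicode-range token: ${token}`);
  }
  const [, first, last] = match;
  if (first.includes('?')) {
    // Wildcards must trail the hex digits and cannot be combined with an interval.
    if (last !== undefined || !/^[0-9a-f]*\?+$/i.test(first)) {
      throw new Error(`Invalid wildcard in unicode-range token: ${token}`);
    }
    // "4??" covers everything from 0x400 to 0x4FF.
    return [
      parseInt(first.replace(/\?/g, '0'), 16),
      parseInt(first.replace(/\?/g, 'F'), 16),
    ];
  }
  const start = parseInt(first, 16);
  return [start, last === undefined ? start : parseInt(last, 16)];
}

// Expand a comma-separated unicode-range string into a sorted array of
// unique code points, without checking whether any font contains them.
function expandUnicodeRange(range) {
  const codePoints = new Set();
  for (const token of range.split(',')) {
    const [start, end] = parseRangeToken(token);
    for (let cp = start; cp <= Math.min(end, MAX_CODE_POINT); cp += 1) {
      codePoints.add(cp);
    }
  }
  return [...codePoints].sort((a, b) => a - b);
}
```

With this sketch, `expandUnicodeRange('U+4??')` yields the 256 code points from U+0400 to U+04FF, which can then be mapped through `String.fromCodePoint` before subsetting.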
Good find! Yes, I made a super-naïve test script before I stumbled upon that U+4?? syntax, will take a look at that one! I completely understand if you want to keep this library small and focused, and that specifying a unicode-range might be an edge case which is better solved by providing an example in the README where the conversion takes place prior to calling subset-font. I'll play around a bit with it and get back.
no reason to preserve an "ff" ligature if the text is "foof"
true … but isn't the text converted to a Set (of sorts, I'm not familiar with harfbuzz) and sorted?
I'll play around a bit with it and get back.
Great! Good luck!
no reason to preserve an "ff" ligature if the text is "foof"
true … but isn't the text converted to a Set (of sorts, I'm not familiar with harfbuzz) and sorted?
Yes, I think we'll have to go even more low level when instructing harfbuzz about which glyphs to include, if that's even supported.
const path = require('path');
const { readFile, writeFile } = require('fs').promises;
const subsetFont = require('subset-font');
const { UnicodeRange } = require('@japont/unicode-range');

const latinRange = 'U+0000-00FF, U+0131, U+0152-0153, U+02BB-02BC, U+02C6, U+02DA, U+02DC, U+2000-206F, U+2074, U+20AC, U+2122, U+2191, U+2193, U+2212, U+2215, U+FEFF, U+FFFD';

// util to handle passing unicode-range as a string
function formatRange(range) {
  if (typeof range === 'string') {
    // strip all whitespace, then split into individual range tokens
    return range.replace(/\s+/g, '').split(',');
  }
  return range;
}

function getGlyphsFromUnicodeRange(range) {
  // UnicodeRange currently requires an array of ranges …
  const rangeArray = formatRange(range);
  // convert each codepoint to its character
  const glyphs = UnicodeRange.parse(rangeArray).map((cp) =>
    String.fromCodePoint(cp)
  );
  return glyphs;
}

async function generateFont() {
  const font = await readFile(
    path.resolve(__dirname, 'woff2', 'SomeFontFile.woff2')
  );
  const glyphs = getGlyphsFromUnicodeRange(latinRange);
  const result = await subsetFont(font, glyphs, {
    targetFormat: 'woff2',
  });
  // ... and so on
}
Did a quick try, and from what I can tell so far, that library does the trick.
I don't know how you feel, but figuring out which glyphs to subset might seem a bit out of scope for subset-font after all (in the same way as subfont handles parsing of content etc.). Let me know if you want me to make a PR with an example, or anything regarding this.
Skipping unused ligatures is an interesting one; depending on language group there might be some savings. I have mostly thought about it as an on/off thing (i.e. liga is either enabled or disabled for the font). In my current use-case, there's a mix of static and dynamic content, hence the need to subset fonts based on unicode-range rather than individual codepoints … I would love to dive deeper into it though.
Great that you got it to work! Thanks for sharing your solution. I agree with your scope concern. Let's leave it here for now and see if it comes up as a common request. Maybe we can even add a link to this issue to the README.
Skipping unused ligatures is an interesting one; depending on language group there might be some savings. I have mostly thought about it as an on/off thing (i.e. liga is either enabled or disabled for the font).
I'll probably explore it one day when I have time. I'm not sure that the savings will be big either, it's mostly from a perfectionist angle. Spending years hunting down these kilobyte savings does that to you.
In my current use-case, there's a mix of static and dynamic content, hence the need to subset fonts based on unicode-range rather than individual codepoints … I would love to dive deeper into it though.
Ah yes, that makes sense! Btw, subfont has an experimental --dynamic switch that renders the pages in a headless browser and does additional tracing inside it. But it might not work for you; it depends on exactly how dynamic the content is :)
I'd also be happy to entertain the idea of configuring subfont to include a given unicode-range of characters in the subsets, regardless of what the tracing step says. It wouldn't really be hard to do. I think the main challenge would be to come up with a way to configure it if it has to be configurable per @font-face declaration.
Yes, it could actually be a good fit within subfont's scope: much of the tooling around generating @font-face declarations would be useful, just that instead of deriving unicode-range from parsed content, it would be provided by the configuration.
Regarding the per @font-face declaration, that is a tricky one, since I guess much of the idea behind subfont is to enable it as a drop-in addition to static site generators.
Yeah, that is the core use case, but I'm not opposed to exposing more controls like that. We could even do it as a custom CSS property in the @font-face rule, e.g.:
@font-face {
  font-family: foo;
  src: ...;
  font-weight: 700;
  -subfont-unicode-range: U+0131, U+0152-0153, U+02BB-02BC;
}
For what it's worth, Munter/subfont#161 implemented the ability to specify text to include in the subset via -subfont-text.
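As a rough illustration of how a tool could pick up the per-rule property proposed above, here is a regex-based sketch. The `-subfont-unicode-range` property is only a suggestion from this thread, `readSubfontUnicodeRanges` is a hypothetical name, and this is not how subfont reads `-subfont-text`. It assumes simple `@font-face` blocks with no nested braces and no semicolons inside values (so no data: URLs); a real implementation would more likely use a proper CSS parser.

```javascript
// Collect { fontFamily, unicodeRange } for every @font-face block that
// carries the (proposed, hypothetical) -subfont-unicode-range property.
function readSubfontUnicodeRanges(css) {
  const results = [];
  // Naive block matcher: assumes the rule body contains no "}".
  const fontFaceBlock = /@font-face\s*\{([^}]*)\}/gi;
  for (const [, body] of css.matchAll(fontFaceBlock)) {
    const declarations = new Map();
    // Naive declaration splitter: assumes values contain no ";".
    for (const declaration of body.split(';')) {
      const colon = declaration.indexOf(':');
      if (colon === -1) continue;
      const property = declaration.slice(0, colon).trim().toLowerCase();
      declarations.set(property, declaration.slice(colon + 1).trim());
    }
    if (declarations.has('-subfont-unicode-range')) {
      results.push({
        fontFamily: declarations.get('font-family'),
        unicodeRange: declarations.get('-subfont-unicode-range'),
      });
    }
  }
  return results;
}
```

Returning the raw range string keeps the sketch small; expanding it into codepoints is left to whatever range parser is in use.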
I'll close this for now.
subset-font allows for passing in a string with glyphs to subset, but would it be interesting to also include an option to pass a unicode-range, as is possible with pyftsubset?
I'm aware that subfont provides a conversion utility to convert a string to a unicode-range (for CSS output), but do you know if there's a standard-ish way of doing the opposite? I'm suspecting there might be a few weird exceptions to take into account?
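For reference, the string-to-unicode-range direction mentioned here can be sketched as follows. This is not subfont's actual utility; `textToUnicodeRange` is a hypothetical name, and the output formatting (uppercase hex, no zero padding, `, ` as separator) is an assumption made for the example.

```javascript
// Format a code point as uppercase hex without padding, e.g. 0x6f -> "6F".
function toHex(codePoint) {
  return codePoint.toString(16).toUpperCase();
}

// Build a CSS unicode-range value covering exactly the characters in `text`,
// merging consecutive code points into intervals.
function textToUnicodeRange(text) {
  // Array.from iterates by code point, so astral characters stay intact.
  const codePoints = [
    ...new Set(Array.from(text, (ch) => ch.codePointAt(0))),
  ].sort((a, b) => a - b);

  const parts = [];
  let i = 0;
  while (i < codePoints.length) {
    // Extend j while the next code point directly follows the current one.
    let j = i;
    while (j + 1 < codePoints.length && codePoints[j + 1] === codePoints[j] + 1) {
      j += 1;
    }
    const start = toHex(codePoints[i]);
    parts.push(i === j ? `U+${start}` : `U+${start}-${toHex(codePoints[j])}`);
    i = j + 1;
  }
  return parts.join(', ');
}
```

For example, `textToUnicodeRange('foof')` gives `U+66, U+6F`, since the text only contains "f" and "o".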