googlefonts / roboto-classic

Development of a Roboto Variable font
SIL Open Font License 1.1

File size diet? :) #46

Open davelab6 opened 4 years ago

davelab6 commented 4 years ago

I wonder if there are any opportunities for file size reduction by better use of components, ccmp diacritic glyph construction, etc?

mikedug commented 4 years ago

Hi Dave, is this question related to Roboto only? Most glyphs that can be made as composites are already composites in Roboto.

dberlow commented 4 years ago

That is so. All of the floating diacritics are composites. From looking at the composites in my inspector, I can see no flipped ones.

Glyphs like Ø, H-bar, and D and L with bar (probably about 20 glyphs) could be rebuilt so that, instead of carrying whole additional contours, they reference the source glyph’s contour as a component, with new glyphs added for the bars plus the compositing code. I.e. an O composited with a separate bar is smaller than a new glyph with an O and a bar drawn through it.
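To put rough numbers on that, here is a minimal sketch that estimates glyf-table bytes for a TrueType simple glyph versus a composite made of offset-only component references, following the glyf layout in the OpenType spec. The contour and point counts are hypothetical, not measured from Roboto, and the estimate ignores loca entries, padding, hinting instructions, and the one-time cost of the shared bar glyph.

```python
# Rough 'glyf' byte estimates for TrueType glyphs. All point/contour counts
# used below are hypothetical illustrations, not measurements of Roboto.

GLYPH_HEADER = 10  # numberOfContours, xMin, yMin, xMax, yMax (5 x int16)


def composite_glyph_bytes(num_components: int, word_args: bool = True) -> int:
    """Composite glyph with plain x/y offsets (no scale or transform, no
    instructions): header + per component flags (2) + glyphIndex (2) + two
    arguments (2 bytes each with ARG_1_AND_2_ARE_WORDS, else 1 byte each)."""
    per_component = 2 + 2 + (4 if word_args else 2)
    return GLYPH_HEADER + num_components * per_component


def simple_glyph_max_bytes(num_contours: int, num_points: int,
                           instruction_bytes: int = 0) -> int:
    """Worst-case simple glyph: header + endPtsOfContours (2 per contour)
    + instructionLength (2) + instructions + per point at most 1 flag byte,
    2 bytes of x and 2 bytes of y. Real glyphs are usually smaller because
    flags can repeat and short coordinates take 1 byte or none."""
    return (GLYPH_HEADER + 2 * num_contours + 2 + instruction_bytes
            + 5 * num_points)


# Hypothetical Oslash drawn as a whole outline: 3 contours, 40 points.
print(simple_glyph_max_bytes(3, 40))
# The same glyph as a composite of "O" plus a separate bar glyph.
print(composite_glyph_bytes(2))
```

Because the simple-glyph figure is an upper bound, the real gap is smaller, but the composite's cost stays fixed however many points the base outline has.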

The argument I used to talk DC into this for Extremo is that the file savings grow with each master, because the variations reuse the O’s and the bar’s gvar data, plus the composite data, more efficiently than a whole separate set of gvar data for an Ø contour.
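The per-master effect can be sketched from how gvar counts points: a simple glyph stores deltas for every outline point plus four phantom points, while a composite stores deltas only for each component's offset plus the same four phantom points. The sketch below counts delta values under that rule with hypothetical point counts; it ignores deltas inferred by IUP and the packing of delta runs, so it counts values, not bytes.

```python
PHANTOM_POINTS = 4  # horizontal and vertical metrics phantom points


def gvar_delta_values(points_or_components: int, num_regions: int) -> int:
    """Upper bound on delta values stored for one glyph in 'gvar'.

    Pass the outline point count for a simple glyph, or the component count
    for a composite glyph (each component offset varies as one "point").
    Every point carries an x and a y delta, once per tuple variation
    (roughly one per non-default master).
    """
    return (points_or_components + PHANTOM_POINTS) * 2 * num_regions


# Hypothetical Oslash: 40-point outline vs. a 2-component composite.
for regions in range(1, 5):
    outline = gvar_delta_values(40, regions)
    composite = gvar_delta_values(2, regions)
    print(regions, outline, composite, outline - composite)
```

The difference grows linearly with the number of masters, which is the point made above.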

After doing this for all the masters in the design space, a new variable font could be given to Mike with a list of glyphs whose hints would be removed. These glyphs would then have composite hinting made and tested. Mike will correct me if need be.

It could then go to Marc for retesting to make sure nothing went wrong with the old stuff, and that the new stuff is acceptable.

I’d estimate we’d need two to three weeks to complete this, given what else is on the stove, but we’d be glad to do so if DC will give us a schedule release from the original deadline.

davelab6 commented 4 years ago

It would be good to ship the fonts and then ship an update with a smaller file size in the following quarter.

dberlow commented 4 years ago

A survey of the target glyphs (red).

And a couple of alts I think we're adding to Extremo (green).

[Image: Compositables of Classic]

dberlow commented 4 years ago

The plan then is for Mike to finish and deliver a version of Classic.vf by the 6th of March. By mid-March, we'll have a new source with additional composites, and then, as the schedule permits or requires, Mike will produce a new hinted version, the same as delivered but with these 27 red glyphs composited.

brawer commented 4 years ago

Doesn’t HarfBuzz fall back to Unicode Normalization Form D (NFD) when an input code point is missing from the font’s cmap table? For example, when a font doesn’t contain Ä as U+00C4 (Latin Capital Letter A with Diaeresis), I believe HarfBuzz will retry shaping with U+0041 U+0308 (Latin Capital Letter A + Combining Diaeresis). So, if my memory of HarfBuzz is correct, and if the font correctly handles combining accents (does Roboto do that?), you could try removing all precomposed characters. There are about 13K characters in Unicode whose NFD differs from the character itself, so it might save substantial space. However, the font would then only work on systems based on HarfBuzz (or a shaping engine with the same fallback logic). If you try this out, can you tell us what your findings were? Here’s a quick Python 3 snippet for printing precomposed code points:

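```python
from unicodedata import normalize

# Print every code point whose canonical decomposition (NFD) differs from
# the character itself; these are mostly precomposed characters.
for codepoint in range(0x110000):
    c = chr(codepoint)
    if c != normalize("NFD", c):
        print("U+%06X" % codepoint)
```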
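The fallback described above can be modeled in a few lines. This is a simplified sketch, not HarfBuzz's actual normalizer (which also tries partial decompositions and recomposition): a character counts as renderable if it is mapped directly or, when the engine has NFD fallback, if every code point of its NFD form is mapped. The cmap contents are hypothetical.

```python
import unicodedata


def renders_in_font(ch: str, cmap: set[int], nfd_fallback: bool) -> bool:
    """Simplified model of whether `ch` can be rendered with this font.

    nfd_fallback=False: only a direct cmap entry counts (engines without
    the fallback). nfd_fallback=True: a missing character is also accepted
    when every code point of its full NFD form is in the cmap.
    """
    if ord(ch) in cmap:
        return True
    if not nfd_fallback:
        return False
    nfd = unicodedata.normalize("NFD", ch)
    return nfd != ch and all(ord(c) in cmap for c in nfd)


# Hypothetical "dieted" font: A, O and COMBINING DIAERESIS (U+0308) only.
cmap = {0x0041, 0x004F, 0x0308}
for ch in "AÄÖÅ":
    print(f"U+{ord(ch):04X}",
          renders_in_font(ch, cmap, nfd_fallback=False),
          renders_in_font(ch, cmap, nfd_fallback=True))
```

Å stays unrenderable even with the fallback, because its NFD needs U+030A COMBINING RING ABOVE, which this hypothetical font lacks.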
dberlow commented 4 years ago

Doesn’t apply, as far as I can see.

davelab6 commented 4 years ago

I think there is definitely something fishy going on here :)

In a cosmic coincidence, earlier this month I'd (a) learned that fontmake or at least vttLib doesn't build fonts that use flipped composites, and (b) requested a "Global Latin" glyph set definition from @moyogo (gftools/pull/177) to better define support for African languages that use an Extended Latin set.

For example, there are 6 characters missing from the Google Fonts "Core" Latin set latin_unique-glyphs.nam that are needed to support Yoruba:

0x0300    COMBINING GRAVE ACCENT
0x0301    COMBINING ACUTE ACCENT
0x0303    COMBINING TILDE
0x0304    COMBINING MACRON
0x0309    COMBINING HOOK ABOVE
0x0323    COMBINING DOT BELOW

@rsheeter did some research into this, and indeed, after adding those 6 characters to the core subsets of both Open Sans and Roboto (which support 5 of the 6 characters; there is no combining macron), Roboto adds ~100 glyphs whereas Open Sans adds only 4!

| Family | gids_latin | gids_latin_plus_six_codepoints |
|---|---|---|
| Roboto | 248 | 352 |
| Open Sans | 230 | 234 |
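Expressed as relative growth, the table's numbers make the gap plain. A small sketch of the arithmetic (this counts glyph IDs, not bytes):

```python
def glyph_growth(before: int, after: int) -> tuple[int, float]:
    """Glyphs added and growth in percent, rounded to one decimal place."""
    added = after - before
    return added, round(100 * added / before, 1)


# Glyph counts from the table: Latin subset -> Latin + six combining marks.
for family, before, after in [("Roboto", 248, 352), ("Open Sans", 230, 234)]:
    added, pct = glyph_growth(before, after)
    print(f"{family}: +{added} glyphs ({pct}%)")
```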

Rod is wondering if Roboto has been built in a way that adds precomposed glyphs for things we nowadays expect to be handled by the Unicode decomposition fallback that @brawer described.

I am thinking that indeed this is the case, and that while a full ttfdiet may be too aggressive, a light diet may indeed yield significant file size savings.

rsheeter commented 4 years ago

To give a specific example: if I subset to a, b, combining grave and combining acute, I expected the combinations with grave/acute to be done by layout, but Roboto does some of them via dedicated glyphs.

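```sh
for family in Roboto OpenSans; do
  # Subset each font to a, b, combining grave and combining acute.
  pyftsubset "${family}-Regular.ttf" \
    --unicodes="U+0061-0062,U+0300-0301" \
    --output-file="${family}-Regular_combtest.ttf"
  # Dump the subset to TTX (XML) for inspection.
  ttx -o "${family}-Regular_combtest.ttx" "${family}-Regular_combtest.ttf"
done
```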

    hb-shape --ned -u 'U+0061,U+0300' Roboto-Regular_combtest.ttf
    [gid7]

    hb-shape --ned -u 'U+0061,U+0301' Roboto-Regular_combtest.ttf
    [gid8]

    hb-shape --ned -u 'U+0062,U+0300' Roboto-Regular_combtest.ttf
    [gid3|gid5@1149,0]

    hb-shape --ned -u 'U+0062,U+0301' Roboto-Regular_combtest.ttf
    [gid3|gid6@1149,0]

    hb-shape --ned -u 'U+0061,U+0300' OpenSans-Regular_combtest.ttf
    [gid2|gid5@1139,0]

    hb-shape --ned -u 'U+0061,U+0301' OpenSans-Regular_combtest.ttf
    [gid2|gid6@1139,0]

    hb-shape --ned -u 'U+0062,U+0300' OpenSans-Regular_combtest.ttf
    [gid3|gid5@1255,0]

    hb-shape --ned -u 'U+0062,U+0301' OpenSans-Regular_combtest.ttf
    [gid3|gid6@1255,0]
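A small helper can summarize these results. With --ned, hb-shape prints glyphs separated by |, each optionally followed by an @x,y offset, so one glyph means the pair was mapped to a single (precomposed) glyph, and two glyphs mean a base plus a positioned mark. This is a hypothetical parsing sketch that assumes that output format; it also drops =cluster and +advance fields in case --ned is omitted.

```python
import re


def shaped_glyphs(output: str) -> list[str]:
    """Glyph names or gidN ids from one line of hb-shape output, e.g.
    "[gid3|gid5@1149,0]" -> ["gid3", "gid5"]. Cluster (=N), offset (@x,y)
    and advance (+N) fields are dropped."""
    body = output.strip().strip("[]")
    return [re.split(r"[=@+]", item, maxsplit=1)[0]
            for item in body.split("|") if item]


def describe(output: str) -> str:
    glyphs = shaped_glyphs(output)
    if len(glyphs) == 1:
        return "single (precomposed) glyph"
    return "base + %d mark(s)" % (len(glyphs) - 1)


# Results quoted above for Roboto-Regular_combtest.ttf.
for label, out in [("a+grave", "[gid7]"), ("b+grave", "[gid3|gid5@1149,0]")]:
    print(label, describe(out))
```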
dberlow commented 4 years ago

I see now. This should work with Roboto and its single set of combining accents without a problem, as long as the optical centers of glyphs and diacritics are aligned and no glyph-specific horizontal or vertical refinements are required. I think that’s the way it’s always been, but perhaps type designers have not trusted its quality compared to refined glyph construction.
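Under that assumption, a single accent set needs only one anchor per base: in a mark-to-base attachment, the mark is shifted so that its own anchor lands on the base's anchor. A minimal sketch of that arithmetic, with hypothetical anchor coordinates in font units and offsets measured from the base glyph's origin:

```python
from typing import NamedTuple


class Anchor(NamedTuple):
    x: int
    y: int


def mark_offset(base_anchor: Anchor, mark_anchor: Anchor) -> Anchor:
    """Offset, relative to the base glyph's origin, at which the mark is
    drawn so that its anchor coincides with the base's anchor (the idea
    behind GPOS mark-to-base attachment)."""
    return Anchor(base_anchor.x - mark_anchor.x, base_anchor.y - mark_anchor.y)


# Hypothetical anchors: one shared attachment anchor on the accent,
# one "top" anchor per base letter.
gravecomb_top = Anchor(250, 1100)
for base, top in {"a": Anchor(560, 1100), "b": Anchor(300, 1500)}.items():
    print(base, mark_offset(top, gravecomb_top))
```

The glyph-specific refinements mentioned above are exactly what this model leaves out; they would need per-base anchor adjustments or separate accent variants.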

So do we have refining capabilities for every single individual glyph?

Moving on to more complex requirements: in Roboto Extremo, for example, there are five sets of combining accents to accommodate the different design requirements of the cases, of above and below accents, and of stacked accents. Projected out into a larger design space, all of these put too much stress on a single accent working for all combinations.

I’d like to know as much as I can about saving space, but on the other hand, my goal is to produce the same quality across the entire design space as exists in the default.

rsheeter commented 4 years ago

To give a little more context, if I compare subsetting to latin with/without the six added codepoints for Yoruba (aside: actually only 4 are for Yoruba) the size delta for Roboto is way out of line with the rest of our library.

+6 codepoints in latin for Open Sans adds 4 gids (230 => 234) and increases woff2 filesize by 100-200 bytes (1-2%). +6 codepoints in latin for Roboto adds 104 gids (248 => 352) and increases woff2 filesize by 2400-2800 bytes (17-22%).

The woff2 filesize is the real concern. I'm hoping it's possible to adjust Roboto to perform closer to Open Sans when we add combining characters to the subset.

brawer commented 4 years ago

I think there is definitely something fishy going on here

Does Roboto perhaps implement combining characters with GSUB, as in sub a acutecomb by aacute? Then pyftsubset will not remove the aacute glyph from the font (which is correct, since the composed glyph might look different); that would explain the gid7 and gid8 in the output above. Instead, try writing your substitution rules the other way, as Open Sans seems to do: sub aacute by a acutecomb. If the a and acutecomb glyphs have the right anchors, the mark positioning feature should be generated automatically, and subsetting can then remove aacute from the font. (You could also try deleting all accented glyphs from the font source and having someone write a tool that synthesizes precomposed characters for platforms that really need them; that shouldn’t be very hard.)
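Both halves of that suggestion can be written as OpenType feature syntax (AFDKO .fea). The sketch below emits a ccmp decomposition rule (a multiple substitution) and a mark-to-base feature from anchor data. In a real fontmake build, ufo2ft's feature writers generate the mark feature from UFO anchors, so this only illustrates the shape of the output; glyph names and coordinates are hypothetical.

```python
def ccmp_decompositions(decompositions: dict[str, list[str]]) -> str:
    """ccmp rules that split precomposed glyphs into base + marks
    (multiple substitution), rather than composing base + mark into a
    precomposed glyph (ligature substitution)."""
    lines = ["feature ccmp {"]
    for glyph, parts in sorted(decompositions.items()):
        lines.append(f"    sub {glyph} by {' '.join(parts)};")
    lines.append("} ccmp;")
    return "\n".join(lines)


def mark_to_base(mark_anchors: dict[str, tuple[int, int]],
                 base_anchors: dict[str, tuple[int, int]],
                 anchor_name: str = "top") -> str:
    """A mark class holding each mark's attachment anchor, plus one
    mark-to-base rule per base glyph."""
    mark_class = f"@MC_{anchor_name}"
    lines = [f"markClass {mark} <anchor {x} {y}> {mark_class};"
             for mark, (x, y) in sorted(mark_anchors.items())]
    lines.append("feature mark {")
    for base, (x, y) in sorted(base_anchors.items()):
        lines.append(f"    pos base {base} <anchor {x} {y}> mark {mark_class};")
    lines.append("} mark;")
    return "\n".join(lines)


print(ccmp_decompositions({"aacute": ["a", "acutecomb"]}))
print(mark_to_base({"acutecomb": (250, 1100)},
                   {"a": (560, 1100), "b": (300, 1500)}))
```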

dberlow commented 4 years ago

As far as I know, Roboto as delivered can go either way. We use an open source tool called Glyph Builder, by Ben Kiel I think, that synthesizes precomposed characters for the purpose of serving everyone. You may serve the font to any system and it’ll work. Or you may subset it, removing all the composed accents, and serve it relying on HarfBuzz’s fallback to generate the missing glyphs.

Santiago and I will meet this a.m. and update you all later today on providing the lightest Roboto by both methods discussed here: removal of composed glyphs, and addition of more composites that save glyph and gvar space.

Thanks.

davelab6 commented 4 years ago

I see https://benkiel.com/typeDesign/ has buildAccents.py for FontLab 5 as a source-based accent builder, and https://robofont.com/documentation/how-tos/building-accented-glyphs/ suggests this is nowadays replaced with https://github.com/typemytype/glyphconstruction for folks like you using RoboFont.

sannorozco commented 4 years ago

Yes, the latter is what we use!

dberlow commented 4 years ago

We are going through the steps of undoing the error we inherited on this.

khaledhosny commented 4 years ago

Doesn’t HarfBuzz fall back to Unicode Normalization Form D (NFD) when an input code point is missing from the font’s cmap table?

Yes, but many applications (most, actually) will check the cmap table to decide whether or not to use a fallback font before shaping with HarfBuzz. Chrome (and to some extent LibreOffice) is the exception: it shapes first, then uses a fallback font for unsupported characters.

brawer commented 4 years ago

Very curious about the actual space savings on a large font like Roboto, especially in WOFF format. If the improvement were huge, it might be worth building subsetted fonts for certain environments (e.g. Google Fonts, which afaik can serve browser-specific fonts); a large size difference might also be a reason for other browsers and apps to implement the same fallback logic as Chrome or LibreOffice. But if it’s only a small difference, it’s clearly not worth bothering.

sannorozco commented 4 years ago

Hello,

I broke our findings out into two issues, #49 and #48.

m4rc1e commented 4 years ago

I've removed all composite glyphs that have a Unicode decomposition from Roboto Regular Hinted. The file is about 8% smaller when we factor in all other pyftsubset optimisations.
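A post-processing diet along these lines starts by finding which precomposed characters are fully covered by their NFD decomposition. Below is a minimal sketch over a code-point-to-glyph-name dict (fontTools' TTFont(path).getBestCmap() returns one in that shape). It does not check whether the glyphs are composites, whether other code points or GSUB rules still reference them, or whether mark positioning is adequate, which a real post-processing step would also need to consider.

```python
import unicodedata


def removable_precomposed(cmap: dict[int, str]) -> dict[int, str]:
    """Return the cmap entries whose full NFD decomposition is covered by
    other cmap entries, i.e. candidates a decomposition-based diet could
    drop (assuming the marks are positioned correctly via GPOS)."""
    removable = {}
    for cp, glyph in cmap.items():
        ch = chr(cp)
        nfd = unicodedata.normalize("NFD", ch)
        if nfd != ch and all(ord(c) in cmap for c in nfd):
            removable[cp] = glyph
    return removable


# Hypothetical cmap: Aring stays because U+030A is not mapped.
example = {0x41: "A", 0x308: "dieresiscomb", 0xC4: "Adieresis", 0xC5: "Aring"}
print(removable_precomposed(example))  # {196: 'Adieresis'}
```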

For Chrome and modern browsers, the results are ok. There are some mark positioning issues but these can be solved.

[Screenshot: Win 10, Chrome 70]

Internet Explorer 11 is a mess.

[Screenshot: Win 7, IE 11]

If we ignore the shifts, we can see the accented glyphs are using a fallback font.

If we do go ahead with this diet idea, IMO it has to be done by the Google Fonts backend, as Sascha suggested, and served only to particular browsers. I don't think @sannorozco and TN should be doing this manually.

The benefit of making this server side is we can apply it to other families as well.

m4rc1e commented 4 years ago

I've made a minimal test case for those who are interested.

OS X tests:

[Screenshot: Chrome 80]

[Screenshot: Safari 13]

Win tests:

[Screenshot: Chrome 80]

[Screenshot: Edge 80]

[Screenshot: Firefox 74]

[Screenshot: IE 11]

It seems only Chrome and Edge are OK.

diet_testcase.zip

davelab6 commented 4 years ago

@rsheeter please can you provide a way for @m4rc1e to reproduce the filesize increase you found when adding the Yoruba characters that appears unique to Roboto statics?

The work being done in this thread is in response to that concern, but since this is taking much longer to complete than anticipated, I would like to decouple the two tasks:

davelab6 commented 4 years ago

On a 1:1 call just now, @m4rc1e also proposed that it may be better to do the dieting, like https://github.com/twardoch/ttfdiet, as a post-processing step, which would then be applicable to any font project, rather than in the source files, since pyftsubset retains hinting.

So, we probably need to check out the current master into a holding 'diet' branch, then revert to the last commit before the diet effort started, and continue from there.

davelab6 commented 4 years ago

a way for @m4rc1e to reproduce the filesize increase you found when adding the Yoruba characters that appears unique to Roboto statics

Ah, @rsheeter pointed out that the information required to do this is already on this thread, in 2 parts. I wrote,

For example, there are 6 characters missing from the Google Fonts "Core" Latin set latin_unique-glyphs.nam that are needed to support Yoruba:

0x0300    COMBINING GRAVE ACCENT
0x0301    COMBINING ACUTE ACCENT
0x0303    COMBINING TILDE
0x0304    COMBINING MACRON
0x0309    COMBINING HOOK ABOVE
0x0323    COMBINING DOT BELOW

and Rod wrote,

```sh
for family in Roboto OpenSans; do
  pyftsubset "${family}-Regular.ttf" \
    --unicodes="U+0061-0062,U+0300-0301" \
    --output-file="${family}-Regular_combtest.ttf"
  ttx -o "${family}-Regular_combtest.ttx" "${family}-Regular_combtest.ttf"
done
```

Rod also wrote earlier,

+6 codepoints in latin for Open Sans adds 4 gids (230 => 234) and increases woff2 filesize by 100-200 bytes (1-2%).

+6 codepoints in latin for Roboto adds 104 gids (248 => 352) and increases woff2 filesize by 2400-2800 bytes (17-22%).

The woff2 filesize is the real concern

So, to move forward, I propose that @sannorozco and @dberlow take a step back and research how Open Sans is constructed, such that adding the 6 characters does not increase its file size by ~20%, and apply Marc's minimal test case to Open Sans to confirm that it renders as expected.

Does that sound good?

dberlow commented 4 years ago

When I open Open Sans in RoboFont, all the composites are there, and it has a MUCH SMALLER glyph repertoire than Roboto.

So,

Where are the files that show Roboto increasing by 20% from the addition of 6 glyphs? Who made them? How?

Santiago has dieted the font and remapped the glyph indexes, and is going to run a test to make sure all the glyphs show up this time.

And, reverting to the pre-diet branch is going to lose the last few weeks of Mike's efforts because we stopped optimizing on the pre-diet version as it was being discarded as an option.

Our way to a complete source now is to replicate the composites in the current version, add the hints for those composites back in, and go forward from there?

The good news is, we can now rearrange the glyph indexes again if we need to and that's open source.

Let us know, thanks.


dberlow commented 4 years ago

Enclosed is a compare of the two repertoires.


davelab6 commented 4 years ago

Enclosed is a compare of the two repertoires.

You cannot attach files via email, as you tried here; you must visit the https://github.com/TypeNetwork/Roboto/issues/46 page and drag and drop the files into your comment.

davelab6 commented 4 years ago

Where are the files that show Roboto increasing by 20% from the addition of 6 glyphs? who made them? How?

The files were not shared, but the command line to reproduce them has been shared, so I am requesting that you/santiago (re)make them, so that you can be sure how they were made and fully investigate and compare.

go forward from there?

Understood. Let's roll forwards!

The good news is, we can now rearrange the glyph indexes again if we need to and that's open source.

Having looked at the code (https://github.com/TypeNetwork/Roboto/blob/delivery-review/Scripts/mapper-VTT-gids.py), I think it needs to be polished and packaged before it can be used again; I'll file a separate issue for that :)

dberlow commented 4 years ago

Where is the command line, please?

Glyph repertoires below? Open Sans is top, Roboto below.

[Screenshots: glyph repertoires of Open Sans (top) and Roboto (bottom), 2020-03-26]

davelab6 commented 4 years ago

I explain how to construct the command here, https://github.com/TypeNetwork/Roboto/issues/46#issuecomment-604497986

m4rc1e commented 4 years ago

The original Roboto has a dedicated webfont family which is produced by post processing the master fonts. This is the version of Roboto we use on Google Fonts.

If I run the post-processing scripts that create this family on our VFs, we get a file size of 900 KB. The static TTFs we currently serve are 2.1 MB. IMO, this is a massive win.
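For scale, the saving works out as follows. This sketch treats KB and MB as decimal units and compares the served statics as a whole with the VF, as the comment does.

```python
def reduction_percent(old_bytes: float, new_bytes: float) -> float:
    """Percentage saved when going from old_bytes to new_bytes,
    rounded to one decimal place."""
    return round(100 * (old_bytes - new_bytes) / old_bytes, 1)


# Figures quoted above: 2.1 MB of static TTFs vs. a 900 KB post-processed VF.
print(reduction_percent(2.1e6, 900e3))
```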

The VFs also contain Mike's VTT work.

cc @rsheeter @davelab6