rosettatype / hyperglot

Hyperglot: a database and tools for detecting language support in fonts
http://hyperglot.rosettatype.com
GNU General Public License v3.0

KeyError: 'post' on some fonts that have the 'post' table #24

Closed cjchapman closed 3 years ago

cjchapman commented 3 years ago

I see crashes on some fonts (e.g. some of the Noto fonts) with KeyError: 'post', e.g.:

$ hyperglot NotoSansMalayalam-Regular.ttf 
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/fontTools/ttLib/ttFont.py", line 372, in __getitem__
    return self.tables[tag]
KeyError: 'post'

…even though the fonts in question have the post table, e.g.:

>>> font = TTFont('NotoSansMalayalam-Regular.ttf')
>>> font.keys()
['GlyphOrder', 'head', 'hhea', 'maxp', 'OS/2', 'hmtx', 'cmap', 'loca', 'glyf', 'name', 'post', 'gasp', 'GDEF', 'GPOS', 'GSUB']

kontur commented 3 years ago

Could it be this is an outdated or manipulated version of the font file?

Testing with this NotoSansMalayalam-Regular.ttf I get:

$ hyperglot ~/Desktop/NotoSansMalayalam-Regular.ttf 

===================================================
NotoSansMalayalam-Regular.ttf has base support for:
===================================================

1 language of Malayalam script:
-------------------------------
Malayalam

1 languages supported in total.

Feel free to attach your particular version of the font here and I'm happy to test with that. By the looks of the error, however, this seems to be a fontTools problem with simply parsing the font. The Hyperglot CLI does little with fontTools beyond reading in the cmap table, so it has no specific interaction with the post table at all.

cjchapman commented 3 years ago

It might well be out of date. Here's the version I was using: NotoSansMalayalam-Regular.ttf.zip

cjchapman commented 3 years ago

The version I posted above is the 1.04 unhinted version and the version you pointed to is the 2.001 hinted version of Noto Sans Malayalam. So, yes, out of date. However, I'm able to reproduce this KeyError even with fonts in my macOS Library/Fonts folder, e.g.:

$ hyperglot /Library/Fonts/Arial\ Unicode.ttf 
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/fontTools/ttLib/ttFont.py", line 372, in __getitem__
    return self.tables[tag]
KeyError: 'post'

MrBrezina commented 3 years ago

What is your version of fontTools?

cjchapman commented 3 years ago

$ pip3 freeze  | egrep "(fonttools|hyperglot)"
fonttools==4.21.1
hyperglot==0.2.9

cjchapman commented 3 years ago

I tried it again just now with the latest of both fonttools and hyperglot:

$ pip3 freeze | egrep "(fonttools|hyperglot)"
fonttools==4.21.1
hyperglot==0.2.11
$ hyperglot /Library/Fonts/Arial\ Unicode.ttf 
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/fontTools/ttLib/ttFont.py", line 372, in __getitem__
    return self.tables[tag]
KeyError: 'post'

kontur commented 3 years ago

I'm inclined to escalate this to fontTools. Could you do two things:

Hyperglot uses fontTools for two very, very minor things only:

Checking that a font can in fact be parsed by fontTools; this does not seem to be the root of your error, since a failure here would exit the CLI command with the error message below.

try:
    _font = TTFont(v, lazy=True)
    _font.close()
except Exception as e:
    raise click.BadParameter("Could not convert TTFont from passed in "
                             "font file (%s)" % str(e))

And then actually getting the Unicode code points in the font:

def parse_font_chars(path):
    """
    Open the provided font path and extract the codepoints encoded in the font
    @return list of characters
    """
    font = TTFont(path, lazy=True)
    cmap = font["cmap"]
    font.close()

    # The cmap keys are int codepoints
    return [chr(c) for c in cmap.getBestCmap().keys()]

Neither of these two uses of fontTools does anything besides reading in the font and accessing some basic parts of its contents.

As per my above suggestion, the only thing I could imagine is that some outdated version of fontTools lurks somewhere and is causing the issue, or otherwise this is something very specific to your machine, in which case I'd refer this to fontTools to figure out.
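One way to check for a stray installation is to ask the interpreter which version and location it actually resolves; the tracebacks above show both python3.8 and python3.7 site-packages paths, so more than one environment may be involved. This is a minimal sketch using only the standard library (Python 3.8+); the helper name describe_install is made up for illustration and is not part of Hyperglot or fontTools.

```python
import importlib.util
from importlib import metadata


def describe_install(dist_name, module_name):
    """Return (version, location) as seen by the running interpreter.

    Hypothetical helper for illustration only. Either value is None
    when the distribution or the module cannot be found.
    """
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    # find_spec locates the module without importing it (top-level names only here)
    spec = importlib.util.find_spec(module_name)
    location = spec.origin if spec is not None else None
    return version, location


if __name__ == "__main__":
    # The pip distribution is "fonttools", the importable package is "fontTools"
    for dist, module in (("fonttools", "fontTools"), ("hyperglot", "hyperglot")):
        print(dist, describe_install(dist, module))
```

Running this with the same interpreter that the hyperglot command uses should show whether the version reported by pip3 freeze is the one being imported.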

cjchapman commented 3 years ago

It's not specific to my machine. I can reproduce it on two other Macs. The problem does not occur with all fonts. It does appear to be a fonttools bug, as I was able to reproduce it with a small script based on code from hyperglot. I have written up a fonttools bug about this.

cjchapman commented 3 years ago

As Just pointed out in https://github.com/fonttools/fonttools/issues/2250#issuecomment-810500604, it's not a fonttools bug. Based on his suggestion, I tried changing https://github.com/rosettatype/hyperglot/blob/d15e7d168ed288f4ac1c7cc0a7fa62587ee471cf/lib/hyperglot/parse.py#L176-L186 to:

def parse_font_chars(path):
    """
    Open the provided font path and extract the codepoints encoded in the font
    @return list of characters
    """
    with TTFont(path, lazy=True) as font:
        cmap = font["cmap"]
        # The cmap keys are int codepoints
        return [chr(c) for c in cmap.getBestCmap().keys()]

and now I get:

$ hyperglot NotoSansMalayalam-Regular.ttf 

===================================================
NotoSansMalayalam-Regular.ttf has base support for:
===================================================

1 language of Malayalam script:
-------------------------------
Malayalam

1 languages supported in total.

cjchapman commented 3 years ago

Here's a git patch file with the change: 0001-changed-parse_font_chars-to-not-close-the-font-befor.patch.zip

kontur commented 3 years ago

Super, thanks for digging deeper into this. Yes, that makes sense. I should have stored the result of getBestCmap(), not the cmap table, before closing the font; or not closed the font explicitly at all.

Will ship a fix for this in the next update 👍
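For anyone landing here later, the failure mode can be sketched without fontTools at all. The toy classes below are hypothetical stand-ins, not fontTools' actual implementation: they assume a font object that loads tables on first access, and a cmap lookup that needs a second table ('post' here, standing in for glyph names) at the moment it is resolved. Under those assumptions, closing the font before resolving the mapping reproduces the KeyError, while both fixes mentioned above avoid it.

```python
class ToyLazyFont:
    """Hypothetical stand-in for a lazily loading font (NOT fontTools)."""

    def __init__(self, raw_tables):
        self._reader = dict(raw_tables)  # pretend this is the open file
        self.tables = {}                 # tables loaded so far

    def __getitem__(self, tag):
        try:
            return self.tables[tag]
        except KeyError:
            if self._reader is None or tag not in self._reader:
                raise  # closed or missing: nothing left to load from
            self.tables[tag] = self._reader[tag]
            return self.tables[tag]

    def close(self):
        self._reader = None

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()


def best_cmap(font):
    """Resolve code point -> glyph name; touches 'post' only when called."""
    glyph_names = font["post"]
    return {cp: glyph_names[gid] for cp, gid in font["cmap"].items()}


RAW = {"cmap": {0x41: 0, 0x42: 1}, "post": ["A", "B"]}


def chars_closing_too_early(raw):
    font = ToyLazyFont(raw)
    font["cmap"]      # only 'cmap' gets loaded here
    font.close()      # 'post' can no longer be loaded
    return [chr(c) for c in best_cmap(font)]  # raises KeyError: 'post'


def chars_with_context_manager(raw):
    # Fix 1: keep the font open until the mapping has been resolved
    with ToyLazyFont(raw) as font:
        return [chr(c) for c in best_cmap(font)]


def chars_resolved_before_close(raw):
    # Fix 2: store the resolved mapping, not the table, before closing
    font = ToyLazyFont(raw)
    mapping = best_cmap(font)
    font.close()
    return [chr(c) for c in mapping]
```

The two fixed variants differ only in style; the context manager has the advantage of also closing the font if resolving the mapping raises.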

cjchapman commented 3 years ago

You're welcome. 😀