Open kontur opened 2 years ago
And (for later inclusion in an aggregated list of examples) "How to get languages/language counts by validity":
```python
from hyperglot import VALIDITYLEVELS
from hyperglot.languages import Languages

counts = {level: [] for level in VALIDITYLEVELS}
for iso, language in Languages(validity=VALIDITYLEVELS[0]).items():
    counts[language["validity"]].append(iso)
print({level: len(isos) for level, isos in counts.items()})
```
And "How many scripts are in the Hyperglot data" (all validity levels, all orthographies):
```python
from hyperglot import VALIDITYLEVELS
from hyperglot.languages import Languages
from hyperglot.language import Language

scripts = []
for iso, language in Languages(validity=VALIDITYLEVELS[0]).items():
    l = Language(language, iso)
    if "orthographies" in l:
        scripts.extend([o["script"] for o in l["orthographies"]])
print(len(set(scripts)), sorted(set(scripts)))
```
To document: 0.4.2 now has the distinction between accessing the raw yaml data of a language and getting a ready-to-use `Language` object, e.g.:
```python
from hyperglot.languages import Languages

hg = Languages()

# the raw yaml for 'eng'
hg["eng"]

# a ready-to-use hyperglot.language.Language object
hg.eng
```
This is a lot more convenient than having to initialize `Language` objects with `Language(Languages()["xxx"], "xxx")`.
Hello @kontur, this is exactly what I need, but I can't manage to use the font checker from Python...
```python
from hyperglot import checker

check = checker.FontChecker("fontFile.otf")
l = check.get_supported_languages(report_missing=10)  # always contains every language ...
print("fr ", check.supports_language('fra'))
print("jp ", check.supports_language('jpn'))  # always returns True ...
```
Hey @ivangrozny!
What is `data`?
FontChecker expects a path to a font as a parameter. It can perform checks on font shaping, e.g. for Arabic. If you are interested in checking only against a set of characters, use CharsetChecker instead.
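To illustrate the difference, here is a minimal, self-contained sketch (not hyperglot's actual implementation): a pure character-set check is just set containment between the font's codepoints and an orthography's required characters, with no shaping involved.

```python
# Illustrative sketch only: a character-set check reduces to set
# containment, whereas a font check additionally exercises shaping.

def supports_charset(font_chars: set, required: str) -> bool:
    """Return True if every required character is in the font's charset."""
    return set(required) <= font_chars

# A toy "font" covering basic Latin only, no accented characters.
font_chars = set("abcdefghijklmnopqrstuvwxyz")

print(supports_charset(font_chars, "abc"))   # basic Latin: supported
print(supports_charset(font_chars, "àbc"))   # 'à' missing: not supported
```

A shaping check (what FontChecker adds on top) cannot be reduced to this, which is why it needs the actual font binary rather than just a character list.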
Oh right, it works with a font file path; I was passing a ttFont object... By the way, is it possible to build a FontChecker with a ttFont? Because I get an error with some font files:
```
  File "gui.py", line 256, in load_new_font
    typo = check.get_supported_languages()
  File "hyperglot\checker.py", line 406, in get_supported_languages
    return super().get_supported_languages(**kwargs)
  File "hyperglot\checker.py", line 124, in get_supported_languages
    lang_sup = self.supports_language(
  File "hyperglot\checker.py", line 412, in supports_language
    return super().supports_language(iso, **kwargs)
  File "hyperglot\checker.py", line 277, in supports_language
    joining_errors, mark_errors = self._check_shaping(
  File "hyperglot\checker.py", line 384, in _check_shaping
    mark_errors = orthography.check_mark_attachment(check_attachment, self.shaper)
  File "hyperglot\orthography.py", line 223, in check_mark_attachment
    if shaper.check_mark_attachment(c) is False:
  File "hyperglot\shaper.py", line 221, in check_mark_attachment
    names = ", ".join(self.names_for_codepoints(missing_from_font))
TypeError: sequence item 0: expected str instance, NoneType found
```
For instance, with Roboto Black from Google Fonts.
@ivangrozny thanks for submitting that bug, it should totally be possible. If you pull in the latest `dev` it should no longer crash with this issue.
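For context, the `TypeError` above comes from `str.join` receiving `None` entries when some codepoints have no Unicode name. A hedged sketch of the defensive pattern (not the exact hyperglot patch) looks like this:

```python
# Illustrative sketch (not the actual hyperglot fix): fall back to the
# hex codepoint label when a codepoint has no Unicode name, so that
# ", ".join(...) never sees a None.
import unicodedata

def names_for_codepoints(codepoints):
    """Yield a readable label for each codepoint, never None."""
    for cp in codepoints:
        name = unicodedata.name(chr(cp), None)  # None for unnamed codepoints
        yield name if name is not None else "U+%04X" % cp

missing = [0x0041, 0xE000]  # 'A' and a Private Use Area codepoint (no name)
print(", ".join(names_for_codepoints(missing)))
# → LATIN CAPITAL LETTER A, U+E000
```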
As per #28 and #86: the library is useful via the CLI, but without documentation it is not useful standalone.