Open kontur opened 2 years ago
And (for later inclusion in an aggregated list of examples) "How to get languages/language counts by validity":
```python
from hyperglot import VALIDITYLEVELS
from hyperglot.languages import Languages

counts = {level: [] for level in VALIDITYLEVELS}
for iso, language in Languages(validity=VALIDITYLEVELS[0]).items():
    counts[language["validity"]].append(iso)
print({level: len(isos) for level, isos in counts.items()})
```
And "How many scripts are in the Hyperglot data" (all validity levels, all orthographies):
```python
from hyperglot import VALIDITYLEVELS
from hyperglot.languages import Languages
from hyperglot.language import Language

scripts = []
for iso, language in Languages(validity=VALIDITYLEVELS[0]).items():
    l = Language(language, iso)
    if "orthographies" in l:
        scripts.extend([o["script"] for o in l["orthographies"]])
print(len(set(scripts)), sorted(set(scripts)))
```
To document: 0.4.2 now has the distinction between accessing the raw yaml data of a language and getting a ready-to-use `Language` object, e.g.:
```python
from hyperglot.languages import Languages

hg = Languages()

# the raw yaml for 'eng'
hg["eng"]

# a ready-to-use hyperglot.language.Language object
hg.eng
```
This is a lot more convenient than having to initialize `Language` objects with `Language(Languages()["xxx"], "xxx")`.
Hello @kontur, this is exactly what I need, but I can't manage to use the font checker from Python...
```python
from hyperglot import checker

check = checker.FontChecker("fontFile.otf")
l = check.get_supported_languages(report_missing=10)  # always contains every language ...
print("fr ", check.supports_language('fra'))
print("jp ", check.supports_language('jpn'))  # always returns True ...
```
Hey @ivangrozny!
What is `data`?
FontChecker expects a path to a font as a parameter. It can perform checks on font shaping, e.g. for Arabic. If you are interested in checking only against a set of characters, use CharsetChecker instead.
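To illustrate the difference, here is a minimal, self-contained sketch (not hyperglot's actual implementation): a pure character-set check is just set containment between the font's codepoints and an orthography's required characters, with no shaping involved.

```python
# Illustrative sketch only: a character-set check reduces to set
# containment, whereas a font check additionally exercises shaping.

def supports_charset(font_chars: set, required: str) -> bool:
    """Return True if every required character is in the font's charset."""
    return set(required) <= font_chars

# A toy "font" covering basic Latin only, no accented characters.
font_chars = set("abcdefghijklmnopqrstuvwxyz")

print(supports_charset(font_chars, "abc"))   # basic Latin: supported
print(supports_charset(font_chars, "àbc"))   # 'à' missing: not supported
```

A shaping check (what FontChecker adds on top) cannot be reduced to this, which is why it needs the actual font binary rather than just a character list.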
Oh right, it works with a font file path; I was passing a ttFont object... By the way, is it possible to build a FontChecker with a ttFont? Because I get an error with some font files:
```
  File "gui.py", line 256, in load_new_font
    typo = check.get_supported_languages()
  File "hyperglot\checker.py", line 406, in get_supported_languages
    return super().get_supported_languages(**kwargs)
  File "hyperglot\checker.py", line 124, in get_supported_languages
    lang_sup = self.supports_language(
  File "hyperglot\checker.py", line 412, in supports_language
    return super().supports_language(iso, **kwargs)
  File "hyperglot\checker.py", line 277, in supports_language
    joining_errors, mark_errors = self._check_shaping(
  File "hyperglot\checker.py", line 384, in _check_shaping
    mark_errors = orthography.check_mark_attachment(check_attachment, self.shaper)
  File "hyperglot\orthography.py", line 223, in check_mark_attachment
    if shaper.check_mark_attachment(c) is False:
  File "hyperglot\shaper.py", line 221, in check_mark_attachment
    names = ", ".join(self.names_for_codepoints(missing_from_font))
TypeError: sequence item 0: expected str instance, NoneType found
```
For instance, with Roboto Black from Google Fonts.
@ivangrozny thanks for submitting that bug, it should totally be possible. If you pull in the latest `dev` it should no longer crash with this issue.
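For context, the `TypeError` above comes from `str.join` receiving `None` entries when some codepoints have no Unicode name. A hedged sketch of the defensive pattern (not the exact hyperglot patch) looks like this:

```python
# Illustrative sketch (not the actual hyperglot fix): fall back to the
# hex codepoint label when a codepoint has no Unicode name, so that
# ", ".join(...) never sees a None.
import unicodedata

def names_for_codepoints(codepoints):
    """Yield a readable label for each codepoint, never None."""
    for cp in codepoints:
        name = unicodedata.name(chr(cp), None)  # None for unnamed codepoints
        yield name if name is not None else "U+%04X" % cp

missing = [0x0041, 0xE000]  # 'A' and a Private Use Area codepoint (no name)
print(", ".join(names_for_codepoints(missing)))
# → LATIN CAPITAL LETTER A, U+E000
```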
As per #28 and #86: the library is useful via the CLI, but without documentation it is not useful standalone.