roedoejet / g2p

Grapheme-to-Phoneme transductions that preserve input and output indices, and support cross-lingual g2p!
https://g2p-studio.herokuapp.com

LANGS_AVAILABLE is not complete #70

Closed. joanise closed this issue 3 years ago

joanise commented 4 years ago

This is an issue at the intersection of g2p and ReadAlongs/Studio: the variable LANGS_AVAILABLE in g2p.mappings.langs does not include all languages available for mapping.

Currently, the list, obtained by calling readalongs align -h or by giving an invalid language code to the -l option, is:

alq, atj, ckt, crj, crk, crl, crm, csw, ctp, dan, fra, git, gla, iku, kkz, lml, moh, oji, see, srs, str, tce, tgx, tli, und, win, eng

but the full list, if we ignore the *-ipa instances, is:

alq, atj, ckt, crg-tmd, crg-dv, crj, crj-norm, crk-no-symbols, crk, crl, crl-norm, crm, crm-norm, csw, csw-norm, ctp, dan, fra, git, git, git, gla, iku, kkz, kwk-napa, kwk-umista, kwk-umista, kwk-boas, lml, moh, moh, oji, oji-syl, see, srs, str, tce, tce-norm, tgx, tli, tli-norm, und, win, eng

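Languages currently missing (the codes in the full list that are absent from the current one): crg-tmd, crg-dv, crj-norm, crk-no-symbols, crl-norm, crm-norm, csw-norm, kwk-napa, kwk-umista, kwk-boas, oji-syl, tce-norm, tli-norm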

The -norm and -no-symbols entries probably don't belong in the list, since (I believe) they are intermediate representations, but the others need to be included, since a user might have one of them as the input language when creating a read-along.
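For reference, the missing codes can be recomputed from the two lists quoted above. This is only a sketch: the helper `missing_codes` and the `INTERMEDIATE_SUFFIXES` tuple are names made up here, and treating `-norm` and `-no-symbols` as intermediate follows the guess in this comment rather than anything confirmed in g2p.

```python
# Both lists are copied verbatim from this issue, duplicates included.
current = (
    "alq, atj, ckt, crj, crk, crl, crm, csw, ctp, dan, fra, git, gla, iku, "
    "kkz, lml, moh, oji, see, srs, str, tce, tgx, tli, und, win, eng"
).split(", ")

full = (
    "alq, atj, ckt, crg-tmd, crg-dv, crj, crj-norm, crk-no-symbols, crk, crl, "
    "crl-norm, crm, crm-norm, csw, csw-norm, ctp, dan, fra, git, git, git, gla, "
    "iku, kkz, kwk-napa, kwk-umista, kwk-umista, kwk-boas, lml, moh, moh, oji, "
    "oji-syl, see, srs, str, tce, tce-norm, tgx, tli, tli-norm, und, win, eng"
).split(", ")

# Assumption: these suffixes mark intermediate representations.
INTERMEDIATE_SUFFIXES = ("-norm", "-no-symbols")


def missing_codes(current, full):
    """Return codes in `full` but not in `current`, in first-seen order, deduplicated."""
    seen = set(current)
    missing = []
    for code in full:
        if code not in seen:
            seen.add(code)
            missing.append(code)
    return missing


missing = missing_codes(current, full)
intermediate = [c for c in missing if c.endswith(INTERMEDIATE_SUFFIXES)]
user_facing = [c for c in missing if not c.endswith(INTERMEDIATE_SUFFIXES)]

print("probably intermediate:", intermediate)
print("should be selectable: ", user_facing)
```

Under that assumption, six codes are ones a user might actually pass to -l, and seven are intermediate.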

Initial patch proposal, used to create the extended list above:

LANGS_AVAILABLE = [{mapping['in_lang']: mapping['language_name']} for k, v in LANGS.items() for mapping in v['mappings'] if not mapping['in_lang'].endswith("-ipa")]

compare with the current code:

LANGS_AVAILABLE = [{k: v['language_name']} for k, v in LANGS.items() if k not in ['generated', 'font-encodings']]
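To make the difference between the two comprehensions concrete, here is a small sketch run against a toy dictionary. The `LANGS` below is a hypothetical stand-in that models only the keys the two one-liners touch (`language_name`, `mappings`, `in_lang`), and the language names in it are placeholders. The `available_langs` helper is not part of g2p or Studio; it just illustrates one way the duplicates and intermediate codes could be dropped.

```python
# Hypothetical stand-in for g2p's LANGS; structure inferred from the two
# comprehensions in this issue, names and entries invented for illustration.
LANGS = {
    "kwk": {
        "language_name": "Lang K",
        "mappings": [
            {"in_lang": "kwk-napa", "language_name": "Lang K (napa)"},
            {"in_lang": "kwk-umista", "language_name": "Lang K (umista)"},
            {"in_lang": "kwk-umista", "language_name": "Lang K (umista)"},
            {"in_lang": "kwk-ipa", "language_name": "Lang K (ipa)"},
        ],
    },
    "tli": {
        "language_name": "Lang T",
        "mappings": [
            {"in_lang": "tli", "language_name": "Lang T"},
            {"in_lang": "tli-norm", "language_name": "Lang T (normalized)"},
        ],
    },
    "generated": {"language_name": "Generated", "mappings": []},
}

# Current code: one entry per top-level key, so kwk-napa etc. never appear.
current = [
    {k: v["language_name"]}
    for k, v in LANGS.items()
    if k not in ["generated", "font-encodings"]
]

# Proposed patch: one entry per mapping, skipping only the *-ipa inputs.
proposed = [
    {mapping["in_lang"]: mapping["language_name"]}
    for k, v in LANGS.items()
    for mapping in v["mappings"]
    if not mapping["in_lang"].endswith("-ipa")
]


def available_langs(langs, skip_suffixes=("-ipa", "-norm", "-no-symbols")):
    """Hypothetical variant: one entry per distinct in_lang, minus skipped suffixes."""
    names = {}
    for entry in langs.values():
        for mapping in entry["mappings"]:
            code = mapping["in_lang"]
            if code.endswith(skip_suffixes):
                continue
            # Keep the first name seen for each code (dicts preserve insertion order).
            names.setdefault(code, mapping["language_name"])
    return [{code: name} for code, name in names.items()]


print([next(iter(d)) for d in current])
print([next(iter(d)) for d in proposed])
print([next(iter(d)) for d in available_langs(LANGS)])
```

On this toy data the proposed patch still repeats kwk-umista and keeps tli-norm, which matches the duplicates and -norm entries visible in the extended list above.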

joanise commented 3 years ago

Fixed in Studio instead of here, since that's where this issue causes problems.

See https://github.com/ReadAlongs/Studio/pull/41

roedoejet commented 3 years ago

> Fixed in Studio instead of here, since that's where this issue causes problems.
>
> See ReadAlongs/Studio#41

Shall we close this here then?

joanise commented 3 years ago

Once we're all happy with the PR, then we can close this issue here. Not before, please!