rosettatype / hyperglot

Hyperglot: a database and tools for detecting language support in fonts
http://hyperglot.rosettatype.com
GNU General Public License v3.0

Crash when attempting to output YAML #16

Closed. ctrlcctrlv closed this issue 3 years ago

ctrlcctrlv commented 3 years ago
[fred@laptop ~]$ hyperglot Workspace/chomsky/dist/Chomsky.otf -o /tmp/langs.yml

=================================
Chomsky.otf has base support for:
=================================

286 languages of Latin script:
------------------------------
Arbëreshë Albanian, Eastern Abnaki, Afar, Arvanitika Albanian, Western Abnaki, Achinese, Achuar-Shiwiar, Acheron, Eastern Arrernte, Afrikaans, Aguaruna, Gheg Albanian, Tosk Albanian, Amahuaca, Yanesha', Amis, Amarakaeri, Uab Meto, Aragonese, Mapudungun, Asu (Tanzania), Waorani, Anuta, Southern Aymara, Central Aymara, Bemba (Zambia), Bena (Tanzania), Bikol, Bislama, Bosnian, Breton, Garifuna, Catalan, Chachi, Chavacano, Cashibo-Cacataibo, Cashinahua, Candoshi-Shapra, Cebuano, Czech, Chiga, Chamorro, Ojitlán Chinantec, Chuukese, Cimbrian, Chokwe, Central Kurdish, Mandarin Chinese, Asháninka, Montenegrin, Cofán, Cornish, Corsican, Caquinte, Pichis Ashéninka, Crimean Tatar, Seselwa Creole French, Chiltepec Chinantec, Kashubian, Tedim Chin, Welsh, Danish, Taita, German, Dehu, Dimli, Lower Sorbian, Embu, Standard Estonian, English, Ese Ejja, Basque, Faroese, Fijian, Filipino, Finnish, Kven Finnish, French, Western Frisian, Friulian, Gan, Borana-Arsi-Guji Oromo, West Central Oromo, Guadeloupean Creole French, Gilbertese, Scottish Gaelic, Irish, Galician, Manx, Gooniyandi, Swiss German, Wayuu, Gusii, Gwichʼin, Hän, Eastern Oromo, Hakka Chinese, Haitian, Hawaiian, Northern Qiandong Miao, Hiligaynon, Southern Qiandong Miao, Hani, Caribbean Hindustani, Hopi, Croatian, Upper Sorbian, Hungarian, Huastec, Iloko, Indonesian, Icelandic, Italian, Jamaican Creole English, Javanese, Shuar, Japanese, Kara-Kalpak, Kalaallisut, Kamba (Kenya), Makonde, Kabuverdianu, Kekchí, Kaingang, Khasi, Kinyarwanda, Kirmanjki, Kalenjin, Kimbundu, Kongo, Konzo, Kaonde, Karelian, Shambala, Kölsch, Kituba (DRC), Kuanyama, Ladino, Latin, Ligurian, Lithuanian, Ladin, Lombard, Latgalian, Luxembourgish, Luba-Lulua, Luo (Kenya and Tanzania), Standard Latvian, Mam, Matsés, Meru, Mauritian Creole, Makhuwa-Meetto, Minangkabau, Mískito, Malagasy, Montagnais, Mohawk, Maori, Creek, Murrinh-Patha, Mirandese, Kala Lagaw Ya, Ixcatlán Mazatec, Naga Pidgin, Neapolitan, Navajo, South Ndebele, North Ndebele, Ndonga, Low 
German, Central Nahuatl, Niuean, Ao Naga, Dutch, Norwegian, Nomatsiguenga, Pedi, Nyankole, Occitan, Northwestern Ojibwa, Orma, Oroqen, Pampanga, Papiamento, Palauan, Páez, Picard, Pijin, Pintupi-Luritja, Paluan, Piemontese, Polish, Pohnpeian, Portuguese, Potawatomi, Upper Guinea Crioulo, Pipil, Ashéninka Perené, K'iche', Quechua, Cook Islands Māori, Balkan Romani, Vlax Romani, Romansh, Rotokas, Rundi, Rwa, Sango, Samburu, Sangu (Tanzania), Sicilian, Sena, Seri, Shipibo-Conibo, Shawnee, Slovak, Slovenian, Southern Sami, Samoan, Shona, Soninke, Somali, Southern Sotho, Spanish, Sardinian, Saramaccan, Sranan Tongo, Swati, Sundanese, Maore Comorian, Congo Swahili, Swedish, Swahili, Silesian, Tahitian, Tetun Dili, Tetum, Tagalog, Tiv, Tokelau, Toba, Tonga (Zambia), Tonga (Tonga Islands), Papantla Totonac, Tok Pisin, Tswana, Tsonga, Purepecha, Tumbuka, Tuvalu, Tzeltal, Tzotzil, Meriam Mir, Umbundu, Munsee, Northern Uzbek, Venetian, Veps, Makhuwa, Võro, Walser, Waray (Philippines), Warlpiri, Wik-Mungkan, Ho-Chunk, Walloon, Mwani, Wiradjuri, Wangaaybuwan-Ngiyambaa, Xavánte, Xhosa, Soga, Minang, Yao, Yapese, Yindjibarndi, Makwe, Yucateco, Zapotec, Ngazidja Comorian, Malaysian, Záparo, Standard Malay, Zulu, Zuni

286 languages supported in total.

Traceback (most recent call last):
  File "/home/fred/.local/bin/hyperglot", line 8, in <module>
    sys.exit(cli())
  File "/usr/lib/python3.9/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/fred/.local/lib/python3.9/site-packages/hyperglot/main.py", line 301, in cli
    write_yaml(output, data)
  File "/home/fred/.local/lib/python3.9/site-packages/hyperglot/main.py", line 153, in write_yaml
    languages = {iso: dict(l) for iso, l in languages.items()}
  File "/home/fred/.local/lib/python3.9/site-packages/hyperglot/main.py", line 153, in <dictcomp>
    languages = {iso: dict(l) for iso, l in languages.items()}
ValueError: dictionary update sequence element #0 has length 1; 2 is required
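
For readers unfamiliar with this message: `dict()` raises it when it is handed an iterable whose first element is not a key/value pair. A plain string is the most common culprit, since iterating a string yields single characters. The snippet below is a minimal sketch of that behaviour only; `try_dict` is a hypothetical helper, and the assumption that a string reached `dict(l)` in hyperglot is an inference from the traceback, not something the thread confirms.

```python
def try_dict(value):
    """Return dict(value), or the error message if dict() rejects it."""
    try:
        return dict(value)
    except ValueError as error:
        return str(error)


# A string iterates as single characters, so element #0 ("E") has length 1.
message = try_dict("English")

# A list of 2-tuples is what dict() expects as an update sequence.
pairs = try_dict([("name", "English")])

print(message)
print(pairs)
```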

ctrlcctrlv commented 3 years ago

This patch gave me usable output.

diff --git a/lib/hyperglot/main.py b/lib/hyperglot/main.py
index 02df9c6..2a61618 100644
--- a/lib/hyperglot/main.py
+++ b/lib/hyperglot/main.py
@@ -149,9 +149,8 @@ def write_yaml(file, data):
             for script, languages in langs_by_status.items():
                 if path not in write:
                     write[path] = {}
-                # Coerce l back  to dict from type Language
-                languages = {iso: dict(l) for iso, l in languages.items()}
-                write[path].update(languages)
+                languages = dict(languages)
+                write[path].update({script: languages})
     if len(data.keys()) == 1:
         # Single file input, write directly to top level by re-writing the
         # output dict without the filename level

Looks like:

...
eng:
  name: English
  orthographies:
  - autonym: English
    auxiliary: Á Ç È É Ê Ë Ï Ñ Ô Ö á ç è é ê ë ï ñ ô ö
    base: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Æ Œ a b c d e f g h i j k l m n o p q r s t u v w x y z æ œ
    script: Latin
    status: primary
  source:
  - Omniglot
  - Wikipedia
  - CLDR
  - Alvestrand
  speakers: 400000000
  speakers_date: 2006
  status: living
  validity: preliminary
ese:
  name: Ese Ejja
  orthographies:
  - base: B P D T K S J C M N Y W Ñ b p d t k s j c m n y w ñ
    marks: ̃
    script: Latin
    status: primary
  source:
  - Wikipedia
  speakers: 700
  speakers_date: 2007
  validity: draft
eus:
  name: Basque
  orthographies:
  - autonym: Euskara
    base: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Ñ Ü a b c d e f g h i j k l m n o p q r s t u v w x y z ñ ü
    marks: ̃  ̈
    script: Latin
    status: primary
  source:
  - Omniglot
  - Wikipedia
  speakers: 750000
  speakers_date: 2016
  status: living
  validity: verified
fao:
  name: Faroese
  orthographies:
  - autonym: Føroyskt
    base: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Á Å Æ Í Ð Ó Ø Ú Ý a b c d e f g h i j k l m n o p q r s t u v w x y z á å æ í ð ó ø ú ý
    marks: ́  ̊
    script: Latin
    status: primary
  source:
  - Omniglot
  - Wikipedia
  speakers: 72000
  speakers_date: 2007
  status: living
  validity: verified
fij:
  name: Fijian
  orthographies:
  - autonym: Vakaviti
    base: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z
    script: Latin
    status: primary
  source:
  - Omniglot
  - Wikipedia
  speakers: 339210
  speakers_date: 1996
  status: living
  validity: verified
...

No idea what the intended format was, but this works for me.

kontur commented 3 years ago

Thanks, I'll take a look at this shortly.

kontur commented 3 years ago

Thanks again for the report. We previously saved the YAML data split by support level (base, aux), and leftover code from that layout broke saving. If you update the package now with pip install --upgrade hyperglot, it should output correctly.
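
The failure mode described above can be sketched roughly as follows. This is not hyperglot's actual code: the `Language` class, the data shapes, and `flatten_old_style` are hypothetical stand-ins, assumed only because they are consistent with the traceback and the patch in this thread. The idea is that a loop written for data nested by support level ends up one level too deep once that level is removed, so it calls `dict()` on a string field of a language entry.

```python
class Language(dict):
    """Hypothetical stand-in for hyperglot's dict-like Language type."""


# Assumed old shape: file -> support level -> script -> iso -> Language
old_data = {"Chomsky.otf": {"base": {"Latin": {"eng": Language(name="English")}}}}

# Assumed new shape: file -> script -> iso -> Language (support level removed)
new_data = {"Chomsky.otf": {"Latin": {"eng": Language(name="English")}}}


def flatten_old_style(data):
    """Loop written for the old, deeper nesting (illustrative only)."""
    write = {}
    for path, results in data.items():
        for _level, langs_by_script in results.items():
            for _script, languages in langs_by_script.items():
                write.setdefault(path, {})
                # With the old shape, `l` is a Language and dict(l) works.
                # With the new shape, `languages` is itself a Language, so
                # `l` is a field value such as "English" and dict(l) fails.
                write[path].update({iso: dict(l) for iso, l in languages.items()})
    return write


ok = flatten_old_style(old_data)

try:
    flatten_old_style(new_data)
    error = None
except ValueError as exc:
    error = str(exc)

print(ok)
print(error)
```

Under these assumptions, the old-shaped input flattens cleanly while the new-shaped input reproduces the `ValueError` from the report.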