jmsv / ety-python

A Python module to discover the etymology of words
http://ety-python.rtfd.io
MIT License

Add emoji flags to languages #30

Open jmsv opened 6 years ago

jmsv commented 6 years ago

As mentioned on #25

Command line interface could have an -e flag for displaying relevant emojis alongside languages and maybe words

Very low priority feature, but might be fun to implement and use
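A minimal sketch of how such a flag could be wired up with argparse. The parser layout, the `--emoji` long option, and the `format_language` helper are assumptions for illustration, not the actual ety CLI:

```python
import argparse


def build_parser():
    # Hypothetical parser: the real ety CLI may define its arguments differently.
    parser = argparse.ArgumentParser(
        prog="ety", description="Discover the etymology of words"
    )
    parser.add_argument("words", nargs="+", help="words to look up")
    parser.add_argument(
        "-e",
        "--emoji",
        action="store_true",
        help="show a flag emoji next to each language",
    )
    return parser


def format_language(name, emoji=None, show_emoji=False):
    # Hypothetical helper: prefix the language name only when the flag
    # was requested and an emoji is actually available.
    if show_emoji and emoji:
        return f"{emoji} {name}"
    return name


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["-e", "potato"])
print(format_language("French", "\U0001F1EB\U0001F1F7", args.emoji))
```

Keeping the emoji behind an opt-in flag means the default output stays plain text on terminals that cannot render emoji.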

alxwrd commented 6 years ago

Reference: in Unicode the characters used to display flags are the code point for the capital letter, plus 127397. [source]

So chr(ord("A") + 127397) = 🇦

>>> chr(ord("G") + 127397) + chr(ord("B") + 127397)
'🇬🇧'
>>> chr(ord("F") + 127397) + chr(ord("R") + 127397)
'🇫🇷'
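The same trick can be wrapped in a small helper. This is only a sketch: the name `flag_emoji` and the input validation are assumptions for illustration, not part of ety:

```python
# Distance between "A" and REGIONAL INDICATOR SYMBOL LETTER A (U+1F1E6).
REGIONAL_INDICATOR_OFFSET = 0x1F1E6 - ord("A")  # 127397


def flag_emoji(code):
    """Turn a two-letter code such as "GB" into its pair of regional indicators."""
    # Reject anything that is not exactly two ASCII letters.
    if len(code) != 2 or not (code.isascii() and code.isalpha()):
        raise ValueError(f"expected a two-letter ASCII code, got {code!r}")
    return "".join(chr(ord(c) + REGIONAL_INDICATOR_OFFSET) for c in code.upper())


print(flag_emoji("gb"), flag_emoji("FR"))
```

Note that this only builds the character pair; whether it renders as a flag depends on the code being a recognised country code and on the font in use.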

The iso-639-3.json already has these country codes, so this could be a feature of the Language class.

>>> import ety
>>> fr = ety.Language("fra")
>>> fr.emoji
'🇫🇷'

  {
    "name": "French",
    "type": "living",
    "scope": "individual",
    "iso6393": "fra",
    "iso6392B": "fre",
    "iso6392T": "fra",
    "iso6391": "fr"
  }

hugovk commented 5 years ago

Adding 127397 to each code letter is a neat trick (I'd never seen it before!), but there's a bit of a problem here.

ISO 3166-1 alpha-2 is for countries, and is the code used for mapping flags.

ISO 639 is for languages.

It's fine for many, like French, but there are a few which don't have the same code in each ISO.

And it's a bit tricky to choose a flag for a language, as some countries use many languages, and some languages are used by many countries. (This is also a problem in UX, see for example http://www.flagsarenotlanguages.com/blog/why-flags-do-not-represent-language/)
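To make the clash concrete, here is a small sketch that upper-cases a two-letter language code and looks it up as if it were a country code. The two dictionaries are tiny hand-picked excerpts, and `country_for_language_code` is a hypothetical helper name:

```python
# Hand-picked excerpt of two-letter language codes (ISO 639-1).
LANGUAGE_CODES = {"French": "fr", "Swedish": "sv", "Welsh": "cy", "Cornish": "kw"}

# Hand-picked excerpt of country codes (ISO 3166-1 alpha-2).
COUNTRY_CODES = {
    "FR": "France",
    "SV": "El Salvador",
    "CY": "Cyprus",
    "KW": "Kuwait",
}


def country_for_language_code(code):
    # Naively reinterpret a language code as a country code.
    return COUNTRY_CODES.get(code.upper())


for language, code in LANGUAGE_CODES.items():
    print(f"{language} ({code}) -> {country_for_language_code(code)}")
```

French happens to line up with France, while the other three land on unrelated countries.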

Demo

iso-639-3.json now only contains 3-char language codes (ISO 639-3, e.g. "fra") and no longer contains the 2-char codes (ISO 639-1, e.g. "fr"), so this uses the pycountry library (pip install pycountry) to get the 2-char code from the ISO 639-3 code, then assumes it's an ISO 3166-1 alpha-2 country code and returns the flag:

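import pycountry
...
class Language(object):
...
    @property
    def emoji(self):
        # Look up the 2-char language code, then (naively) treat it as a country code
        try:
            alpha_2 = pycountry.languages.get(alpha_3=self.iso).alpha_2.upper()
        except AttributeError:
            # Unknown language, or one without a 2-char code
            return None
        # Shift each capital letter into the regional indicator symbol range
        return chr(ord(alpha_2[0]) + 127397) + chr(ord(alpha_2[1]) + 127397)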

Then running this:

import ety
from ety.data import langs

for code in langs:
    lang = ety.Language(code)
    if lang.emoji is not None:
        print(lang.emoji, lang)

Gives:

🇦🇦 Afar
🇦🇧 Abkhazian
🇦🇫 Afrikaans
🇦🇰 Akan
🇦🇲 Amharic
🇦🇷 Arabic
🇦🇳 Aragonese
🇦🇸 Assamese
🇦🇻 Avaric
🇦🇪 Avestan
🇦🇾 Aymara
🇦🇿 Azerbaijani
🇧🇦 Bashkir
🇧🇲 Bambara
🇧🇪 Belarusian
🇧🇳 Bengali
🇧🇮 Bislama
🇧🇴 Tibetan
🇧🇸 Bosnian
🇧🇷 Breton
🇧🇬 Bulgarian
🇨🇦 Catalan
🇨🇸 Czech
🇨🇭 Chamorro
🇨🇪 Chechen
🇨🇺 Church Slavic
🇨🇻 Chuvash
🇰🇼 Cornish
🇨🇴 Corsican
🇨🇷 Cree
🇨🇾 Welsh
🇩🇦 Danish
🇩🇪 German
🇩🇻 Dhivehi
🇩🇿 Dzongkha
🇪🇱 Modern Greek (1453-)
🇪🇳 English
🇪🇴 Esperanto
🇪🇹 Estonian
🇪🇺 Basque
🇪🇪 Ewe
🇫🇴 Faroese
🇫🇦 Persian
🇫🇯 Fijian
🇫🇮 Finnish
🇫🇷 French
🇫🇾 Western Frisian
🇫🇫 Fulah
🇬🇩 Scottish Gaelic
🇬🇦 Irish
🇬🇱 Galician
🇬🇻 Manx
🇬🇳 Guarani
🇬🇺 Gujarati
🇭🇹 Haitian
🇭🇦 Hausa
🇸🇭 Serbo-Croatian
🇭🇪 Hebrew
🇭🇿 Herero
🇭🇮 Hindi
🇭🇴 Hiri Motu
🇭🇷 Croatian
🇭🇺 Hungarian
🇭🇾 Armenian
🇮🇬 Igbo
🇮🇴 Ido
🇮🇮 Sichuan Yi
🇮🇺 Inuktitut
🇮🇪 Interlingue
🇮🇦 Interlingua (International Auxiliary Language Association)
🇮🇩 Indonesian
🇮🇰 Inupiaq
🇮🇸 Icelandic
🇮🇹 Italian
🇯🇻 Javanese
🇯🇦 Japanese
🇰🇱 Kalaallisut
🇰🇳 Kannada
🇰🇸 Kashmiri
🇰🇦 Georgian
🇰🇷 Kanuri
🇰🇰 Kazakh
🇰🇲 Khmer
🇰🇮 Kikuyu
🇷🇼 Kinyarwanda
🇰🇾 Kirghiz
🇰🇻 Komi
🇰🇬 Kongo
🇰🇴 Korean
🇰🇯 Kuanyama
🇰🇺 Kurdish
🇱🇴 Lao
🇱🇦 Latin
🇱🇻 Latvian
🇱🇮 Limburgan
🇱🇳 Lingala
🇱🇹 Lithuanian
🇱🇧 Luxembourgish
🇱🇺 Luba-Katanga
🇱🇬 Ganda
🇲🇭 Marshallese
🇲🇱 Malayalam
🇲🇷 Marathi
🇲🇰 Macedonian
🇲🇬 Malagasy
🇲🇹 Maltese
🇲🇳 Mongolian
🇲🇮 Maori
🇲🇸 Malay (macrolanguage)
🇲🇾 Burmese
🇳🇦 Nauru
🇳🇻 Navajo
🇳🇷 South Ndebele
🇳🇩 North Ndebele
🇳🇬 Ndonga
🇳🇪 Nepali (macrolanguage)
🇳🇱 Dutch
🇳🇳 Norwegian Nynorsk
🇳🇧 Norwegian Bokmål
🇳🇴 Norwegian
🇳🇾 Nyanja
🇴🇨 Occitan (post 1500)
🇴🇯 Ojibwa
🇴🇷 Oriya (macrolanguage)
🇴🇲 Oromo
🇴🇸 Ossetian
🇵🇦 Panjabi
🇵🇮 Pali
🇵🇱 Polish
🇵🇹 Portuguese
🇵🇸 Pushto
🇶🇺 Quechua
🇷🇲 Romansh
🇷🇴 Romanian
🇷🇳 Rundi
🇷🇺 Russian
🇸🇬 Sango
🇸🇦 Sanskrit
🇸🇮 Sinhala
🇸🇰 Slovak
🇸🇱 Slovenian
🇸🇪 Northern Sami
🇸🇲 Samoan
🇸🇳 Shona
🇸🇩 Sindhi
🇸🇴 Somali
🇸🇹 Southern Sotho
🇪🇸 Spanish
🇸🇶 Albanian
🇸🇨 Sardinian
🇸🇷 Serbian
🇸🇸 Swati
🇸🇺 Sundanese
🇸🇼 Swahili (macrolanguage)
🇸🇻 Swedish
🇹🇾 Tahitian
🇹🇦 Tamil
🇹🇹 Tatar
🇹🇪 Telugu
🇹🇬 Tajik
🇹🇱 Tagalog
🇹🇭 Thai
🇹🇮 Tigrinya
🇹🇴 Tonga (Tonga Islands)
🇹🇳 Tswana
🇹🇸 Tsonga
🇹🇰 Turkmen
🇹🇷 Turkish
🇹🇼 Twi
🇺🇬 Uighur
🇺🇰 Ukrainian
🇺🇷 Urdu
🇺🇿 Uzbek
🇻🇪 Venda
🇻🇮 Vietnamese
🇻🇴 Volapük
🇼🇦 Walloon
🇼🇴 Wolof
🇽🇭 Xhosa
🇾🇮 Yiddish
🇾🇴 Yoruba
🇿🇦 Zhuang
🇿🇭 Chinese
🇿🇺 Zulu

Some clear mismatches:

🇦🇫 Afrikaans
🇦🇷 Arabic
🇧🇪 Belarusian
🇧🇷 Breton
🇨🇦 Catalan
🇨🇭 Chamorro
🇰🇼 Cornish
🇨🇾 Welsh
🇪🇪 Ewe
🇮🇪 Interlingue
🇸🇻 Swedish

jmsv commented 5 years ago

Are you sure they're mismatches? A few of those just look like the code is derived from the language's native name - I come from Devon, UK (next to Cornwall) so the first thing I noticed was that Cornish for 'Cornwall' is 'Kernow', which probably explains its 🇰🇼 code. Similarly, Welsh is 'Cymraeg' or something in Welsh - should explain its 🇨🇾 code.

Looking further into it, these seem to all be the two-char ISO 639-1 codes, rather than the three-char ISO 639-3 codes used by this library.

so tl;dr: your code looks good to me! feel free to PR it with a CLI arg to enable it!

hugovk commented 5 years ago

I'm sure they're mismatches. Languages != countries.

"🇰🇼 Cornish"

That is not the Cornish flag, it's the flag of Kuwait.

| Language | ISO 639-3 alpha-3 language code | ISO 639-1 alpha-2 language code | Flag |
| --- | --- | --- | --- |
| Cornish | cor | kw | |

| Country | ISO 3166-1 alpha-2 country code | Flag |
| --- | --- | --- |
| Kuwait | KW | |

"🇨🇾 Welsh"

That is not the Welsh flag, it's the flag of Cyprus.

| Language | ISO 639-3 alpha-3 language code | ISO 639-1 alpha-2 language code | Flag |
| --- | --- | --- | --- |
| Welsh | cym | cy | |

| Country | ISO 3166-1 alpha-2 country code | Flag |
| --- | --- | --- |
| Cyprus | CY | |

"🇸🇻 Swedish"

That is not the Swedish flag, it's the flag of El Salvador.

| Language | ISO 639-3 alpha-3 language code | ISO 639-1 alpha-2 language code | Flag |
| --- | --- | --- | --- |
| Swedish | swe | sv | |

| Country | ISO 3166-1 alpha-2 country code | Flag |
| --- | --- | --- |
| El Salvador | SV | |

jmsv commented 5 years ago

Oops sorry my mistake, you're right - for some reason I'm seeing different things on different devices: Chrome on my laptop displays letters that seem to map to ISO 639-1s and on Chrome on my phone I can see the wrong flags you mentioned 🤔
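One likely explanation (an assumption, not something confirmed in this thread): each flag is really a pair of regional indicator symbols, and a platform whose emoji font has no flag glyphs may fall back to showing the two letters. A small sketch that decodes a flag back into those letters, with `flag_to_letters` as a hypothetical helper:

```python
import unicodedata

REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def flag_to_letters(flag):
    """Map each regional indicator symbol back to its plain capital letter."""
    letters = []
    for ch in flag:
        offset = ord(ch) - REGIONAL_INDICATOR_A
        if not 0 <= offset < 26:
            raise ValueError(f"{ch!r} is not a regional indicator symbol")
        letters.append(chr(ord("A") + offset))
    return "".join(letters)


flag = "\U0001F1F0\U0001F1FC"  # the pair shown next to Cornish above
print(flag_to_letters(flag))
for ch in flag:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

Both devices receive the same two code points; only the rendering differs.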

Maybe there's a free dataset somewhere mapping ISO 639-3 codes to flag emojis we could use?
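Failing a ready-made dataset, one option is a small hand-curated table. The sketch below is hypothetical: the `LANGUAGE_FLAG_COUNTRY` entries are illustrative editorial choices (any language-to-flag pairing is debatable, as discussed above), and `language_flag` is not an existing ety function:

```python
# Hypothetical, hand-curated mapping: ISO 639-3 language code ->
# ISO 3166-1 alpha-2 country code. The entries are illustrative only.
LANGUAGE_FLAG_COUNTRY = {
    "fra": "FR",
    "swe": "SE",
    "deu": "DE",
    "jpn": "JP",
}


def language_flag(iso_639_3):
    """Return a flag emoji for a language code, or None if it is not mapped."""
    country = LANGUAGE_FLAG_COUNTRY.get(iso_639_3)
    if country is None:
        # No flag is better than a wrong flag.
        return None
    return "".join(chr(ord(letter) + 127397) for letter in country)


print(language_flag("swe"), language_flag("cor"))
```

An explicit table avoids the accidental matches of the code-based approach, at the cost of having to maintain it by hand.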