ssut / py-googletrans

(unofficial) Googletrans: Free and Unlimited Google translate API for Python. Translates totally free of charge.
http://py-googletrans.rtfd.io
MIT License

Current GT language list #408

Open StevanWhite opened 1 month ago

StevanWhite commented 1 month ago

Googletrans version:

I'm submitting a ...

Current behavior:

For many languages supported by Google Translate, translate() fails only because their codes are not listed in the constants.py file.

The LANGUAGES variable in constants.py lists 108 languages, while the current Google Translate page lists 243.

For instance,

    from googletrans import Translator
    translator = Translator()
    phrase = 'How do you do?'
    res = translator.translate(phrase, dest='cy', src='en')

raises an exception:

    File "/usr/local/lib/python3.12/site-packages/googletrans/client.py", line 200, in translate
    ValueError: invalid destination language
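The error appears to be raised locally, before anything is sent to Google. The following is a minimal offline sketch, assuming the check in client.py amounts to a membership test against the LANGUAGES table. `check_dest` and the three-entry table are hypothetical stand-ins, not the package's actual code.

```python
# Hypothetical stand-in for the table in constants.py (three entries only).
LANGUAGES = {
    'en': 'english',
    'fr': 'french',
    'de': 'german',
}


def check_dest(dest):
    """Reject any destination code that is not a key of LANGUAGES.

    Assumption: this mirrors the kind of check that raises in translate().
    """
    if dest not in LANGUAGES:
        raise ValueError('invalid destination language')
    return dest


check_dest('fr')          # accepted: 'fr' is a key of the stand-in table
try:
    check_dest('cy')      # rejected: 'cy' is missing from the stand-in table
except ValueError as exc:
    print(exc)
```

In this sketch the outcome depends only on the contents of the table, which is why extending the table is the obvious first fix.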

Expected behavior:

The phrase to be translated would be passed on to Google Translate, which would attempt a translation.

Steps to reproduce:

Try to translate an English phrase into one of the unlisted languages, say Welsh ('cy'), as above.

Related code:

This is very easy to fix. I'll make it even easier for you.

First, there are two lines in client.py that mangle some of the language codes currently used by GT:

    dest = dest.lower().split('_', 1)[0]
    src = src.lower().split('_', 1)[0]

Remove these.
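To see what those two lines do, the same expression can be applied to a few codes in isolation. This is only a sketch: `normalize` is a hypothetical wrapper, and 'pt_BR' is an illustrative underscore-style input rather than a code from the Google Translate list.

```python
def normalize(code):
    # Same expression as the two lines in client.py: lower-case the code,
    # then keep only the part before the first underscore.
    return code.lower().split('_', 1)[0]


for code in ('cy', 'zh-CN', 'mni-Mtei', 'pt_BR'):
    print(f'{code!r} -> {normalize(code)!r}')
```

Lower-casing turns 'zh-CN' into 'zh-cn' and 'mni-Mtei' into 'mni-mtei'. Neither matches a mixed-case key, so a lookup in a table keyed by the codes as Google Translate writes them fails even though the language is listed.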

Copy-paste the following into constants.py. I just compiled it directly from the list in Google Translate.

    LANGUAGES = {
    'ab': 'Abkhaz',
    'ace': 'Acehnese',
    'ach': 'Acholi',
    'aa': 'Afar',
    'af': 'Afrikaans',
    'sq': 'Albanian',
    'alz': 'Alur',
    'am': 'Amharic',
    'ar': 'Arabic',
    'hy': 'Armenian',
    'as': 'Assamese',
    'av': 'Avar',
    'awa': 'Awadhi',
    'ay': 'Aymara',
    'az': 'Azerbaijani',
    'ban': 'Balinese',
    'bal': 'Baluchi',
    'bm': 'Bambara',
    'bci': 'Baoulé',
    'ba': 'Bashkir',
    'eu': 'Basque',
    'btx': 'Batak Karo',
    'bts': 'Batak Simalungun',
    'bbc': 'Batak Toba',
    'be': 'Belarusian',
    'bem': 'Bemba',
    'bn': 'Bengali',
    'bew': 'Betawi',
    'bho': 'Bhojpuri',
    'bik': 'Bikol',
    'bs': 'Bosnian',
    'br': 'Breton',
    'bg': 'Bulgarian',
    'bua': 'Buryat',
    'yue': 'Cantonese',
    'ca': 'Catalan',
    'ceb': 'Cebuano',
    'ch': 'Chamorro',
    'ce': 'Chechen',
    'ny': 'Chichewa',
    'zh-CN': 'Chinese (Simplified)',
    'zh-TW': 'Chinese (Traditional)',
    'chk': 'Chuukese',
    'cv': 'Chuvash',
    'co': 'Corsican',
    'crh': 'Crimean Tatar',
    'hr': 'Croatian',
    'cs': 'Czech',
    'da': 'Danish',
    'fa-AF': 'Dari',
    'dv': 'Dhivehi',
    'din': 'Dinka',
    'doi': 'Dogri',
    'dov': 'Dombe',
    'nl': 'Dutch',
    'dyu': 'Dyula',
    'dz': 'Dzongkha',
    'en': 'English',
    'eo': 'Esperanto',
    'et': 'Estonian',
    'ee': 'Ewe',
    'fo': 'Faroese',
    'fj': 'Fijian',
    'tl': 'Filipino',
    'fi': 'Finnish',
    'fon': 'Fon',
    'fr': 'French',
    'fy': 'Frisian',
    'fur': 'Friulian',
    'ff': 'Fulani',
    'gaa': 'Ga',
    'gl': 'Galician',
    'ka': 'Georgian',
    'de': 'German',
    'el': 'Greek',
    'gn': 'Guarani',
    'gu': 'Gujarati',
    'ht': 'Haitian Creole',
    'cnh': 'Hakha Chin',
    'ha': 'Hausa',
    'haw': 'Hawaiian',
    'iw': 'Hebrew',
    'hil': 'Hiligaynon',
    'hi': 'Hindi',
    'hmn': 'Hmong',
    'hu': 'Hungarian',
    'hrx': 'Hunsrik',
    'iba': 'Iban',
    'is': 'Icelandic',
    'ig': 'Igbo',
    'ilo': 'Ilocano',
    'id': 'Indonesian',
    'ga': 'Irish',
    'it': 'Italian',
    'jam': 'Jamaican Patois',
    'ja': 'Japanese',
    'jw': 'Javanese',
    'kac': 'Jingpo',
    'kl': 'Kalaallisut',
    'kn': 'Kannada',
    'kr': 'Kanuri',
    'pam': 'Kapampangan',
    'kk': 'Kazakh',
    'kha': 'Khasi',
    'km': 'Khmer',
    'cgg': 'Kiga',
    'kg': 'Kikongo',
    'rw': 'Kinyarwanda',
    'ktu': 'Kituba',
    'trp': 'Kokborok',
    'kv': 'Komi',
    'gom': 'Konkani',
    'ko': 'Korean',
    'kri': 'Krio',
    'ku': 'Kurdish (Kurmanji)',
    'ckb': 'Kurdish (Sorani)',
    'ky': 'Kyrgyz',
    'lo': 'Lao',
    'ltg': 'Latgalian',
    'la': 'Latin',
    'lv': 'Latvian',
    'lij': 'Ligurian',
    'li': 'Limburgish',
    'ln': 'Lingala',
    'lt': 'Lithuanian',
    'lmo': 'Lombard',
    'lg': 'Luganda',
    'luo': 'Luo',
    'lb': 'Luxembourgish',
    'mk': 'Macedonian',
    'mad': 'Madurese',
    'mai': 'Maithili',
    'mak': 'Makassar',
    'mg': 'Malagasy',
    'ms': 'Malay',
    'ms-Arab': 'Malay (Jawi)',
    'ml': 'Malayalam',
    'mt': 'Maltese',
    'mam': 'Mam',
    'gv': 'Manx',
    'mi': 'Maori',
    'mr': 'Marathi',
    'mh': 'Marshallese',
    'mwr': 'Marwadi',
    'mfe': 'Mauritian Creole',
    'chm': 'Meadow Mari',
    'mni-Mtei': 'Meiteilon (Manipuri)',
    'min': 'Minang',
    'lus': 'Mizo',
    'mn': 'Mongolian',
    'my': 'Myanmar (Burmese)',
    'nhe': 'Nahuatl (Eastern Huasteca)',
    'ndc-ZW': 'Ndau',
    'nr': 'Ndebele (South)',
    'new': 'Nepalbhasa (Newari)',
    'ne': 'Nepali',
    'bm-Nkoo': 'NKo',
    'no': 'Norwegian',
    'nus': 'Nuer',
    'oc': 'Occitan',
    'or': 'Odia (Oriya)',
    'om': 'Oromo',
    'os': 'Ossetian',
    'pag': 'Pangasinan',
    'pap': 'Papiamento',
    'ps': 'Pashto',
    'fa': 'Persian',
    'pl': 'Polish',
    'pt': 'Portuguese (Brazil)',
    'pt-PT': 'Portuguese (Portugal)',
    'pa': 'Punjabi (Gurmukhi)',
    'pa-Arab': 'Punjabi (Shahmukhi)',
    'qu': 'Quechua',
    'kek': 'Qʼeqchiʼ',
    'rom': 'Romani',
    'ro': 'Romanian',
    'rn': 'Rundi',
    'ru': 'Russian',
    'se': 'Sami (North)',
    'sm': 'Samoan',
    'sg': 'Sango',
    'sa': 'Sanskrit',
    'sat-Latn': 'Santali',
    'gd': 'Scots Gaelic',
    'nso': 'Sepedi',
    'sr': 'Serbian',
    'st': 'Sesotho',
    'crs': 'Seychellois Creole',
    'shn': 'Shan',
    'sn': 'Shona',
    'scn': 'Sicilian',
    'szl': 'Silesian',
    'sd': 'Sindhi',
    'si': 'Sinhala',
    'sk': 'Slovak',
    'sl': 'Slovenian',
    'so': 'Somali',
    'es': 'Spanish',
    'su': 'Sundanese',
    'sus': 'Susu',
    'sw': 'Swahili',
    'ss': 'Swati',
    'sv': 'Swedish',
    'ty': 'Tahitian',
    'tg': 'Tajik',
    'ber-Latn': 'Tamazight',
    'ber': 'Tamazight (Tifinagh)',
    'ta': 'Tamil',
    'tt': 'Tatar',
    'te': 'Telugu',
    'tet': 'Tetum',
    'th': 'Thai',
    'bo': 'Tibetan',
    'ti': 'Tigrinya',
    'tiv': 'Tiv',
    'tpi': 'Tok Pisin',
    'to': 'Tongan',
    'ts': 'Tsonga',
    'tn': 'Tswana',
    'tcy': 'Tulu',
    'tum': 'Tumbuka',
    'tr': 'Turkish',
    'tk': 'Turkmen',
    'tyv': 'Tuvan',
    'ak': 'Twi',
    'udm': 'Udmurt',
    'uk': 'Ukrainian',
    'ur': 'Urdu',
    'ug': 'Uyghur',
    'uz': 'Uzbek',
    've': 'Venda',
    'vec': 'Venetian',
    'vi': 'Vietnamese',
    'war': 'Waray',
    'cy': 'Welsh',
    'wo': 'Wolof',
    'xh': 'Xhosa',
    'sah': 'Yakut',
    'yi': 'Yiddish',
    'yo': 'Yoruba',
    'yua': 'Yucatec Maya',
    'zap': 'Zapotec',
    'zu': 'Zulu'
    }

Other information:

StevanWhite commented 1 month ago

I have been testing this, and it isn't as simple as I had hoped.

The changes allow translation into many languages that did not work before, but many of the new languages still do not get translated, even though the GT web page does translate them. For instance, translate() does not translate English into Wolof (wo), which the GT web app handles, although it does translate into Welsh (cy).

I haven't discovered a pattern. Many languages with 3-letter codes, e.g. Crimean Tatar (crh), are not translated, but Cebuano (ceb) is. Both Simplified and Traditional Chinese (zh-CN and zh-TW) are translated as well.

I have also seen the web page sometimes fail to translate a phrase at all (it leaves it in the original language), yet after a minor change to the original, the phrase is translated.

The untranslated languages are explained by the service itself: translate.googleapis.com, accessed directly, is more restricted in the languages it translates than the Google Translate web app.
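One way to separate the languages that actually get translated from those that come back unchanged is to compare each result with the input. The sketch below assumes that a failed translation returns the original text, as described above. `probe` and `fake_translate` are hypothetical; `fake_translate` stands in for a real call to the package so that the example runs offline, and its canned output is a placeholder, not a real translation.

```python
def probe(translate, phrase, codes):
    """Split codes into (translated, unchanged) for the given phrase.

    `translate` is any callable taking (text, dest) and returning a string.
    """
    translated, unchanged = [], []
    for code in codes:
        result = translate(phrase, code)
        # Assumption: an unsupported language echoes the input back.
        if result.strip().lower() == phrase.strip().lower():
            unchanged.append(code)
        else:
            translated.append(code)
    return translated, unchanged


def fake_translate(text, dest):
    # Offline stand-in with made-up behaviour: only 'cy' is "supported".
    canned = {'cy': '[placeholder Welsh text]'}
    return canned.get(dest, text)


ok, failed = probe(fake_translate, 'How do you do?', ['cy', 'wo', 'crh'])
print(ok)      # ['cy']
print(failed)  # ['wo', 'crh']
```

Since a single phrase can come back untranslated and then succeed after a minor change, a real probe would probably need several phrases per language before treating a code as unsupported.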

StevanWhite commented 1 month ago

Correction.

Not all the languages in the list I provided before are supported (without subscription) by the service at translate.googleapis.com.

Here is the list of languages that are supported (free of restrictions) by that service, as of today.
Note it is still longer than the one in the last version of this package.

    LANGUAGES = {
    'af': 'Afrikaans',
    'sq': 'Albanian',
    'am': 'Amharic',
    'ar': 'Arabic',
    'hy': 'Armenian',
    'as': 'Assamese',
    'ay': 'Aymara',
    'az': 'Azerbaijani',
    'bm': 'Bambara',
    'eu': 'Basque',
    'be': 'Belarusian',
    'bn': 'Bengali',
    'bho': 'Bhojpuri',
    'bs': 'Bosnian',
    'bg': 'Bulgarian',
    'ca': 'Catalan',
    'ceb': 'Cebuano',
    'ny': 'Chichewa',
    'zh-CN': 'Chinese (Simplified)',
    'zh-TW': 'Chinese (Traditional)',
    'co': 'Corsican',
    'hr': 'Croatian',
    'cs': 'Czech',
    'da': 'Danish',
    'fa-AF': 'Dari',
    'dv': 'Dhivehi',
    'doi': 'Dogri',
    'nl': 'Dutch',
    'en': 'English',
    'eo': 'Esperanto',
    'et': 'Estonian',
    'ee': 'Ewe',
    'tl': 'Filipino',
    'fi': 'Finnish',
    'fr': 'French',
    'fy': 'Frisian',
    'gl': 'Galician',
    'ka': 'Georgian',
    'de': 'German',
    'el': 'Greek',
    'gn': 'Guarani',
    'gu': 'Gujarati',
    'ht': 'Haitian Creole',
    'ha': 'Hausa',
    'haw': 'Hawaiian',
    'iw': 'Hebrew',
    'hi': 'Hindi',
    'hmn': 'Hmong',
    'hu': 'Hungarian',
    'is': 'Icelandic',
    'ig': 'Igbo',
    'ilo': 'Ilocano',
    'id': 'Indonesian',
    'ga': 'Irish',
    'it': 'Italian',
    'ja': 'Japanese',
    'jw': 'Javanese',
    'kn': 'Kannada',
    'kk': 'Kazakh',
    'km': 'Khmer',
    'rw': 'Kinyarwanda',
    'gom': 'Konkani',
    'ko': 'Korean',
    'kri': 'Krio',
    'ku': 'Kurdish (Kurmanji)',
    'ckb': 'Kurdish (Sorani)',
    'ky': 'Kyrgyz',
    'lo': 'Lao',
    'la': 'Latin',
    'lv': 'Latvian',
    'ln': 'Lingala',
    'lt': 'Lithuanian',
    'lg': 'Luganda',
    'lb': 'Luxembourgish',
    'mk': 'Macedonian',
    'mai': 'Maithili',
    'mg': 'Malagasy',
    'ms': 'Malay',
    'ms-Arab': 'Malay (Jawi)',
    'ml': 'Malayalam',
    'mt': 'Maltese',
    'mi': 'Maori',
    'mr': 'Marathi',
    'mni-Mtei': 'Meiteilon (Manipuri)',
    'lus': 'Mizo',
    'mn': 'Mongolian',
    'my': 'Myanmar (Burmese)',
    'ne': 'Nepali',
    'bm-Nkoo': 'NKo',
    'no': 'Norwegian',
    'or': 'Odia (Oriya)',
    'om': 'Oromo',
    'ps': 'Pashto',
    'fa': 'Persian',
    'pl': 'Polish',
    'pt': 'Portuguese (Brazil)',
    'pt-PT': 'Portuguese (Portugal)',
    'pa': 'Punjabi (Gurmukhi)',
    'pa-Arab': 'Punjabi (Shahmukhi)',
    'qu': 'Quechua',
    'ro': 'Romanian',
    'ru': 'Russian',
    'sm': 'Samoan',
    'sa': 'Sanskrit',
    'gd': 'Scots Gaelic',
    'nso': 'Sepedi',
    'sr': 'Serbian',
    'st': 'Sesotho',
    'sn': 'Shona',
    'sd': 'Sindhi',
    'si': 'Sinhala',
    'sk': 'Slovak',
    'sl': 'Slovenian',
    'so': 'Somali',
    'es': 'Spanish',
    'su': 'Sundanese',
    'sw': 'Swahili',
    'sv': 'Swedish',
    'tg': 'Tajik',
    'ta': 'Tamil',
    'tt': 'Tatar',
    'te': 'Telugu',
    'th': 'Thai',
    'ti': 'Tigrinya',
    'ts': 'Tsonga',
    'tr': 'Turkish',
    'tk': 'Turkmen',
    'ak': 'Twi',
    'uk': 'Ukrainian',
    'ur': 'Urdu',
    'ug': 'Uyghur',
    'uz': 'Uzbek',
    'vi': 'Vietnamese',
    'cy': 'Welsh',
    'xh': 'Xhosa',
    'yi': 'Yiddish',
    'yo': 'Yoruba',
    'zu': 'Zulu',
    }
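The difference between the two tables can be computed rather than read off by eye. A small sketch, using short excerpts in place of the complete dictionaries; the names `FULL` and `SUPPORTED` are illustrative only.

```python
# Excerpts only; FULL stands for the web-app list, SUPPORTED for the
# shorter list accepted by the service.
FULL = {'cy': 'Welsh', 'wo': 'Wolof', 'crh': 'Crimean Tatar', 'ceb': 'Cebuano'}
SUPPORTED = {'cy': 'Welsh', 'ceb': 'Cebuano'}

# Codes listed by the web app but missing from the supported table.
dropped = {code: FULL[code] for code in sorted(FULL.keys() - SUPPORTED.keys())}
print(dropped)  # {'crh': 'Crimean Tatar', 'wo': 'Wolof'}
```

Run against the full dictionaries, the same expression would list every language that the web app offers but the service does not translate.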