Open neurlang opened 6 days ago
Hindi (hin_deva) Bengali (ben_beng, ben_beng_dhaka, ben_beng_rarh) Punjabi (pan_guru) Telugu (tel_telu) Marathi (mar_deva) Tamil (tam_taml) Gujarati (guj_gujr) Urdu (urd_arab) Turkish (tur_latn) Vietnamese (vie_latn_hanoi, vie_latn_hue, vie_latn_saigon) Polish (pol_latn) Ukrainian (ukr_cyrl) Greek (ell_grek) Hungarian (hun_latn) Malay (msa_latn, msa_arab)
Afrikaans (afr) – Widely spoken in South Africa and Namibia. Amharic (amh) – Official language of Ethiopia. Azerbaijani (aze) – Spoken in Azerbaijan and parts of Iran. Belarusian (bel) – Spoken primarily in Belarus. Burmese (mya) – Official language of Myanmar. Chechen (che) – Spoken in Chechnya, a part of Russia. Cebuano (ceb) – Widely spoken in the Philippines. Chuvash (chv) – Spoken in Russia (Chuvash Republic). Danish (dan) – Official language of Denmark. Dzongkha (dzo) – Official language of Bhutan. Hausa (hau) – Spoken in Nigeria and surrounding countries. Hebrew (heb) – Official language of Israel. Indonesian (ind) – Official language of Indonesia. Javanese (jav) – Spoken on the island of Java, Indonesia. Kazakh (kaz) – Official language of Kazakhstan. Korean (kor) – Official language of both North and South Korea. Macedonian (mkd) – Official language of North Macedonia. Malayalam (mal) – Spoken in the Indian state of Kerala. Maltese (mlt) – Official language of Malta. Mongolian (mon) – Official language of Mongolia. Nepali (nep) – Official language of Nepal. Pashto (pus) – Spoken in Afghanistan and Pakistan. Somali (som) – Official language of Somalia. Thai (tha) – Official language of Thailand. Tibetan (bod) – Spoken in Tibet and neighboring areas. Uyghur (uig) – Spoken by the Uyghur people in Xinjiang, China. Zulu (zul) – Spoken in South Africa and surrounding areas.
for the record I will train this
lang | code | file |
---|---|---|
afrikaans | afr | ./afr_latn_narrow.tsv |
amharic | amh | ./amh_ethi_broad.tsv |
azerbaijani | aze | ./aze_latn_narrow_filtered.tsv |
belarusian | bel | ./bel_cyrl_narrow.tsv |
burmese | mya | ./mya_mymr_broad_filtered.tsv |
chechen | che | ./che_cyrl_broad.tsv |
cebuano | ceb | ./ceb_latn_narrow.tsv |
danish | dan | ./dan_latn_narrow.tsv |
dzongkha | dzo | ./dzo_tibt_broad.tsv |
hausa | hau | ./hau_latn_narrow.tsv |
hebrew | heb | ./heb_hebr_narrow.tsv |
indonesian | ind | ./ind_latn_narrow.tsv |
javanese | jav | ./jav_java_broad.tsv |
macedonian | mkd | ./mkd_cyrl_narrow.tsv |
malayalam | mal | ./mal_mlym_narrow.tsv |
maltese | mlt | ./mlt_latn_broad_filtered.tsv |
mongolian | mon | ./mon_cyrl_narrow.tsv |
nepali | nep | ./nep_deva_narrow.tsv |
pashto | pus | ./pus_arab_broad.tsv |
thai | tha | ./tha_thai_broad.tsv |
tibetan | bod | ./bod_tibt_broad.tsv |
uyghur | uig | ./uig_arab_broad.tsv |
zulu | zul | ./zul_latn_broad.tsv |
lang | code | no data |
---|---|---|
somali | som | no data |
chuvash | chv | no data |
data here:
https://github.com/CUNY-CL/wikipron/tree/master/data/scrape/tsv