refugee-phrasebook / backendscripts


source: add 2 lines expressing the language script code and language code #31

Open michal-fre opened 8 years ago

michal-fre commented 8 years ago

Parsing the language columns will be improved by adding two lines: one containing the language script code and one containing the language code.

It needs to be discussed which codes to use, as there are different systems. ISO 639-3 (3 letters) and ISO 15924 (4 letters) are good candidates, as dialects and different versions are caught.

The information on which codes are used in a translation can be provided by the translators or found by manual lookup.

ToDo: how to handle phonetics. For now I propose using Latn as the script code and the ISO 639 code + _phon as the language code: fas_phon, prs_phon

example using ISO 15924 (script code) and ISO 639:

German | English | Arabic (farsi) | Dari | Dari-phonetics
Latn | Latn | Arab | Arab | Latn
ger | eng | fas | prs | prs_phon
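To illustrate how the two extra lines could be consumed, here is a minimal Python sketch that zips the title row, the script-code row and the language-code row of the example above into one record per column. The names `Column` and `parse_header` and the pipe-delimited input are assumptions made for this illustration; they are not taken from the existing backend scripts.

```python
from typing import List, NamedTuple


class Column(NamedTuple):
    title: str     # human-readable column title
    script: str    # ISO 15924 script code, e.g. "Latn"
    language: str  # ISO 639 language code, optionally with "_phon"


def parse_header(title_row: str, script_row: str, language_row: str) -> List[Column]:
    """Combine three pipe-delimited header rows into per-column records."""
    rows = [
        [cell.strip() for cell in row.split("|")]
        for row in (title_row, script_row, language_row)
    ]
    # All three rows must describe the same number of columns.
    if len({len(row) for row in rows}) != 1:
        raise ValueError("header rows have different numbers of columns")
    return [Column(*cells) for cells in zip(*rows)]


columns = parse_header(
    "German | English | Arabic (farsi) | Dari | Dari-phonetics",
    "Latn | Latn | Arab | Arab | Latn",
    "ger | eng | fas | prs | prs_phon",
)
for column in columns:
    print(column.title, column.script, column.language)
```

Keeping the three values together per column means later steps can look up the script or language of a column without re-parsing the titles.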

ref:
http://www-01.sil.org/iso639-3/scope.asp#M
http://www.unicode.org/iso15924/iso15924-num.html
http://www-01.sil.org/iso639-3/macrolanguages.asp

michal-fre commented 8 years ago

We can use this for hyphenation, font switching, typographic fine-tuning, localized formatting (e.g. dates and numbers) and, of course, the direction of writing.
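As one example of these uses, the direction of writing could be derived from the script code alone. The following Python sketch assumes a small, deliberately incomplete set of right-to-left ISO 15924 codes; `RTL_SCRIPTS` and `writing_direction` are hypothetical names for this illustration, and a real implementation would need a fuller table.

```python
# Illustrative, incomplete set of right-to-left ISO 15924 script codes
# (Arabic, Hebrew, Syriac, Thaana, N'Ko). This is an assumption for the
# sketch, not a complete list.
RTL_SCRIPTS = frozenset({"Arab", "Hebr", "Syrc", "Thaa", "Nkoo"})


def writing_direction(script_code: str) -> str:
    """Return "rtl" or "ltr" for an ISO 15924 script code."""
    return "rtl" if script_code in RTL_SCRIPTS else "ltr"


# Script codes taken from the example header row in this issue.
for script in ["Latn", "Latn", "Arab", "Arab", "Latn"]:
    print(script, writing_direction(script))
```

Note that a phonetic column for an Arabic-script language would carry Latn as its script code under the proposal above, so it would come out as left-to-right.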

michal-fre commented 8 years ago

This is a quick list of the languages used. Duplicates are due to different conventions or added comments. Parsing the above-mentioned codes makes it possible to replace titles with variables in the future.

German
--THIS IS BULGARIAN!!--Kurdish (Sorani)
Albanian
Amharic
Amharic Phonetic
Arabic
Arabic (Syrian)
Arabic / Syrian Phonetic
Arabic(Fusha)
Armenian
Armenian 80% ready /proofreed needed Please mark in green
Armenian phonetic
Bangla
Bangla / বাংলা
Bangla Phonetic
Bosnian / Croatian / Serbian
Bosnian/Croatian/Serbian
Bulgarian
Croatian/Bosnian
Czech
Czech / Slovak
Dari
Dari Phonetic
Dutch
English
Farsi
Farsi Phonetic
Farsi/Dari
Filipino
Finnish
French
German
Greek alphabet
Greek phonetic
Hindi
Hungarian
Hungarian
Icelandic
Icons
Italian
Korean
Kurdish (Kurmancî)
Kurdish (Kurmanji)
Kurdish (Sorani)
Kurdish / (Sorani)
Lithuanian
Macedonian
Macedonian
Macedonian (preliminary! / has to be proofread!)
Macedonian phonetical (PLEASE ADD !)
Mandinka
Norwegian / Danish
Numbers for short section
Pashto
Pashto Phonetic
Polish
Polish
Polish 1
Portuguese
Romanian
Russian
Serbian
Slovak
Slovak / Czech
Slovenian
Somali
Spanish
Swedish
Swedish / Norwegian / Danish
Syrian phonetic/Fusha phonetics
Syrian/Arabic alphabet
Syrian/Arabic phonetic
Tigrinya
Turkish
Twi
Urdu
Urdu Phonetic
Vietnamese
Vietnamese
Woloff
amharic
mandarin / (chinese)
አማርኛ Amharic

michal-fre commented 8 years ago

Partially implemented in 05_get_the_columns.sh; see https://github.com/refugee-phrasebook/backendscripts/issues/50

OUTPUT:

deu ara ara_PHONETIC ara-ara_PHONETIC eng fra slv nld tir tir_PHONETIC tir-tir_PHONETIC amh amh_PHONETIC som som urd urd-PHONETIC urd-urd_PHONETIC ben ben_PHONETIC ben-ben_PHONETIC fas fas_PHONETIC prs prs_PHONETIC pus pus_PHONETIC sqi hbs pol mkd rus slk-ces sdh bul kmh swe-nor-dan fin isl ita tur spa hun por ell_PHONETIC ell ron hye hye_PHONETIC lit fil vie a a

1 deu
2
3 ara
4 ara_PHONETIC
5 ara-ara_PHONETIC
6 eng
7 fra
8 slv
9 nld
10 tir
11 tir_PHONETIC
12 tir-tir_PHONETIC
13 amh
14 amh_PHONETIC
15 som
16 som
17 urd
18 urd-PHONETIC
19 urd-urd_PHONETIC
20 ben
21 ben_PHONETIC
22 ben-ben_PHONETIC
23 fas
24 fas_PHONETIC
25 prs
26 prs_PHONETIC
27 pus
28 pus_PHONETIC
29 sqi
30 hbs
31 pol
32 mkd
33 rus
34 slk-ces
35 sdh
36 bul
37 kmh
38 swe-nor-dan
39 fin
40 isl
41 ita
42 tur
43 spa
44 hun
45 por
46 ell_PHONETIC
47 ell
48 ron
49 hye
50 hye_PHONETIC
51 lit
52 fil
53 vie

michal-fre commented 8 years ago

single language: 3 letters
multiple languages: 3-letter codes with a dash as delimiter: swe-nor-dan
phonetic: language (3 letters) followed by an underscore and PHONETIC: ara_PHONETIC
if the language is unknown: leave empty
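These naming rules can be checked mechanically. The following Python sketch splits a column code into its language parts and flags phonetic ones; `parse_column_code` is a hypothetical helper written for this illustration and is not part of 05_get_the_columns.sh. It assumes lowercase 3-letter codes and the literal suffix `_PHONETIC`.

```python
import re
from typing import List, Tuple

# One part of a column code: a 3-letter language code, optionally
# followed by "_PHONETIC".
_PART = re.compile(r"([a-z]{3})(_PHONETIC)?")


def parse_column_code(code: str) -> List[Tuple[str, bool]]:
    """Split a column code into (language, is_phonetic) pairs.

    An empty code means the language is unknown and yields an empty list.
    """
    code = code.strip()
    if not code:
        return []
    parts = []
    for part in code.split("-"):  # the dash delimits multiple languages
        match = _PART.fullmatch(part)
        if match is None:
            raise ValueError(f"unrecognised column code part: {part!r}")
        parts.append((match.group(1), match.group(2) is not None))
    return parts


for code in ["deu", "swe-nor-dan", "ara_PHONETIC", "ara-ara_PHONETIC", ""]:
    print(repr(code), parse_column_code(code))
```

A code that does not follow the rules raises a `ValueError`, which makes inconsistent column headers visible early instead of passing them on silently.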