interscript / maps

Script conversion maps for Interscript

Implement system `bgnpcgn-mya-mymr-latn-1970` (BGN/PCGN Romanization Agreement -- Burmese (1970)) #126

Open ronaldtse opened 4 years ago

ronaldtse commented 4 years ago

This issue tracks the implementation of the transliteration system bgnpcgn-mya-mymr-latn-1970.

This system is referred to in the GeoNames database as nep_Mymr2Latn_BGN_1970, with the system title 'BGN/PCGN Romanization Agreement -- Burmese (1970)'.

Tests should rely on the data extracted for the nep_Mymr2Latn_BGN_1970 system in https://github.com/riboseinc/geonames-transliteration-data .

Spec: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/693688/ROMANIZATION_OF_BURMESE.pdf

ronaldtse commented 3 years ago

Description: "This system is an amplified restatement of the 1907 version of the Tables for the Transliteration of Burmese into English, published in 1908 by the Office of the Superintendent, Government Printing, Rangoon, Burma."

Notes:

  1. The symbol ◌ in the tables and in the following notes represents any Burmese consonant character, and the letter C represents the romanized equivalent of that character. The symbol → means “is romanized”.
  2. Except when accompanied by a dependent vowel character or an end-of-syllable mark, a Burmese consonant character or a consonant character combination should be romanized with a following vowel letter a: မဒမ → madama, အက → aga, ကလိ → kali, သာငယ္ → thangè, ပစင္ → pyazin.
  3. At the beginning of a word, the vowel-carrier အ should not be romanized, unless followed by a consonant character that does not carry a vowel character or an end-of-syllable mark, in which case the character အ should be romanized a: အကာ → aga, but အိုဘဲ့ → obè, အပ္ → at. At the beginning of a medial or final syllable, အ should be rendered by a hyphen: မအူ → ma-u, သီးပင္အိုင္ → Thibin-aing.
  4. The independent vowel characters should be romanized without a hyphen at the beginning of words and with a hyphen at the beginning of medial and final syllables: ဩဘာ → awba, ဧဏီ → eni, ေကဧ → kye-e, ေက၁င္ဥကဥ္ → kyaung-ugyin.
  5. When two consonant characters are written stacked one above the other without an end-of-syllable mark, the upper character should be romanized first, followed by the lower character, and then the vowel and consonant characters, if any: သဒ → thadda, အိမဘဝ → andimabawa. It should be noted that the alternative romanizations shown in the tables of consonant characters and consonant character combinations do not apply to the upper character: ဥကဌ → ukkada.
  6. When the letter n at the end of a syllable within a romanized word is followed by g or y at the beginning of the next syllable, the letter sequences should be rendered n-g and n-y, respectively, in order to differentiate those sequences from the digraphs ng and ny: အင္းကတ္ → in-gut, ကန္ရက္ → kun-yet, but ေရငန္း → shwengan, ညိညာ → nyinya, တုိင္ေအာင္ → taing-aung. Similarly, the letter sequence consisting of t at the end of a syllable within a romanized word, followed by h at the beginning of the next syllable, should be rendered t-h in order to differentiate that sequence from the digraph th: ဟက္ဟက္ပက္ပက္ရယ္ → het-hetpetpetyè, but ဝသီ → wathi.
  7. The tone marks ◌့ and ◌း are not represented in romanization: ေဘးမဲ့ → bemè, တံ့စား → tanza, ပီးစီး → pyizi.
  8. The vowel mark ◌ indicates a change in the romanization of the preceding syllable from a to in: သေဘာ → thinbaw, ဘဂလားေအာ္ → Bin-gala Aw, စကာပူ → Sin-gabu.
  9. Although of infrequent occurrence, a number of character ligatures and abbreviations are found in Burmese writing. In the event that a character not shown in the tables is encountered, a reference source should be consulted.
  10. ◌ည္ is romanized i, in or e, depending on pronunciation. A reference source should be consulted in case of uncertainty.
  11. The Romanization columns show only lowercase forms but, when romanizing, uppercase and lowercase Roman letters as appropriate should be used.
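
Notes 6 and 7 are mechanical enough to sketch in code. The Python below is a rough illustration only, not the Interscript map format. It assumes each syllable has already been romanized on its own and that the tone marks are encoded as U+1037 and U+1038. The helper names `join_syllables` and `strip_tone_marks` are hypothetical.

```python
# Hypothetical helpers illustrating notes 6 and 7; not part of Interscript.

# Note 6: letter pairs that would otherwise read as the digraphs ng, ny, th.
AMBIGUOUS_PAIRS = {("n", "g"), ("n", "y"), ("t", "h")}

# Note 7: tone marks that are dropped in romanization
# (assumed code points: U+1037 dot below, U+1038 visarga).
TONE_MARKS = {ord("\u1037"): None, ord("\u1038"): None}


def join_syllables(syllables):
    """Join already-romanized syllables, inserting a hyphen where the
    end of one syllable and the start of the next would form ng, ny or th."""
    word = ""
    for syllable in syllables:
        if word and syllable and (word[-1], syllable[0]) in AMBIGUOUS_PAIRS:
            word += "-"
        word += syllable
    return word


def strip_tone_marks(text):
    """Remove the two tone marks from Burmese text before romanizing."""
    return text.translate(TONE_MARKS)


# join_syllables(["in", "gut"])    returns "in-gut"
# join_syllables(["shwe", "ngan"]) returns "shwengan"
```

The sketch covers only the n-g, n-y and t-h cases. The hyphen before a vowel-initial syllable (notes 3 and 4, e.g. taing-aung) would need separate handling.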