Implement system `bgnpcgn-lao-laoo-latn-1966` (BGN/PCGN Romanization Agreement -- Lao (1966))

ronaldtse commented 4 years ago

This issue is to implement the transliteration system of bgnpcgn-lao-laoo-latn-1966.

This system is referred to in the GeoNames database as lao_Laoo2Latn_CNT_1966, with the system title 'BGN/PCGN Romanization Agreement -- Lao (1966)'.

Tests should rely on the data extracted for the lao_Laoo2Latn_CNT_1966 system in https://github.com/riboseinc/geonames-transliteration-data .
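A data-driven test could compare the system's output against every extracted pair and report the failures. This is a minimal sketch that assumes the lao_Laoo2Latn_CNT_1966 data has already been loaded as (source, expected) string pairs. The loader and the file layout of that repository are not described here, and `mismatches` is a hypothetical helper name.

```python
from typing import Callable, Iterable, List, Tuple


def mismatches(
    transliterate: Callable[[str], str],
    pairs: Iterable[Tuple[str, str]],
) -> List[Tuple[str, str, str]]:
    """Return (source, expected, actual) for every pair the system gets wrong.

    `pairs` is assumed to come from the extracted GeoNames data;
    how it is read from disk is left out on purpose.
    """
    failures = []
    for source, expected in pairs:
        actual = transliterate(source)
        if actual != expected:
            failures.append((source, expected, actual))
    return failures
```

Collecting every failure, rather than stopping at the first one, makes it easier to see which orthographic patterns (for example, pre-reform spellings) the system still misses.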

chaaklau commented 4 years ago

Standard Lao orthography is straightforward (compared to neighboring Thai) and can be transliterated with the current implementation.

However, Lao users often write in non-standard or pre-reform orthography, and the examples in the BGN/PCGN document clearly indicate that these non-standard forms need to be handled. It is still doable, but it will be quite difficult to implement with only regex replacement and character mapping.
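To show what "regex replacement and character mapping" covers, here is a sketch of that approach for standard orthography. The digit mapping (U+0ED0 to U+0ED9 onto 0 to 9) is exact. The consonant and vowel values, however, are an assumed, incomplete subset for illustration and are not the BGN/PCGN 1966 table. The one regex handles Lao preposed vowels (ເ ແ ໂ ໃ ໄ), which are written before the consonant but read after it.

```python
import re

# Lao digits U+0ED0..U+0ED9 correspond one-to-one to ASCII 0..9.
LAO_DIGITS = str.maketrans({chr(0x0ED0 + i): str(i) for i in range(10)})

# Initial consonants as listed later in this thread (minus ຽ and ຼ).
LAO_CONSONANTS = "ກຂຄງຈສຊຍດຕຖທນບປຜຝພຟມຢຣລວຫອຮໜໝ"
PREPOSED = "ເແໂໃໄ"

# ASSUMPTION: a small illustrative subset, NOT the full BGN/PCGN 1966 table.
CONSONANT_MAP = {"ກ": "k", "ດ": "d", "ບ": "b", "ມ": "m", "ລ": "l", "ສ": "s"}
PREPOSED_MAP = {
    "ເ": "\u00e9",  # é (assumed value)
    "ແ": "\u00e8",  # è (assumed value)
    "ໂ": "\u00f4",  # ô (assumed value)
    "ໃ": "ai",      # assumed value
    "ໄ": "ai",      # assumed value
}

# A preposed vowel followed by a consonant: capture both so they can be swapped.
PREPOSED_RE = re.compile(f"([{PREPOSED}])([{LAO_CONSONANTS}])")


def reorder(text: str) -> str:
    """Swap each preposed vowel with the consonant that follows it."""
    return PREPOSED_RE.sub(r"\2\1", text)


def transliterate(text: str) -> str:
    """Digits via str.translate, vowel reordering via regex, then a per-character map."""
    table = {**CONSONANT_MAP, **PREPOSED_MAP}
    text = reorder(text.translate(LAO_DIGITS))
    # Characters missing from the table (tone marks, other vowels) pass through unchanged.
    return "".join(table.get(ch, ch) for ch in text)
```

Everything outside the table passes through untouched. That gap is where non-standard and pre-reform spellings begin to need more than a flat mapping.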

For future reference, here is a paper on handling Lao word boundaries with a rule-based strategy: http://www.panl10n.net/english/final%20reports/pdf%20files/Laos/LAO06.pdf
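Lao is written without spaces between words, so a transliterator that needs word context must segment the text first. The linked paper describes a rule-based method. The sketch below uses a different, simpler technique instead: greedy longest-match lookup against a word list. It is shown only to illustrate the problem. The function name `segment` and the two-word dictionary in the test are hypothetical.

```python
from __future__ import annotations


def segment(text: str, dictionary: set[str]) -> list[str]:
    """Greedy longest-match word segmentation (not the paper's rule-based method).

    At each position, take the longest dictionary word that starts there;
    if none matches, emit a single character and move on.
    """
    max_len = max((len(w) for w in dictionary), default=1)
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to length 1.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # unknown character: emit it on its own
            i += 1
    return tokens
```

Greedy matching can mis-split when a longer dictionary word overlaps the correct boundary. This weakness is one reason a rule-based or syllable-aware strategy, like the one in the paper, is attractive.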

And here is a romanization implementation in Perl: https://github.com/jokke/Lingua-LO-Romanize/blob/master/lib/Lingua/LO/Romanize/Syllable.pm
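The Perl module works at the syllable level. As a rough Python analogue, the sketch below uses one regex to split text into simplified syllables of the form preposed vowel, initial (including ຫ + sonorant and subscript ຼ), vowel signs and tone marks, and an optional final consonant. The character sets are simplified assumptions and do not port Syllable.pm. The lookahead keeps a consonant followed by a vowel sign from being read as the previous syllable's final.

```python
import re

CONSONANTS = "ກຂຄງຈສຊຍດຕຖທນບປຜຝພຟມຢຣລວຫອຮໜໝ"
PREPOSED = "ເແໂໃໄ"
# Simplified: vowel signs U+0EB0..U+0EB9, U+0EBB, U+0EBD, U+0ECD and tone marks U+0EC8..U+0ECB.
MARKS = "\u0EB0-\u0EB9\u0EBB\u0EBD\u0ECD\u0EC8-\u0ECB"
FINALS = "ກງຍດນບມວ"

SYLLABLE = re.compile(
    f"(?P<pre>[{PREPOSED}]?)"
    # ຫ + sonorant acts as one initial; U+0EBC is the subscript form of ລ.
    f"(?P<initial>ຫ[ງຍນມລວ]|[{CONSONANTS}]\u0EBC?)"
    f"(?P<marks>[{MARKS}]*)"
    # A final consonant must not be followed by a mark, else it starts the next syllable.
    f"(?P<final>(?:[{FINALS}](?![{MARKS}]))?)"
)


def syllables(text: str) -> list:
    """Split text into simplified syllables; characters that fit no syllable are skipped."""
    return [m.group(0) for m in SYLLABLE.finditer(text)]


def parse(syllable: str) -> dict:
    """Break one syllable into pre/initial/marks/final, or raise ValueError."""
    m = SYLLABLE.fullmatch(syllable)
    if m is None:
        raise ValueError(f"not a simple Lao syllable: {syllable!r}")
    return m.groupdict()
```

Once a syllable is split into named parts, each part can be romanized with its own table, and conditions such as "initial vs. final" become simple dictionary lookups.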

ronaldtse commented 4 years ago

@chaaklau is it possible to express the rules of https://github.com/jokke/Lingua-LO-Romanize/blob/master/lib/Lingua/LO/Romanize/Syllable.pm in the new transliteration language?

I think we need the following elements in the language:

sets: [໐-໙]
variables: syllables = (ກຂຄງຈສຊຍຽດຕຖທນບປຜຝພຟມຢຣລຼວຫອຮໜໝ)
character set routines: transformation from input set to output set
conditions: if, else, case
matchers: starts with, ends with, lookahead, lookbehind, extraction, in-place replacement; in effect a subset of regular expressions (regexes are not generally portable across systems/languages)
maps: ດ → d, ຯ → ..., ^ວ || ^ົວ → oua
methods to capture parts of speech
dictionary support
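Several of the elements above can be modelled as an ordered list of context-sensitive rewrite rules. The sketch below shows a set ([໐-໙]), a character set routine (digits to ASCII), the two maps quoted above (ດ → d, ຯ → ...), and a lookahead condition. The `Rule` class and `apply_rules` are hypothetical names. The rule that drops ອ before a vowel sign is an invented example of a condition, not a BGN/PCGN rule.

```python
import re
from dataclasses import dataclass

# "sets": named character classes.
LAO_DIGITS = "\u0ED0-\u0ED9"        # ໐-໙
VOWEL_SIGNS = "\u0EB0-\u0EB9\u0EBB"  # simplified vowel-sign class


@dataclass(frozen=True)
class Rule:
    """A map rule `source -> target`, optionally conditioned on what follows."""
    source: str      # regex matching the text to replace
    target: str      # replacement string
    ahead: str = ""  # optional lookahead context (regex); empty means unconditional

    def compile(self) -> "re.Pattern[str]":
        context = f"(?={self.ahead})" if self.ahead else ""
        return re.compile(self.source + context)


RULES = [
    Rule("ດ", "d"),    # map quoted in this thread
    Rule("ຯ", "..."),  # map quoted in this thread
    # HYPOTHETICAL conditional rule, for illustration only:
    # drop ອ when it is followed by a vowel sign.
    Rule("ອ", "", ahead=f"[{VOWEL_SIGNS}]"),
]


def apply_rules(text: str) -> str:
    # "character set routine": map the whole digit set in one pass.
    text = re.sub(f"[{LAO_DIGITS}]", lambda m: str(ord(m.group()) - 0x0ED0), text)
    # Rules run in order; their Latin output is never matched by a later Lao source.
    for rule in RULES:
        text = rule.compile().sub(rule.target, text)
    return text
```

Keeping rules as data, rather than as code, is what would let the same definitions be interpreted by implementations in other languages.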

Let me move this discussion to the language thread...

interscript/maps #131