Open ronaldtse opened 4 years ago
Standard Lao orthography is straightforward (compared with neighbouring Thai) and can be transliterated with the current implementation.
However, Lao users often write in non-standard / pre-reform orthography, and the examples in the BGN/PCGN document clearly indicate that these non-standard forms need to be handled. This is still doable, but will be quite difficult to implement with only regex replacement and character mapping.
For future reference, here is a paper on handling Lao word boundaries with a rule-based strategy: http://www.panl10n.net/english/final%20reports/pdf%20files/Laos/LAO06.pdf
And here is a romanization implementation in Perl: https://github.com/jokke/Lingua-LO-Romanize/blob/master/lib/Lingua/LO/Romanize/Syllable.pm
@chaaklau is it possible to express the rules of https://github.com/jokke/Lingua-LO-Romanize/blob/master/lib/Lingua/LO/Romanize/Syllable.pm in the new transliteration language?
I think we need the following elements in the language:
[໐-໙]
syllables = (ກຂຄງຈສຊຍຽດຕຖທນບປຜຝພຟມຢຣລຼວຫອຮໜໝ)
if, else, case
ດ → d, ຯ → ..., ^ວ || ^ົວ → oua
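To make the elements above concrete, here is a rough Python sketch of how a character range, a character class, a plain mapping and a positional rule could fit together. It is not the proposed transliteration language and not the BGN/PCGN table: the names `romanize_syllable`, `CHAR_MAP` and `POSITIONAL_RULES` are made up for illustration, `ດ → d` is the only mapping taken from this thread, and reading `^` as "start of the syllable string" and `||` as alternation is my assumption.

```python
import re

# Character range [໐-໙]: Lao digits U+0ED0..U+0ED9 map one-to-one to 0-9.
LAO_DIGITS = {chr(0x0ED0 + i): str(i) for i in range(10)}
DIGIT_RE = re.compile("[\u0ED0-\u0ED9]")

# Character class copied from the `syllables` list in this thread.
CONSONANTS = frozenset("ກຂຄງຈສຊຍຽດຕຖທນບປຜຝພຟມຢຣລຼວຫອຮໜໝ")

# Plain mapping. Only this entry comes from the thread; a real table
# would need one entry per letter.
CHAR_MAP = {"ດ": "d"}

# Positional rule `^ວ || ^ົວ → oua`.
# ASSUMPTION: "^" anchors at the start of the syllable string and
# "||" means "either pattern".
POSITIONAL_RULES = [(re.compile("^(?:ົວ|ວ)"), "oua")]


def romanize_syllable(syllable):
    """Apply the first matching positional rule, then the plain mappings."""
    for pattern, replacement in POSITIONAL_RULES:
        if pattern.search(syllable):
            syllable = pattern.sub(replacement, syllable, count=1)
            break
    syllable = DIGIT_RE.sub(lambda m: LAO_DIGITS[m.group(0)], syllable)
    # Characters without a mapping pass through unchanged.
    return "".join(CHAR_MAP.get(ch, ch) for ch in syllable)
```

Rule order matters here: the positional rules have to run before the per-character mapping, otherwise `ວ` would already have been consumed by a plain mapping once one exists for it.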
Let me move this discussion to the language thread...
This issue is to implement the transliteration system of bgnpcgn-lao-laoo-latn-1966.
This system is referred to in the GeoNames database as lao_Laoo2Latn_CNT_1966, with the system title 'BGN/PCGN Romanization Agreement -- Lao (1966)'. Tests should rely on the data extracted for the lao_Laoo2Latn_CNT_1966 system in https://github.com/riboseinc/geonames-transliteration-data .
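A data-driven test could be as small as the sketch below. The file layout of the geonames-transliteration-data repository is not described in this issue, so the sketch does not load anything from it: it only assumes the data can be turned into (source, expected) string pairs. `find_mismatches` is a hypothetical helper name, and the sample pairs and the `str.upper` stand-in are placeholders, not Lao data.

```python
def find_mismatches(transliterate, pairs):
    """Return (source, expected, actual) for every pair that does not match."""
    mismatches = []
    for source, expected in pairs:
        actual = transliterate(source)
        if actual != expected:
            mismatches.append((source, expected, actual))
    return mismatches


# Placeholder pairs and a stand-in transliterator, for illustration only.
SAMPLE_PAIRS = [("abc", "ABC"), ("lao", "Lao")]
failures = find_mismatches(str.upper, SAMPLE_PAIRS)
```

Collecting every mismatch instead of stopping at the first one gives a full picture of which names still fail, which is useful while the non-standard spellings are only partly handled.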