interscript / maps

Script conversion maps for Interscript

Implement system `bgnpcgn-lao-laoo-latn-1966` (BGN/PCGN Romanization Agreement -- Lao (1966)) #131

Open ronaldtse opened 4 years ago

ronaldtse commented 4 years ago

This issue is to implement the transliteration system of bgnpcgn-lao-laoo-latn-1966.

This system is referred to in the GeoNames database as lao_Laoo2Latn_CNT_1966, with the system title 'BGN/PCGN Romanization Agreement -- Lao (1966)'.

Tests should rely on the data extracted for the lao_Laoo2Latn_CNT_1966 system in https://github.com/riboseinc/geonames-transliteration-data .
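As a rough sketch of what a data-driven test could look like: the helper below compares a transliteration function against (source, expected) pairs and collects mismatches. The name `check_pairs`, the pair format, and the stand-in transliterator are all assumptions for illustration; the actual layout of the GeoNames data and the Interscript test harness may differ.

```python
def check_pairs(pairs, transliterate):
    """Return (source, expected, actual) for every pair that does not match.

    `pairs` is a hypothetical iterable of (source, expected) tuples;
    `transliterate` is any callable mapping a source string to Latin text.
    """
    failures = []
    for source, expected in pairs:
        actual = transliterate(source)
        if actual != expected:
            failures.append((source, expected, actual))
    return failures


if __name__ == "__main__":
    # Stand-in transliterator, used only to show the call shape.
    sample = [("a", "A"), ("b", "x")]
    for source, expected, actual in check_pairs(sample, str.upper):
        print(f"{source!r}: expected {expected!r}, got {actual!r}")
```

Collecting all mismatches, rather than stopping at the first one, makes it easier to see which orthographic patterns a map still gets wrong.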

chaaklau commented 4 years ago

Standard Lao orthography is straightforward (compared to nearby Thai) and can be transliterated with the current implementation.

However, users of Lao often use non-standard / pre-reform orthography, and the examples in the BGN/PCGN document clearly indicate that these non-standard forms need to be handled. It is still doable, but will be quite difficult to implement with only regex replacement and character mapping.
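To illustrate what "regex replacement plus character mapping" means here, below is a minimal sketch that handles one well-known feature of Lao script: some vowel signs are written before the consonant they follow in pronunciation. The tables are tiny and the Latin values are simplified placeholders, not the BGN/PCGN 1966 table; the function names are hypothetical and this is not the Interscript map format.

```python
import re

# Placeholder tables for illustration only; NOT the BGN/PCGN 1966 values.
CONSONANTS = {
    "\u0e81": "k",  # LAO LETTER KO
    "\u0ea1": "m",  # LAO LETTER MO
    "\u0e99": "n",  # LAO LETTER NO
}
PREVOWELS = {
    "\u0ec0": "e",  # LAO VOWEL SIGN E, written before its consonant
    "\u0ec2": "o",  # LAO VOWEL SIGN O, written before its consonant
}

# Matches a prevowel immediately followed by a consonant.
PREVOWEL_RE = re.compile(
    "([%s])([%s])" % ("".join(PREVOWELS), "".join(CONSONANTS))
)


def reorder_prevowels(text):
    """Regex step: swap each (prevowel, consonant) pair into spoken order."""
    return PREVOWEL_RE.sub(r"\2\1", text)


def romanize(text):
    """Mapping step: replace each known character, pass others through."""
    table = {**CONSONANTS, **PREVOWELS}
    return "".join(table.get(ch, ch) for ch in reorder_prevowels(text))


if __name__ == "__main__":
    # Vowel sign O + KO + NO; a naive per-character mapping would give "okn".
    print(romanize("\u0ec2\u0e81\u0e99"))
```

This works only when the consonant that carries the vowel is the character right after it. Non-standard spellings can break that assumption, which is where syllable-level rules like those in the Perl implementation come in.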

For future reference, here is a paper on handling Lao word boundaries with a rule-based strategy: http://www.panl10n.net/english/final%20reports/pdf%20files/Laos/LAO06.pdf

And here is a romanization implementation in Perl: https://github.com/jokke/Lingua-LO-Romanize/blob/master/lib/Lingua/LO/Romanize/Syllable.pm

ronaldtse commented 4 years ago

@chaaklau is it possible to express the rules of https://github.com/jokke/Lingua-LO-Romanize/blob/master/lib/Lingua/LO/Romanize/Syllable.pm in the new transliteration language?

I think we need the following elements in the language:

Let me move this discussion to the language thread...