Open ronaldtse opened 4 years ago
The UNGEGN report for Thai Geonames has a detailed description of this system. The original paper can be found in The Journal of the Royal Institute of Thailand.
Here is a note from the UNGEGN report:
One must bear in mind that Romanization of Thai in this case employs a transcription method, not a transliteration method. Thus, a tone mark, a diacritic mark including a silencing mark, and vowel length are completely ignored. It means that one who can transcribe Thai words must correctly know how to read them or to pronounce them.
There are a lot of irregular spellings (for historical reasons), and segmentation accuracy is low. I tried PyThaiNLP, which provides two engines for romanization, but the results are not satisfactory.
As with Japanese and Chinese, I believe some kind of preprocessing (segmentation, marking of silent letters, marking of vowel insertions, etc.) is needed before Thai text can be transliterated using the current method.
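As a rough illustration of the simplest part of such preprocessing, here is a minimal Python sketch that drops the marks the UNGEGN note says are ignored: the four tone marks and any consonant carrying the silencing mark (thanthakhat). The function name `strip_ignored_marks` is hypothetical, and the sketch assumes that only the single consonant directly under the silencing mark is silent, which is not true for every Thai word. It does not attempt segmentation or vowel insertion.

```python
import re

# Tone marks: mai ek, mai tho, mai tri, mai chattawa (U+0E48..U+0E4B).
TONE_MARKS = re.compile(r"[\u0E48-\u0E4B]")

# A consonant (U+0E01..U+0E2E), an optional above/below vowel sign,
# then thanthakhat (U+0E4C), the silencing mark.
# Assumption: only the one consonant carrying the mark is silent.
SILENCED = re.compile(r"[\u0E01-\u0E2E][\u0E31\u0E34-\u0E39]?\u0E4C")


def strip_ignored_marks(text: str) -> str:
    """Remove tone marks and thanthakhat-silenced consonants (hypothetical helper)."""
    text = TONE_MARKS.sub("", text)
    return SILENCED.sub("", text)


if __name__ == "__main__":
    for word in ["น้ำ", "เสาร์"]:
        print(word, "->", strip_ignored_marks(word))
```

With this sketch, น้ำ loses its tone mark and เสาร์ loses the silenced final consonant. Words where more than one letter is silenced would still need a dictionary or a stronger rule.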
interscript/interscript-ruby#235 only handles basic conversion rules. More rules will be added.
royin-tha-Thai-Latn-1999 has been partially implemented by interscript/interscript-ruby#262
This map (as well as all other royin maps) is implemented via two extra intermediate steps. (The mapping file of royin-tha-Thai-Latn-1999 only contains rules for Step 3.)
The latter two conversion steps can be implemented with rule-based transformation, and accurate conversion should be possible.
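To make the rule-based part concrete, here is a minimal Python sketch of a longest-match-first rewrite over an input that is assumed to be already in phonemic form. The `RULES` table is a toy, hypothetical subset chosen for illustration; it is not the actual royin-tha-Thai-Latn-1999 mapping file, and `apply_rules` is not an interscript API.

```python
# Toy rule table (hypothetical; NOT the real royin-tha-Thai-Latn-1999 map).
RULES = {
    "ก": "k",
    "ม": "m",
    "น": "n",
    "า": "a",
}


def apply_rules(text: str, rules: dict) -> str:
    """Rewrite text left to right, trying the longest rule key first."""
    keys = sorted(rules, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(rules[key])
                i += len(key)
                break
        else:
            # No rule matched: pass the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for word in ["มา", "กา", "นาน"]:
        print(word, "->", apply_rules(word, RULES))
```

Trying longer keys first keeps multi-character rules from being shadowed by single-character ones. Characters with no rule are left as-is so that gaps in the table are easy to spot in the output.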
The first conversion step (Thai to Phonemic Thai) is a known difficult problem for Thai, and this is independent of transcription systems.
Further testing using geonames-transliteration-data can be done after improvement work for this step.
This issue is to implement the transliteration system of royin-tha-thai-latn-1999.
This system is referred to in the GeoNames database as tir_Thai2Latn_RIT_2000, with the system title 'Royal Thai General System of Transcription (1999)'. Tests should rely on the data extracted for the tir_Thai2Latn_RIT_2000 system in https://github.com/riboseinc/geonames-transliteration-data .
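A test harness for this could look roughly like the Python sketch below. It assumes the extracted data has already been loaded as (source, expected) string pairs; the actual file layout of geonames-transliteration-data is not assumed here, and `evaluate` is a hypothetical helper that accepts any transliteration function.

```python
from typing import Callable, Iterable, List, Tuple

Failure = Tuple[str, str, str]  # (source, expected, actual)


def evaluate(
    pairs: Iterable[Tuple[str, str]],
    transliterate: Callable[[str], str],
) -> Tuple[float, List[Failure]]:
    """Return (accuracy, failures) for a transliteration function.

    `pairs` is assumed to hold (source, expected) strings; how they are
    read from the data repository is left to the caller.
    """
    failures: List[Failure] = []
    total = 0
    for source, expected in pairs:
        total += 1
        actual = transliterate(source)
        if actual != expected:
            failures.append((source, expected, actual))
    accuracy = (total - len(failures)) / total if total else 0.0
    return accuracy, failures


if __name__ == "__main__":
    # Stand-in data and function, for demonstration only.
    demo_pairs = [("a", "A"), ("b", "X")]
    accuracy, failures = evaluate(demo_pairs, str.upper)
    print(f"accuracy={accuracy:.2f}", failures)
```

Returning the failing triples alongside the accuracy makes it easier to see whether errors cluster around irregular spellings or segmentation.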