interscript / maps

Script conversion maps for Interscript

Implement system `royin-tha-thai-latn-1968` (Royal Thai General System of Transcription (1968)) #121

Open ronaldtse opened 4 years ago

ronaldtse commented 4 years ago

This issue tracks the implementation of the royin-tha-thai-latn-1968 transliteration system.

This system is referred to in the GeoNames database as tha_Thai2Latn_RIT_1968, with the system title 'Royal Thai General System of Transcription (1968)'.

Tests should rely on the data extracted for the tha_Thai2Latn_RIT_1968 system in https://github.com/riboseinc/geonames-transliteration-data .

ronaldtse commented 4 years ago

Spec: http://www.royin.go.th/wp-content/uploads/royin-ebook/276/FileUpload/758_6484.pdf

chaaklau commented 4 years ago

This has been partially implemented by interscript/interscript-ruby#262. Also see the comments below.


royin-tha-Thai-Latn-1999 has been partially implemented by interscript/interscript-ruby#262

This map (as well as all other royin maps) is implemented via two extra intermediate steps. (The mapping file of royin-tha-Thai-Latn-1999 only contains rules for Step 3.)

  1. Thai is first converted into syllable-segmented phonemic Thai;
  2. Phonemic Thai is converted into IPA;
  3. IPA is converted into Latn, according to the specification.

The latter two conversion steps can be implemented with rule-based transformation, and accurate conversion should be possible.
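The three steps above can be sketched as a chain of small functions. This is only an illustration under stated assumptions: the function names, the hyphen-separated syllable format, and the tiny lookup tables are hypothetical and are not taken from the Interscript maps. Step 1 is stubbed with a dictionary lookup, tones are ignored, and only a handful of letters are covered.

```python
# Hypothetical toy tables; NOT the actual Interscript mapping data.

# Step 1 stand-in: a lookup from Thai words to syllable-segmented Thai.
# A real implementation needs a proper segmenter here.
SEGMENTATION = {"มานี": "มา-นี", "กา": "กา", "ขา": "ขา"}

# Step 2: phonemic Thai letters to IPA (tones omitted in this sketch).
THAI_TO_IPA = {
    "ม": "m",
    "น": "n",
    "ก": "k",
    "ข": "kʰ",
    "\u0e32": "aː",  # sara aa
    "\u0e35": "iː",  # sara ii
}

# Step 3: IPA to Latin. Aspiration is written as "h", vowel length is
# dropped, and syllable separators are removed.
IPA_TO_LATN = [("kʰ", "kh"), ("aː", "a"), ("iː", "i"), (".", "")]


def segment(word: str) -> str:
    """Step 1: Thai to syllable-segmented phonemic Thai (lookup stub)."""
    return SEGMENTATION[word]


def phonemic_to_ipa(phonemic: str) -> str:
    """Step 2: map each letter of each syllable to IPA."""
    syllables = phonemic.split("-")
    return ".".join("".join(THAI_TO_IPA[ch] for ch in syl) for syl in syllables)


def ipa_to_latn(ipa: str) -> str:
    """Step 3: apply ordered string replacements from IPA to Latin."""
    for src, dst in IPA_TO_LATN:
        ipa = ipa.replace(src, dst)
    return ipa


def transcribe(word: str) -> str:
    """Run all three steps in order."""
    return ipa_to_latn(phonemic_to_ipa(segment(word)))


print(transcribe("มานี"))
```

Keeping Steps 2 and 3 as plain table lookups reflects the point above that they are rule-based; the hard part is hidden inside `segment`.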

The first conversion step (Thai to Phonemic Thai) is a known difficult problem for Thai, and it is independent of the transcription system used. Further testing against geonames-transliteration-data can be done once this step has been improved.
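Once Step 1 improves, a small harness along these lines could measure how many test entries a map converts correctly. It is a sketch only: it assumes the test data has already been loaded as (source, expected) string pairs, which is not the documented format of geonames-transliteration-data, and `evaluate` is a hypothetical helper name.

```python
from typing import Callable, Iterable, List, Tuple


def evaluate(
    transcribe: Callable[[str], str],
    pairs: Iterable[Tuple[str, str]],
) -> Tuple[float, List[Tuple[str, str, str]]]:
    """Return (accuracy, failures) for a transcription function.

    Each failure is a (source, expected, actual) triple. Exceptions raised
    by the transcription function are recorded as failures, not propagated.
    """
    total = 0
    failures = []
    for source, expected in pairs:
        total += 1
        try:
            actual = transcribe(source)
        except Exception as exc:  # unsupported input counts as a miss
            actual = f"<error: {exc!r}>"
        if actual != expected:
            failures.append((source, expected, actual))
    accuracy = (total - len(failures)) / total if total else 0.0
    return accuracy, failures


# Usage with a stand-in function; replace str.upper with the real converter.
accuracy, failures = evaluate(str.upper, [("a", "A"), ("b", "X")])
print(accuracy, failures)
```

Collecting the failing triples, rather than only a score, makes it easier to see whether errors come from segmentation or from the later rule-based steps.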

Originally posted by @chaaklau in https://github.com/interscript/interscript/issues/168#issuecomment-609061617