interscript / maps

Script conversion maps for Interscript

Implement system `royin-tha-thai-latn-1939` (Royal Thai General System of Transcription (1939)) #89

Open ronaldtse opened 4 years ago

ronaldtse commented 4 years ago

This issue tracks the implementation of the transliteration system royin-tha-thai-latn-1939.

There are two systems, and a different ID should be assigned to each.

Spec: http://www.siamese-heritage.org/jsspdf/1941/JSS_033_1d_RoyalInstituteTranscriptionOfThaiIntoRomanCharacters.pdf

More details: http://www.royin.go.th/wp-content/uploads/royin-ebook/276/FileUpload/758_6484.pdf

chaaklau commented 4 years ago

royin-tha-Thai-Latn-1939-generic has been partially implemented by interscript/interscript-ruby#262 along with other royin maps. (Hyphenation has not yet been implemented.)

royin-tha-thai-latn-1939-precise marks all orthographic and phonemic distinctions in one system and needs to be implemented separately.

Also see comments below.


royin-tha-Thai-Latn-1999 has been partially implemented by interscript/interscript-ruby#262

This map (as well as all other royin maps) is implemented via two extra intermediate steps. (The mapping file of royin-tha-Thai-Latn-1999 only contains rules for Step 3.)

  1. Thai is first converted into syllable-segmented phonemic Thai;
  2. Phonemic Thai is converted into IPA;
  3. IPA is converted into Latn, according to the specification.

The latter two conversion steps can be implemented as rule-based transformations, so accurate conversion should be possible.

The first conversion step (Thai to Phonemic Thai) is a known hard problem for Thai, and it is independent of the transcription system. Further testing with geonames-transliteration-data can follow once this step has been improved.
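
For illustration only, the three steps above can be sketched in Python roughly as follows. This is a toy sketch, not the Interscript implementation: the function names, the lookup table standing in for Step 1, and the tiny character tables are all assumptions made up for this example, and they cover a single word (มานี). Real maps need far larger rule sets, and Step 1 cannot be solved with a lookup in general.

```python
# Toy three-stage pipeline: Thai -> phonemic Thai -> IPA -> Latin.
# Every table here is a hypothetical, hand-made subset for one example word;
# none of it is taken from the Interscript map files.

MANI = "\u0e21\u0e32\u0e19\u0e35"  # the Thai word "มานี" (ม + า + น + ี)

# Step 1 (the hard step): a lookup table stands in for real
# syllable segmentation and phonemic respelling.
PHONEMIC_LOOKUP = {MANI: "\u0e21\u0e32-\u0e19\u0e35"}  # "มา-นี"

# Step 2: phonemic Thai characters to IPA; "-" marks a syllable boundary.
THAI_TO_IPA = {
    "\u0e21": "m",   # ม
    "\u0e19": "n",   # น
    "\u0e32": "aː",  # า (long a)
    "\u0e35": "iː",  # ี (long i)
    "-": ".",
}

# Step 3: IPA to Latin; vowel length and syllable dots are dropped here.
IPA_TO_LATIN = {"aː": "a", "iː": "i", "m": "m", "n": "n", ".": ""}


def thai_to_phonemic(text):
    """Step 1: look up the syllable-segmented phonemic spelling."""
    return PHONEMIC_LOOKUP[text]


def phonemic_to_ipa(phonemic):
    """Step 2: map each phonemic Thai character to its IPA value."""
    return "".join(THAI_TO_IPA[ch] for ch in phonemic)


def ipa_to_latin(ipa):
    """Step 3: greedy longest-match replacement of IPA symbols."""
    keys = sorted(IPA_TO_LATIN, key=len, reverse=True)
    out = []
    i = 0
    while i < len(ipa):
        for key in keys:
            if ipa.startswith(key, i):
                out.append(IPA_TO_LATIN[key])
                i += len(key)
                break
        else:
            raise ValueError(f"no rule for {ipa[i]!r} at position {i}")
    return "".join(out)


def transliterate(text):
    """Run all three steps in order."""
    return ipa_to_latin(phonemic_to_ipa(thai_to_phonemic(text)))


print(transliterate(MANI))  # prints "mani"
```

Keeping the steps as separate functions mirrors the point made above: Steps 2 and 3 are plain table-driven rewrites, while all of the uncertainty sits in Step 1.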

Originally posted by @chaaklau in https://github.com/interscript/interscript/issues/168#issuecomment-609061617