interscript / maps

Script conversion maps for Interscript

Double check BGN 1962 Korean system #91

Open ronaldtse opened 4 years ago

ronaldtse commented 4 years ago

Original spec: BGN_Romanization_Guide_1962_korean.pdf

@chaaklau could you please help check this too? Thanks!

chaaklau commented 4 years ago

This was implemented as bgn-kor-Hang-Latn-1939, based on "BGN/PCGN 1945 Agreement".

The only difference I could spot between the two documents is how they handle the initial ㄹ (rieul / r).

This letter is used to write Sino-Korean words that were historically pronounced with /r/. This initial had been dropped in the spoken language before it was removed from the orthography in 1933 [1]. Before this reform, a word-initial R was still written but was pronounced as /n/ or dropped. After the reform, the written form was consistent with the actual pronunciation. For decades after 1933, any word-initial R would have been either a loanword or an old spelling.

In the 1962 guide, this sound is transcribed as n or dropped, following the pre-reform convention. I suppose this is a way to handle pre-reform written work, i.e. if one sees this word-initial R, then follow the pre-reform phonological rule for transliteration.

In the latest guide, this rule has been removed, and R is always transliterated as is. This is because the traditional spelling was restored in the DPRK in 1966, and since then every word-initial R is supposed to be pronounced and transliterated as r under this DPRK standard.
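The difference between the two guides can be sketched in Ruby as a pre-processing step on the Hangul text. This is only an illustration, not the code in the maps: the function and constant names are hypothetical, and the condition for choosing between n and dropping (dropping before i or y vowels, n elsewhere) is the standard Korean initial-sound rule, which I am assuming the 1962 guide follows.

```ruby
# Hypothetical sketch: apply the pre-reform word-initial R rule to a Hangul word.
# Assumption: R is dropped before i/y vowels and becomes n elsewhere.

HANGUL_BASE = 0xAC00          # first precomposed Hangul syllable
HANGUL_LAST = 0xD7A3          # last precomposed Hangul syllable
SYLLABLES_PER_INITIAL = 588   # 21 medials * 28 finals
FINALS_PER_MEDIAL = 28

INITIAL_N    = 2   # index of initial n in Unicode order
INITIAL_R    = 5   # index of initial r
INITIAL_NULL = 11  # index of the silent initial

# Medial indices of ya, yae, yeo, ye, yo, yu, i
I_OR_Y_MEDIALS = [2, 3, 6, 7, 12, 17, 20].freeze

def apply_initial_rieul_rule(word, keep_initial_r: false)
  # keep_initial_r: true models the latest guide, where R is kept as is.
  return word if keep_initial_r || word.empty?

  code = word[0].ord
  return word unless code.between?(HANGUL_BASE, HANGUL_LAST)

  offset = code - HANGUL_BASE
  return word unless offset / SYLLABLES_PER_INITIAL == INITIAL_R

  rest = offset % SYLLABLES_PER_INITIAL   # medial and final, unchanged
  medial = rest / FINALS_PER_MEDIAL
  new_initial = I_OR_Y_MEDIALS.include?(medial) ? INITIAL_NULL : INITIAL_N

  new_code = HANGUL_BASE + new_initial * SYLLABLES_PER_INITIAL + rest
  [new_code].pack("U") + word[1..-1]
end
```

With `keep_initial_r: false` the word is rewritten before romanization, so the existing character rules then produce n or nothing; with `keep_initial_r: true` the word passes through untouched and R is romanized as r.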

A related note from the latest BGN/PCGN guide is quoted below:

As a result of 조선말규범집 (‘Standard Korean Language’ guidelines published in North Korea in 1966), unlike the Korean spoken in the Republic of Korea, the language spoken in the Democratic People’s Republic of Korea maintains and pronounces the word-initial ᆯ (‘r’). The use of the word-initial ᄅ ('r') can be seen in official news reports as well as native mapping. Since such examples exist, the word-initial ᄅ ('r') is reflected as an option in the tables given above.

[1] Here is the original text from Wikisource

chaaklau commented 4 years ago

@ronaldtse I believe this 1962 guide is closer to the original bgn-kor-Hang-Latn-1939. I can change the content of the file to match with this 1962 guide.

The current implementation, which is based on the BGN/PCGN document, can be renamed to bgnpcgn-kor-kore-latn-kn-1945.

I found this name in code.csv, but I am not sure if the Script code Kore should be used instead of Hang. Written Korean is still occasionally mixed with Hanja.

bgnpcgn-kor-kore-latn-kn-1945,BGN/PCGN Romanization Agreement -- Korean (North Korea) (1945),bgnpcgn,iso-639-2,kor,Kore,Latn,kn-1945,BGN/PCGN Romanization Agreement -- Korean (North Korea) (1945),,"Date approved: 1945; Originator: McCune-Reischauer, 1939"

ronaldtse commented 4 years ago

@chaaklau I found the source document of the McCune-Reischauer system from the Korea Branch of the Royal Asiatic Society. Volume XXIX (PDF), and in particular, the article:

I think we need to split into the following systems:

chaaklau commented 4 years ago

A list of words in Hanja with their pronunciation can be found here: Hanja Hangul Converter 0.0.7 (Public Domain)

Under the current implementation, all Hanja -> Hangul conversion rules can be added under map[rules]. This will give the correct result, but an O(n) scan over a large dictionary is undesirable.

I am going to modify interscript.rb to handle dictionary lookup using binary search. After this is done, the Hangul -> Jamo map can also be handled using a dictionary.
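A minimal sketch of what such a lookup could look like, using Ruby's built-in `Array#bsearch` in find-any mode over a sorted array of pairs. This is not the code in interscript.rb; the constant and method names are hypothetical, and the four entries are only sample data.

```ruby
# Hypothetical sketch: Hanja -> Hangul lookup by binary search.
# The table must be sorted by key for bsearch to work, so sort it once up front.
HANJA_TO_HANGUL = [
  ["韓", "한"],
  ["國", "국"],
  ["漢", "한"],
  ["字", "자"],
].sort_by(&:first).freeze

# Returns the Hangul reading for one character, or nil if it is not in the table.
def lookup_hanja(char)
  # Find-any mode: the block returns 0 on a match, a negative number if the
  # target sorts before this entry, and a positive number if it sorts after.
  entry = HANJA_TO_HANGUL.bsearch { |(key, _)| char <=> key }
  entry && entry[1]
end

# Converts every known Hanja in the text and leaves other characters as they are.
def hanja_to_hangul(text)
  text.each_char.map { |ch| lookup_hanja(ch) || ch }.join
end
```

Each lookup takes O(log n) comparisons instead of scanning every rule. A Ruby Hash would also avoid the linear scan; the sorted array is shown here only because binary search is the approach named above.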

chaaklau commented 4 years ago

Dictionary lookup and Hanja support have been implemented:

cd238a1 Implement dictionary lookup
f5bbd72 Restructure and add Hanja support to Korean Maps

The maps need some further review and updates. I will send a pull request when they are ready.