latex3 / lua-uni-algos

Unicode algorithms for use by LuaLaTeX packages

bug in Hangul composition #1

Closed: dohyunkim closed this issue 4 years ago

dohyunkim commented 4 years ago

https://github.com/zauguin/lua-uni-algos/blob/70d8873e28a15b60096baae459c864053b9843de/lua-uni-normalize.lua#L236

It seems that 0x11A7 on this line should be changed to 0x1175. Otherwise, for instance, the following gives quite unexpected output:

\directlua{
  local normalize=require"lua-uni-normalize"
  local str="^^^^1100^^^^119e^^^^11ab"
  print("", str, normalize.NFC(str))
}
\end

This is LuaTeX, Version 1.12.0 (TeX Live 2020)
 restricted system commands enabled.
(./t.tex    ᄀᆞᆫ 늰
)

which should be

(./t.tex    ᄀᆞᆫ ᄀᆞᆫ
)

zauguin commented 4 years ago

Thank you, it's fixed now.

zauguin commented 4 years ago

Do you have some background on these? According to the Unicode classification, U+1176..U+11A7 (and U+D7B0..U+D7C6) also have the type "Vowel_Jamo", so what makes them different? Are they not used in the LV/LVT syllable structure, or are their syllables just not encoded?

dohyunkim commented 4 years ago

A precomposed LV/LVT syllable (U+AC00..U+D7A3) is canonically composed of a Leading Consonant (U+1100..U+1112), a Medial Vowel (U+1161..U+1175), and optionally a Trailing Consonant (U+11A8..U+11C2). These LV/LVT syllables are used for modern Hangul writing, whereas all other Hangul Jamo characters (those not listed above) are used for writing medieval Hangul. Unlike modern Hangul syllables, medieval Hangul syllables are not encoded in Unicode in precomposed form, so we need OpenType layout features such as ljmo, vjmo, and tjmo to typeset medieval Hangul text.
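
For illustration, the ranges above can be turned into a small composition routine. This is only a sketch in plain Lua 5.3 (the version embedded in LuaTeX), not the code in lua-uni-normalize: compose_hangul and its v_last parameter are hypothetical names, and the routine handles only bare L+V(+T) jamo sequences, not full NFC. The v_last override exists solely to show what an upper vowel bound of 0x11A7 would do to the sequence from the report.

```lua
-- Constants of the Hangul syllable composition arithmetic.
local S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
local L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
local L_LAST = L_BASE + L_COUNT - 1  -- U+1112, last modern leading consonant
local V_LAST = V_BASE + V_COUNT - 1  -- U+1175, last modern medial vowel
local T_LAST = T_BASE + T_COUNT - 1  -- U+11C2, last modern trailing consonant

-- Hypothetical helper (not part of lua-uni-normalize): composes modern
-- L+V and L+V+T jamo sequences and copies everything else unchanged.
-- v_last is an assumption added for this example; it defaults to U+1175.
local function compose_hangul(str, v_last)
  v_last = v_last or V_LAST
  local cps = {}
  for _, cp in utf8.codes(str) do cps[#cps + 1] = cp end

  local out, i = {}, 1
  while i <= #cps do
    local l, v, t = cps[i], cps[i + 1], cps[i + 2]
    if l >= L_BASE and l <= L_LAST and v and v >= V_BASE and v <= v_last then
      -- LV syllable: index computed from the leading consonant and the vowel.
      local s = S_BASE + ((l - L_BASE) * V_COUNT + (v - V_BASE)) * T_COUNT
      i = i + 2
      -- T_BASE itself (U+11A7) is excluded: it stands for "no trailing consonant".
      if t and t > T_BASE and t <= T_LAST then
        s = s + (t - T_BASE)
        i = i + 1
      end
      out[#out + 1] = utf8.char(s)
    else
      out[#out + 1] = utf8.char(l)
      i = i + 1
    end
  end
  return table.concat(out)
end

local modern   = "\u{1100}\u{1161}\u{11AB}"  -- modern jamo sequence
local medieval = "\u{1100}\u{119E}\u{11AB}"  -- sequence from the report above

print(compose_hangul(modern))            -- U+AC04
print(compose_hangul(medieval))          -- unchanged: U+1100 U+119E U+11AB
print(compose_hangul(medieval, 0x11A7))  -- U+B2B0, the unexpected output above
```

With the default bound the medieval sequence is left alone because U+119E lies outside U+1161..U+1175. With the bound loosened to 0x11A7, the vowel index becomes 0x119E - 0x1161 = 61, which equals 2 * 21 + 19, so the arithmetic lands on a syllable that actually belongs to a different leading consonant and vowel.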