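Hangul_Syllable_Type (hst) values correspond to Grapheme_Cluster_Break (GCB) values wherever hst!=NA:
[:GCB=LV:] == [:hst=LV:]
and likewise for L, LVT, and T. The one exception, new in Unicode 16, is V: some Kirat Rai vowel signs have GCB=V but hst=NA, so [:GCB=V:] != [:hst=V:].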
In ICU code (uprops.cpp):
/*
 * Map some of the Grapheme Cluster Break values to Hangul Syllable Types.
 * Hangul_Syllable_Type is fully redundant with a subset of Grapheme_Cluster_Break.
 *
 * Starting with Unicode 16, this is not quite true:
 * Some Kirat Rai vowels are given GCB=V for proper grapheme clustering, but
 * they are of course not related to Hangul syllables.
 */
static const UHangulSyllableType gcbToHst[]={
    U_HST_NOT_APPLICABLE,   /* U_GCB_OTHER */
    U_HST_NOT_APPLICABLE,   /* U_GCB_CONTROL */
    U_HST_NOT_APPLICABLE,   /* U_GCB_CR */
    U_HST_NOT_APPLICABLE,   /* U_GCB_EXTEND */
    U_HST_LEADING_JAMO,     /* U_GCB_L */
    U_HST_NOT_APPLICABLE,   /* U_GCB_LF */
    U_HST_LV_SYLLABLE,      /* U_GCB_LV */
    U_HST_LVT_SYLLABLE,     /* U_GCB_LVT */
    U_HST_TRAILING_JAMO,    /* U_GCB_T */
    U_HST_VOWEL_JAMO        /* U_GCB_V */
    /*
     * Omit GCB values beyond what we need for hst.
     * The code below checks for the array length.
     */
};
static int32_t getHangulSyllableType(const IntProperty &/*prop*/, UChar32 c, UProperty /*which*/) {
    // Ignore supplementary code points: They all have HST=NA.
    // This is a simple way to handle the GCB!=hst cases since Unicode 16 (Kirat Rai vowels).
    if(c>0xffff) {
        return U_HST_NOT_APPLICABLE;
    }
    /* see comments on gcbToHst[] above */
    int32_t gcb=(int32_t)(u_getUnicodeProperties(c, 2)&UPROPS_GCB_MASK)>>UPROPS_GCB_SHIFT;
    if(gcb<UPRV_LENGTHOF(gcbToHst)) {
        return gcbToHst[gcb];
    } else {
        return U_HST_NOT_APPLICABLE;
    }
}
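The lookup above can be tried without ICU's property data. What follows is a minimal sketch, not ICU code: the enums `Gcb` and `Hst`, the table `kGcbToHst`, and the functions `hstFromGcb` and `checkExamples` are hypothetical stand-ins, and the GCB value is passed in by hand instead of being read via `u_getUnicodeProperties`. Only the order of the first ten GCB values is taken from the table comments above. U+16D63 is assumed here to be one of the Kirat Rai vowel signs with GCB=V; the function itself relies only on that code point being above U+FFFF.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for ICU's UGraphemeClusterBreak and UHangulSyllableType.
// The first ten Gcb values follow the order of the gcbToHst[] comments.
enum Gcb : std::int32_t {
    GCB_OTHER, GCB_CONTROL, GCB_CR, GCB_EXTEND, GCB_L,
    GCB_LF, GCB_LV, GCB_LVT, GCB_T, GCB_V,
    GCB_BEYOND_TABLE  // made-up name for any GCB value past the end of the table
};
enum Hst : std::int32_t { HST_NA, HST_L, HST_V, HST_T, HST_LV, HST_LVT };

// Same layout as gcbToHst[]: indexed by GCB value, truncated after GCB_V.
const Hst kGcbToHst[] = {
    HST_NA, HST_NA, HST_NA, HST_NA, HST_L,
    HST_NA, HST_LV, HST_LVT, HST_T, HST_V
};

// Same shape as getHangulSyllableType(), with the GCB value supplied by the caller.
Hst hstFromGcb(std::int32_t c, Gcb gcb) {
    // Supplementary code points never have a Hangul syllable type.
    if (c > 0xffff) {
        return HST_NA;
    }
    // Bounds check: GCB values beyond the table map to "not applicable".
    const std::size_t index = static_cast<std::size_t>(gcb);
    const std::size_t length = sizeof(kGcbToHst) / sizeof(kGcbToHst[0]);
    return index < length ? kGcbToHst[index] : HST_NA;
}

// Hand-supplied (code point, GCB) pairs exercising each branch.
void checkExamples() {
    assert(hstFromGcb(0xAC00, GCB_LV) == HST_LV);            // Hangul syllable, table hit
    assert(hstFromGcb(0x1161, GCB_V) == HST_V);              // Hangul vowel jamo in the BMP
    assert(hstFromGcb(0x16D63, GCB_V) == HST_NA);            // above U+FFFF: guard wins
    assert(hstFromGcb(0x0041, GCB_BEYOND_TABLE) == HST_NA);  // past the table end
}
```

The third assertion is the Unicode 16 case: the table alone would map GCB=V to a vowel jamo, and it is the `c > 0xffff` guard that keeps the result at "not applicable". This works only as long as every code point with GCB=V but hst=NA lies outside the BMP, which is the assumption stated in the ICU comment.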