Open r12a opened 4 years ago
The first comment in this issue contains text that will automatically appear in the Khmer gap-analysis document as a subsection with the same title as this issue. Any edits made to that comment will be immediately available in the document. Proposals for changes or discussion of the content can be made in comments below this point.
"ICU use word boundaries to break but it looks not nice, because it depend on the people who provide wordlist, for example the name of USA (United State of America) in Khmer it is សហរដ្ឋអាមេរិក ICU consider as one word, when it break to new line, it remain the long blank in old line. Normally, we can break it to 2 word សហរដ្ឋ = United State and អាមេរិក." (Hong)
"There is a change going through ICU at the moment, to how Khmer is line broken. The basis of line breaking is still dictionary based and word broken. There is no intent to support syllable breaking. The following changes are made in that change:
An issue with the use of dictionary lookup is that browsers don't have dictionary lookup support for minority languages that use the Khmer script. In fact, regardless of the declared language of the text, browsers tend to apply the Khmer dictionary to any text written in Khmer characters.
For such languages, it would be helpful if the content author could either:
Marking this as advanced for now for the Cambodian language, but open to arguments that the difficulties produced warrant a status of basic.
For minority languages, the status is clearly going to be broken, since there's no way to override the use of the Khmer dictionary.
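To make the wordlist dependence in Hong's comment concrete, here is a minimal sketch of dictionary-driven segmentation. It uses a toy greedy longest-match segmenter, which is an assumption made for illustration and not ICU's actual algorithm. The function name `segment` and both wordlists are hypothetical. The only point is that the available line-break opportunities follow directly from what the wordlist contains.

```python
def segment(text, wordlist):
    """Split text into words by greedy longest match against wordlist.

    Toy illustration only; real engines such as ICU use more elaborate logic.
    """
    words = []
    max_len = max(map(len, wordlist), default=0)
    i = 0
    while i < len(text):
        # Try the longest candidate starting at position i first.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in wordlist:
                words.append(text[i:j])
                i = j
                break
        else:
            # No dictionary entry matches: emit a single code point.
            words.append(text[i])
            i += 1
    return words


UNITED_STATES = "សហរដ្ឋ"
AMERICA = "អាមេរិក"
USA = UNITED_STATES + AMERICA  # the compound from Hong's example

# Hypothetical wordlist that only knows the compound: no break inside it.
coarse = {USA}
# Hypothetical wordlist that knows the two components: one break between them.
fine = {UNITED_STATES, AMERICA}

print(len(segment(USA, coarse)))  # number of words found with the coarse list
print(len(segment(USA, fine)))    # number of words found with the fine list
```

With the coarse list the whole compound is a single unbreakable unit, which is what leaves the long gap at the end of the line. With the fine list a break opportunity appears between the two components. Text in a minority language would find few or no matches in a Khmer wordlist, so its break opportunities would be essentially arbitrary.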