w3c / mlreq

Mongolian Layout Requirements
https://www.w3.org/International/mlreq/

Browsers don't hyphenate Mongolian text #40

Open r12a opened 3 years ago

r12a commented 3 years ago

Hyphenation occurs in writing Mongolian and Todo. U+1806 MONGOLIAN TODO SOFT HYPHEN is used to indicate resumption of a broken word. It functions like U+2010 HYPHEN, except that it appears at the beginning of a line rather than at the end. (Note that lines of Mongolian text are vertical, and progress from left to right.)
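For readers who want to check the characters mentioned above, here is a minimal sketch using Python's standard `unicodedata` module. It only looks up names and general categories; the module does not expose the line-break property, so nothing here demonstrates where a browser will actually place the hyphen. U+00AD SOFT HYPHEN is included purely for comparison.

```python
import unicodedata

# Code points discussed in this issue, plus U+00AD for comparison.
CODE_POINTS = [0x1806, 0x2010, 0x00AD]

def describe(cp):
    """Return (U+XXXX label, character name, general category) for a code point."""
    ch = chr(cp)
    return ("U+%04X" % cp, unicodedata.name(ch), unicodedata.category(ch))

for cp in CODE_POINTS:
    print(describe(cp))
```

Both U+1806 and U+2010 are dash punctuation, while U+00AD is an invisible format character, which is one reason they behave differently at a line break.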

Specs: issue "Better describe the likely outcomes of hyphenation" (open).

css-text: Describes how to apply hyphenation. It makes no special mention of Mongolian, nor of which character to use and where.

css-text: Has a hyphenate-character property, which allows users to specify the character to use for hyphenation, but it doesn't allow control of the location of the character.

Tests & results:

WebKit is unable to display traditional Mongolian script.

Interactive test, Mongolian text is hyphenated when hyphens:auto is set

Interactive test, Mongolian adds a hyphen to the start of the second line when a word is manually hyphenated with SHY

i18n test suite, CSS3 Text, hyphens
General tests for hyphens support. (Results may need updating.)

Browser bug reports: Gecko, Blink, WebKit

Priority: Marked as advanced, since hyphenation is optional.

r12a commented 3 years ago

The first comment in this issue contains text that will automatically appear in one or more gap-analysis documents as a subsection with the same title as this issue. Any edits made to that comment will be immediately available in the document. Proposals for changes or discussion of the content can be made in comments below this point.

Relevant gap analysis documents include: _Mongolian_

r12a commented 1 year ago

I think this gap report needs to be completely rewritten. My expectation is that Mongolian words should not be split across lines, and should not need the SOFT HYPHEN. However, when compound nouns are separated by TODO SOFT HYPHEN, the hyphen should move to the next line (and it has the right line-break property for that). Currently seeking clarification at https://github.com/w3c/mlreq/issues/30
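To illustrate the "hyphen moves to the next line" behaviour described in this comment, here is a rough sketch. It is not a line breaker and not how any browser works: the two sets below are a hand-written assumption that mirrors only the relevant line-break classes (break-before for U+1806, break-after for U+2010), and `split_at_hyphens` is a hypothetical helper. Latin placeholder letters stand in for Mongolian text.

```python
# Assumed, hard-coded subset of line-break behaviour (illustration only).
BREAK_BEFORE = {"\u1806"}  # MONGOLIAN TODO SOFT HYPHEN: break allowed before it
BREAK_AFTER = {"\u2010"}   # HYPHEN: break allowed after it

def split_at_hyphens(text):
    """Split text at each hyphen break opportunity and return the pieces."""
    pieces = []
    start = 0
    for i, ch in enumerate(text):
        if ch in BREAK_BEFORE and i > start:
            # The hyphen stays with what follows, so it opens the next line.
            pieces.append(text[start:i])
            start = i
        elif ch in BREAK_AFTER and i + 1 < len(text):
            # The hyphen stays with what precedes, so it closes the current line.
            pieces.append(text[start:i + 1])
            start = i + 1
    pieces.append(text[start:])
    return pieces

print(split_at_hyphens("ab\u1806cd"))  # hyphen begins the second piece
print(split_at_hyphens("ab\u2010cd"))  # hyphen ends the first piece
```

The only difference between the two cases is which side of the break the hyphen ends up on, which is the distinction this comment relies on.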

asmusf commented 1 year ago

My question: I keep hearing that there still are some open issues about the encoding model for (some aspects of) the (traditional) Mongolian script. If that assessment is correct, is it worth spending cycles on this issue? Or have those overarching issues been put to rest in the meantime?

r12a commented 1 year ago

I'm assuming that there is no connection between the handling of hyphens in Hudum and the encoding model changes (which focus on the letters).

asmusf commented 1 year ago

OK, but pending changes would make it difficult to have any implementation that can treat the text "correctly" in its entirety. At least not until they are settled. Just sayin'