w3c / mlreq

Mongolian Layout Requirements
https://www.w3.org/International/mlreq/

Underline and NNBSP #9

Open r12a opened 7 years ago

r12a commented 7 years ago

The proposed CSS property text-decoration-skip with the value spaces causes underlines to skip white space. As currently defined, white space includes U+202F NARROW NO-BREAK SPACE (NNBSP). This means that using this property would result in the following:

[Screenshot: the underline is broken at the NNBSP]

Is that correct, or should the underline be unaffected by the NNBSP? That would result in:

[Screenshot: the underline continues across the NNBSP]

(Source text: ᠣᠯᠠᠨ ᠦᠨᠳᠦᠰᠦᠲᠡᠨ ᠦ ᠪᠣᠯᠭᠠᠬᠤ ᠦᠢᠯᠡ ᠠᠵᠢᠯᠯᠠᠭ᠎ᠠ)
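To make the two outcomes concrete, here is a minimal Python sketch that computes which runs of text would receive an underline under each policy. It assumes, as an approximation of "white space", every character of Unicode general category Zs plus TAB; the function names `is_skippable_space` and `underline_runs` and the `skip_nnbsp` flag are hypothetical and not part of any CSS API.

```python
import unicodedata

NNBSP = "\u202f"  # U+202F NARROW NO-BREAK SPACE


def is_skippable_space(ch: str, skip_nnbsp: bool) -> bool:
    """Approximate 'white space' as general category Zs plus TAB (an assumption)."""
    if ch == NNBSP and not skip_nnbsp:
        return False  # keep the suffix attached: NNBSP counts as part of the word
    return ch == "\t" or unicodedata.category(ch) == "Zs"


def underline_runs(text: str, skip_nnbsp: bool) -> list[str]:
    """Return the contiguous substrings that would be underlined."""
    runs: list[str] = []
    current: list[str] = []
    for ch in text:
        if is_skippable_space(ch, skip_nnbsp):
            # A skipped space ends the current underlined run.
            if current:
                runs.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        runs.append("".join(current))
    return runs


# Stem + NNBSP + separated suffix, then an ordinary space and the next word.
sample = "ᠦᠨᠳᠦᠰᠦᠲᠡᠨ\u202fᠦ ᠪᠣᠯᠭᠠᠬᠤ"
print(len(underline_runs(sample, skip_nnbsp=True)))   # 3 runs: the suffix is split off
print(len(underline_runs(sample, skip_nnbsp=False)))  # 2 runs: stem and suffix stay together
```

With `skip_nnbsp=True` the sketch reproduces the first screenshot (a gap before the suffix); with `skip_nnbsp=False` it reproduces the second.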

siqinbilige commented 7 years ago

There are three patterns that exist in real-world usage.

[Image: the three underline patterns, numbered ① to ③]

r12a commented 7 years ago

Thanks @siqinbilige. As CSS is currently specified, you'll only be able to get ① and ②. Is that ok, or do you think we need to push for all three options to be available?

If we had to choose just two options, would options ① and ③ be best, or options ① and ②?

siqinbilige commented 7 years ago

① and ② are OK. Maybe ③ is a special case of ①. By the way, there are many line styles in Mongolian.

[Image: Mongolian underline styles]

r12a commented 7 years ago

> By the way, there are many line styles in Mongolian.

I continued the discussion about text decoration styles in a separate issue.

r12a commented 7 years ago

[from Badral Sanlig]

It should be without a gap. The NNBSP also causes this problem for the spellchecker: the wavy red line is drawn with a gap.

badaa commented 5 years ago

We strongly recommend that NNBSP (better called the Mongolian Suffix Connector) be handled as part of the word. If we had to choose just two options, options ③ and ① would be best.

r12a commented 5 years ago

Thanks @badaa. Then I think the next steps are to document the issue in the Mongolian gap-analysis document, and then contact the CSS WG to request that the spec be changed.

r12a commented 5 years ago

Created a CSS issue (https://github.com/w3c/csswg-drafts/issues/3393) and a pull request for the Mongolian layout docs at https://github.com/w3c/mlreq/pull/23

lianghai commented 5 years ago

It’s unwise to introduce a special case into the text-decoration-skip property solely because of a certain (albeit possibly mainstream) grammatical understanding of the concept of a Mongolian “word”.

r12a commented 5 years ago

@lianghai you'll need to explain why.

lianghai commented 5 years ago

@r12a

Hudum enclitics (separated suffixes) are a set of morphological words (words separated by spaces) that are prosodically part of a larger prosodic word, which contains an ordinary host word with zero or more suffixes and zero or more separated suffixes. It is a well-received theory that enclitics are grammatically comparable to suffixes, and thus that the concept of “word” should equal the whole prosodic word.

However, since when has CSS provided special treatment for grammatical and/or prosodic structures?

The only reason seems to be that NNBSP was encoded as a miserably mixed piece of syntactic sugar: a space, a grammatical signal, and a format control requesting special shaping of the following word, all at once. This is a historical mistake that Unicode has to live with, not something CSS should try so hard to accommodate just to prove that it is internationalized.

badaa commented 5 years ago

@lianghai Please don't use your own term "enclitics" everywhere. The term "enclitic"/"clitic" (used mostly for Indic and Latin languages) doesn't exist for Mongolian; "separated suffixes" is a far better term. Suffixes don't have an independent meaning of their own and can't exist without stem words; they have only morphological meaning. I agree that this character has been problematically defined in Unicode. Thus, it has to be taken into account everywhere if we use NNBSP in the Mongolian writing system.

r12a commented 5 years ago

Hi @lianghai. The problem we're trying to solve here is that any gap between a Mongolian word root and suffix (or subsequent suffixes) should be underlined when text-decoration-skip is set to spaces. CSS already has a number of language-specific rules.

It turns out to be handy that NNBSP is typically used here, so we could add a Mongolian-specific rule to CSS. In fact, the same may apply to French (see the CSS issue).

What alternative approach can you suggest?

lianghai commented 5 years ago

@r12a

> What alternative approach can you suggest?

The sane approach is to treat NNBSP just like any other space. It’s a space that has been abused to mark grammatical information and to trigger mandatory shaping. And NNBSP is not going to be used in a grammatically consistent way, because users simply don’t care about grammatical correctness when inputting text. Users will only be confused by the inevitably messed-up text when they happen to use text-decoration-skip: spaces, and this special treatment of NNBSP will be just another seemingly nice piece of technological smartness that is harmful to users, like this whole Mongolian phonetic encoding model.

@badaa

In my terminology, the key difference between the terms “enclitic” and “separated suffix” is whether an author recognizes the entity as first a morphological word (which is then prosodically dependent) or first a suffix (which is then written separately). As Unicode is a text encoding technology, I consider morphological features to be the foundation of analysis. It would be very subjective to treat a structure that is never written like a suffix as a suffix and then build one’s analysis on that.

You need to understand and accept that terminology and linguistic analyses vary from person to person. The key is not whether you like a term or not (and your reasoning for not liking the term “enclitic” is naïve, mostly because you think the Mongolian language is definitively something; no, there are no conclusions in linguistics, people just use various tools to make various analyses), but whether an analysis is logically coherent. Coherent analyses can usually converge to some extent, while incoherent analyses can’t, even though they might superficially use the same set of terms.

r12a commented 5 years ago

@lianghai your reply just above doesn't actually answer my question, which was: how to ensure that CSS doesn't underline the gap between word root and suffix when text-decoration-skip is set to spaces?

Richard57 commented 5 years ago

NNBSP is different from other spaces, which is why it exists. The CSS issue (https://github.com/w3c/csswg-drafts/issues/3393), as mentioned above, already notes that breaking underlining just for NNBSP seems wrong for French. If word-separating spaces are not to be underlined, when would breaking underlining at a NNBSP used properly be appropriate?

U+2007 FIGURE SPACE also seems different from other spaces, and it is yet another type of non-breaking space. How should it interact with underlining? This question seems less important, as the character is primarily used for numbers in columns rather than numbers in running text.
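A quick check with Python's standard `unicodedata` module illustrates the point that these characters differ from an ordinary space in ways a rule keyed only on "white space" does not see: all of them share general category Zs, and what sets the no-break spaces apart is, among other things, their `<noBreak>` compatibility decomposition. This is only a sketch for inspecting character properties; `CANDIDATES` and `describe` are illustrative names, and U+00A0 is included for comparison although the thread does not mention it by name.

```python
import unicodedata

# U+0020 SPACE for comparison, then the no-break spaces discussed in the thread.
CANDIDATES = ["\u0020", "\u00a0", "\u2007", "\u202f"]


def describe(ch: str) -> tuple[str, str, str, str]:
    """Return code point, name, general category and decomposition mapping."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.decomposition(ch) or "(none)",
    )


for ch in CANDIDATES:
    print(*describe(ch))
```

On a current Python this should report category Zs for all four characters, with the three no-break spaces sharing the decomposition `<noBreak> 0020`. So a CSS rule that wanted to treat NNBSP (or FIGURE SPACE) differently would need an explicit exception rather than relying on the general category.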

mongoltolbo commented 5 years ago

According to css-tricks.com: "... This improves legibility of decorated text and corrects punctuation grammar for some languages."

A: Obviously, it is a great idea, in my opinion. "... in some languages" is directed at Mongol HUDAM, but it doesn't apply only to Mongolian.
B: "Now" I can use this clever property by default in my web project, aan.
A: By the way, this idea should also be implemented in word-processing applications, right?
B: Whaaaatever, the web and cloud application era is coming, noooo problem.
A: YES. No breaking suffixes, no broken layouts.

mongoltolbo commented 5 years ago

> It’s unwise to introduce a special case into the text-decoration-skip property solely because of a certain (albeit possibly mainstream) grammatical understanding of the concept of a Mongolian “word”.

@lianghai I appreciate that you always think differently, but in science you have to prove your idea. If you can't prove it, just keep calm and go find your answer. Plus, don't play with ambiguous words; use plain English. [REDACTED]

lianghai commented 5 years ago

@r12a

> @lianghai your reply just above doesn't actually answer my question, which was: how to ensure that CSS doesn't underline the gap between word root and suffix when text-decoration-skip is set to spaces?

My point has always been that CSS simply shouldn’t avoid underlining the gap between word root and suffix when text-decoration-skip is set to spaces. The whole rationale for doing so is based on NNBSP’s abused behavior, which shouldn’t be recognized anymore.

[REDACTED]

lianghai commented 5 years ago

@r12a

Actually, I just realized that, instead of asking me to examine “why not”, we need to go back and examine the original rationale:

> It turns out to be handy that NNBSP is typically used here, so we could add a Mongolian-specific rule to CSS.

It seems that the original rationale for treating NNBSP as a special case comes largely from how the existing Mongolian encoding model uses this character, which makes it seem nice to provide a script-specific feature in CSS to better serve the user community’s needs.

However, it’s already well understood that the existing Mongolian encoding model is a highly messed-up one, and it is undergoing significant alterations before it is eventually stabilized (e.g., see item SI1 in the MWG #3 report for how the community is trying to move away from using NNBSP: https://www.unicode.org/L2/L2019/19139-mwg3-18-meeting-rept-r2.pdf). It’s simply not a good idea for a “higher-level protocol” to invent special behavior based on a Unicode encoding model that is known to be highly problematic, is clearly unstable, and is not yet widely implemented.

At this point I don’t think we even need to argue about whether treating NNBSP as a special case in CSS would be problematic.

Also, note the comment from Badral:

> It should be without a gap. The NNBSP also causes this problem for the spellchecker: the wavy red line is drawn with a gap.

That comment largely came from the fact that he was trying to do spellchecking properly in Microsoft Word but initially didn’t use the proper API (the API he was using at first only allows single-word spellchecking; only later did he realize that Word provides a multi-word contextual spellchecking API), and so he had to rely on NNBSP not being treated as a word boundary in Word.