Closed by r12a 1 year ago.
The CSS Working Group just discussed "don't provide a lang param for word boundary".
(Question: would it be reasonable to require user agents that do not support a language to insert word-boundary characters everywhere a break is normally allowed, as a fallback?)
"We don't think the content author is able to guess what languages are supported by the user agent"
Agreed. But to me, that supports the current design. Here's an example:
If you want "word"-based line breaking for Japanese headings, instead of the typical break-between-every-character behavior, and you already have <wbr> elements (or U+200B characters) in your markup, you can do this:
h1:lang(ja) {
  word-break: keep-all;
}
If you don't have <wbr> elements (or U+200B characters) in your markup and want the word boundaries detected automatically, you wouldn't want to merely do this:
h1:lang(ja) {
  word-boundary-detection: auto(ja);
  word-break: keep-all;
}
Because if the UA doesn't know how to do boundary detection in Japanese, the text will overflow due to a lack of wrapping opportunities. So instead, you'd do something like this:
@supports (word-boundary-detection: auto(ja)) {
  h1:lang(ja) {
    word-boundary-detection: auto(ja);
    word-break: keep-all;
  }
}
But if we change the spec so that the word-boundary-detection property no longer takes a language parameter, you can no longer do that.
"(Question: would it be reasonable to require user agents that do not support a language to insert word-boundary characters everywhere a break is normally allowed, as a fallback?)"
If word-boundary-detection were exclusively for line breaking, I suppose that could work, but if you're using it with word-boundary-expansion, then that's the wrong fallback. In that case, if word-boundary-detection doesn't work for the target language, you'd want no expansion at all, rather than expansion between (almost) every letter.
I fully, 100%, absolutely agree with the OP here. In fact, before seeing this issue, I had just sent an email to a colleague suggesting this and describing why the current behavior doesn’t make any sense.
From my email:
The worst thing that can happen if the browser doesn’t have a dictionary for a particular language is that it falls back to the default behavior of boundary analysis (as if word-boundary-detection weren’t specified at all). …
I don’t know of any other text layout system that has this language-based range behavior. In every other publishing system I’ve seen, this dictionary-based approach is either (a) automatically enabled and always on, or (b) an opt-in with a single boolean switch.
I honestly don’t understand the backwards-compatibility story described above in this thread. If a publishing house cares very much about exactly where their line breaks are, they won’t use this property, because different browsers and OSes will implement it differently. Therefore, it’s totally OK if this property has progressive enhancement; the author doesn’t know where the line breaks will be anyway. They are just telling the browser “do your best to improve the quality, possibly at the expense of performance.”
+1 to @litherum; doing this in the CSS value syntax doesn't look right to me either.
Blink is planning to implement this feature (Japanese natural line breaking) in Q3. It would be great if the WG could reconsider the syntax before then; otherwise, we'll go with the current syntax.
We are implementing this now also, and would ideally like a resolution soonish.
Quoting the comment above: “Therefore, it’s totally OK if this property has progressive enhancement; the author doesn’t know where the line breaks will be anyway.”
I think this is a sign that this should not be on a dedicated property, and that it does belong as a special value of word-break.
Indeed, with a special value of word-break, as you said, if the browser doesn't know how to do the detection for the given language, it falls back to normal line breaking, and that's fine.
If it is a value on a separate property with the behavior proposed for word-boundary-detection, there is a problem to be solved. I now think the language parameter is the wrong solution, but there was a real problem: word-boundary-detection doesn't make words stay together on its own. It merely introduces <wbr> equivalents where the boundaries belong, and counts on the author separately turning on word-break: keep-all to get the correct line breaking. But if keep-all gets turned on for a piece of text where the browser doesn't know how to detect the boundaries, then instead of falling back to normal line breaking, you fall back to just keep-all, which mostly means no breaking at all.
There was a reason for having it as a separate property (the injected boundaries can be used for other purposes), but still, the awkwardness of this language parameter shows that this is likely the wrong design. Refactoring this into a new word-break value (and rearranging word-boundary-expansion to work differently) will work better.
Oh, I forgot to say earlier: we are hooking this up to CFStringTokenizer, which doesn’t have an API (or SPI) that exposes which languages have supported dictionaries. Nor should it (for the reasons described earlier in this thread). So this auto() function isn’t really implementable for us as-is.
"this should not be on a dedicated property"
ICU seems to have made this one of the options for word-break:
https://unicode.org/reports/tr35/#:~:text=%22lw%22-,Line%20break%20word%20handling,-%22normal%22
Maybe we should too? I don’t really have an opinion, other than that we should figure this out soon, because there are (at least) two active implementations.
+1 to making this a new value of word-break.
https://github.com/w3c/csswg-drafts/pull/8974 is a potential draft spec change that we could make here.
The CSS Working Group just discussed "language parameters for word-boundary-detection", and agreed to the following:
RESOLVED: remove auto() from word-boundary-detection, add a keyword to word-break for this functionality
Agenda+ to resolve on the exact name for the word-break value. Three options to emoji-vote on:
The reason to consider variants of auto in the name is that phrase-based line breaking may not be available in the given language or on the given platform.
auto (vote 😄)
phrase (vote 🎉)
auto-phrase (vote ❤️)
The CSS Working Group just discussed "[css-text-4] Don't provide a language parameter for word-boundary-detection", and agreed to the following:
RESOLVED: name it auto-phrase
RESOLVED: add a principle saying you should break within the phrase instead of overflowing
RESOLVED: If your phrase is too long, you should break at a normal word boundary rather than overflow
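With these resolutions, the Japanese-heading example from earlier in the thread no longer needs an @supports guard or a separate keep-all declaration. A minimal sketch, assuming a UA that implements the resolved word-break: auto-phrase keyword and content tagged with lang="ja" (the h1 selector is just the running example from this thread):

```css
/* Ask for phrase-based line breaking in Japanese headings.
   The language comes from the element's lang attribute,
   not from a parameter in the property value. */
h1:lang(ja) {
  word-break: auto-phrase;
}
```

If a UA doesn't recognize auto-phrase at all, CSS drops the declaration and word-break keeps whatever value it would otherwise have (normal by default). If the UA recognizes the keyword but has no phrase analysis for the content language, the design discussed above has it fall back to normal line breaking instead of keep-all-style overflow, which is what makes the guard unnecessary.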
2.2.1. Detecting Word Boundaries: the word-boundary-detection property
Fantasai provided some additional explanation about this feature, which explains why a range of language tags would be used:
The i18n WG thinks that the choice of whether or not to apply the word boundary detection algorithm should be set by applying the word-boundary-detection styling to the relevant content. The language information used should be that provided by the lang attribute, and not supplied as a parameter with this property value. We don't think the content author is able to guess what languages are supported by the user agent, so it doesn't seem useful to make them specify the language in the property value. We think that the approach currently described in the spec also requires the content author to have a level of understanding about language tagging that is too high (see the examples about Cantonese).
We think that there should also be a simple recommendation that user agents SHOULD NOT apply a boundary detection algorithm to text in a language for which the algorithm is not defined (modulo decisions with respect to dialect support).