Open OskarGrosser opened 3 years ago
I'm still working through this in my mind, but I'm beginning to think that it would be better to use the existing wbr markup, rather than create more, but make it stylable using CSS. (So this would become a CSS issue, rather than an HTML one.)
wbr is not specified in great detail, and it only appears to be associated with ZWSP because it currently doesn't produce a hyphen when the line breaks. I haven't discovered anything yet that prevents its use within words, as well as between them. And its sole function appears to be to indicate a line break opportunity.
One problem with using &shy; is that it's limited in use, unless the browser adds some smarts. It doesn't currently cope well with the following use cases:
My expectation is that when a browser gets around to providing support for hyphenation for a given language, and when the content author sets hyphens:auto in CSS, the browser would automatically apply rules for hyphenation related to the language of the text, such that all of the above variations are addressed.
It seems to me, then, that the browser could do the same when a wbr tag appears inside a word. The key starting point, in all cases, is to know where the break point opportunity lies (which is what the wbr tag does), and what the language is.
So as not to break legacy content, it would probably be necessary to continue to produce no hyphenation behaviour by default for wbr, other than a line-break. However, CSS could be used to activate the browser smarts so that the full hyphenation behaviour is produced automatically by the browser.
We could also go further and give authors CSS properties that would allow them to style the result of using wbr, at least to some extent, in the absence of browser smarts. For example, we could allow authors to indicate what character should be used for the mark (or that no mark should be used) by styling the wbr tag. This may help where hyphenation is not yet implemented for a browser+OS+language combination. For example, it would allow someone authoring Plains Cree to specify that the mark to use is ᐀ [U+1400 CANADIAN SYLLABICS HYPHEN], and thereby allow some degree of manual hyphenation to occur for Cree, well before the browser gets around to implementing the necessary dictionaries or rules required by hyphens:auto.
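To make the idea of an author-chosen mark concrete, here is a minimal sketch of manual hyphenation at explicit break opportunities. Everything in it is hypothetical: the function name `breakWord`, the representation of a word as an array of segments (one split per wbr), and the use of string length as a stand-in for rendered width. It is not how any browser implements line breaking.

```typescript
// Hypothetical model: `segments` are the pieces of one word, split at each
// author-supplied break opportunity. `mark` is the character shown when a
// line ends at such an opportunity; "" means "break without a visible mark".
function breakWord(segments: string[], maxWidth: number, mark: string): string[] {
  const lines: string[] = [];
  let start = 0;
  while (start < segments.length) {
    // Always take at least one segment, even if it overflows the line.
    let end = start + 1;
    // Prefer the longest run of segments that still fits.
    for (let candidate = segments.length; candidate > start; candidate--) {
      const text = segments.slice(start, candidate).join("");
      const isLastLine = candidate === segments.length;
      // Width is approximated by UTF-16 length (an assumption; a real
      // layout engine measures glyphs). A line that ends at a break
      // opportunity also needs room for the mark.
      const needed = text.length + (isLastLine ? 0 : mark.length);
      if (needed <= maxWidth) {
        end = candidate;
        break;
      }
    }
    const text = segments.slice(start, end).join("");
    lines.push(end === segments.length ? text : text + mark);
    start = end;
  }
  return lines;
}

const withHyphen = breakWord(["hy", "phen", "a", "tion"], 7, "-");
const withCreeMark = breakWord(["hy", "phen", "a", "tion"], 7, "\u1400");
const withoutMark = breakWord(["hy", "phen", "a", "tion"], 7, "");
```

The only thing that changes between the three calls is the mark, which is the part the paragraph above suggests exposing to authors through CSS.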
It would also allow authors to do the equivalent of &shy; for a language like Telugu, which browsers don't currently hyphenate, but which has complicated morphology and long words, and needs hyphenation (using the typical '-'). In this case, the advantage of using wbr instead of &shy; is that, as you originally wanted, the break points wouldn't be copied with the text.
cc @fantasai @frivoal
I don't think <wbr> could serve that purpose as is without breaking the purpose it currently serves, as there's no way to tell the difference between <wbr> in the middle of a word and <wbr> separating two words. However, I suspect we could make it gain this new ability via an attribute.
If the attribute is absent, <wbr> behaves like a zero width space, as it does today. If the attribute is present, <wbr> would behave as a soft hyphen, and the value of the attribute would tell the browser what character to inject when line breaking, so that it can do the right thing in languages that it doesn't know how to handle. Let's call the attribute hyphen:
- <wbr> would be the same as today
- <wbr hyphen> would be the same as &shy;, except that it would not be copy&pastable. The inserted character would typically be HYPHEN (U+2010), unless the content language is known and the browser knows better for that particular language (e.g. ᭠ [U+1B60 BALINESE PAMENENG] for Balinese)
- <wbr hyphen="᭠"> would be a hyphenation opportunity like &shy;, also non copy&pastable, but it would explicitly provide the appropriate character, without any guesswork left to the UA.

I wouldn't want to have to add <wbr hyphen="᭠"> to every word that needs a soft hyphen. Apart from the bulk, which would affect readability of the source code, if you wanted to use a different hyphen character you'd have to edit the HTML source for all the documents where this was used. Specifying the expected appearance using a line of CSS would provide much more flexibility, e.g. for translations, where a different hyphen would be needed, and the appropriate change can be effected by altering a single line of CSS code.
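The three <wbr> cases proposed above can be written down as a small decision function. This is only a sketch of the proposal as described in this thread: the `hyphen` attribute does not exist in HTML today, and `resolveWbr`, `WbrBehaviour` and the `KNOWN_MARKS` table are hypothetical names. The attribute value is modelled the way `getAttribute` reports it: `null` when absent, `""` when present without a value.

```typescript
// Hypothetical table of per-language marks a UA might know about.
// Only the Balinese entry ("ban") is taken from the discussion above.
const KNOWN_MARKS: { [lang: string]: string } = {
  ban: "\u1B60", // BALINESE PAMENENG
};

type WbrBehaviour =
  | { kind: "zero-width-space" }
  | { kind: "soft-hyphen"; mark: string };

function resolveWbr(hyphenAttr: string | null, lang: string): WbrBehaviour {
  // No attribute: plain break opportunity, as <wbr> behaves today.
  if (hyphenAttr === null) {
    return { kind: "zero-width-space" };
  }
  // Attribute with a value: the author names the mark explicitly.
  if (hyphenAttr !== "") {
    return { kind: "soft-hyphen", mark: hyphenAttr };
  }
  // Bare attribute: use a language-specific mark if one is known,
  // otherwise fall back to HYPHEN (U+2010).
  const known = KNOWN_MARKS[lang];
  return { kind: "soft-hyphen", mark: known !== undefined ? known : "\u2010" };
}

const plain = resolveWbr(null, "en");
const bare = resolveWbr("", "en");
const bareBalinese = resolveWbr("", "ban");
const explicit = resolveWbr("\u1400", "cr");
```

A CSS-based variant, as argued for in the reply above, would move the explicit mark out of the markup and into a style rule, leaving the same three outcomes.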
Currently there is &shy; to hint at a possible line-break opportunity. But as &shy; is its own character, copying the underlying text will copy that character as well.
When &shy; is inside an element, it will not display a hyphen, even if it is at a line-break. "Inside an element" includes:
display: contents)
This rules out the possibility of styling it to behave only as a line-break opportunity, without copying it.
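As a small illustration of the copy problem described above: because the soft hyphen is a real character (U+00AD), it is part of the string and has to be stripped explicitly by whoever receives the copied text. The helper name `stripSoftHyphens` is made up for this sketch.

```typescript
const SOFT_HYPHEN = "\u00AD";

// "hyphenation" with two soft hyphens, as it would sit in the text content.
const marked = "hy" + SOFT_HYPHEN + "phen" + SOFT_HYPHEN + "ation";

// The invisible characters count towards the length and break equality.
const markedLength = marked.length;          // 13, not 11
const equalsPlain = marked === "hyphenation"; // false

// What a consumer of the copied text has to do to get the plain word back.
function stripSoftHyphens(text: string): string {
  return text.replace(/\u00AD/g, "");
}

const cleaned = stripSoftHyphens(marked);
```

An element-based hint, like <wbr>, avoids this step because the hint never becomes part of the text in the first place.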
Changing the behavior of &shy; is out of the question, as it has long been in use and serves specific purposes. Also, its definition is debated, but HTML and Unicode seem to have settled on one. Hence there is still a need for hinting at line-break opportunities without the hint actually encoding a character. Without the ability to style &shy; to mimic said behavior, another solution is required.
There exists a similar case, where U+200B ZERO-WIDTH SPACE can be represented by <wbr> or by the U+200B character, with the first being unselectable, and the latter being the character itself.
I suggest, as in the zero-width space case, having a complementary HTML element for &shy;. It should:
hyphens: manual
Basically be &shy; but unselectable, as in not part of the rendered text, like <wbr>.