Polish is one such language, I believe? I understand that the OpenOffice hyphenation rules for Polish put a hyphen both at the end of the first line and at the start of the next.
As you're looking at this, we noticed that hyphenate-character allows you to override the defaults, but it doesn't let you specify whether your overridden character goes at the end of the first line, at the start of the second, or both. Two easy ways to specify this would be to give hyphenate-character an optional second argument, e.g. hyphenate-character: "" "-", or to accept a single string with a newline separating the two values, e.g. hyphenate-character: "-\A-". How necessary this is, I don't know.
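To make the two proposed forms concrete, here is a minimal Python sketch (not any browser's implementation) of breaking a word with one marker for the end of the first line and another for the start of the second. The names `break_word` and `parse_single_string` are hypothetical; the `(end, start)` pair simply mirrors the two values in hyphenate-character: "" "-", and the single-string form assumes the CSS escape \A has already been decoded to a newline.

```python
def break_word(word: str, index: int, end_marker: str = "-",
               start_marker: str = "") -> tuple[str, str]:
    """Split `word` at `index`, appending `end_marker` to the first line and
    prepending `start_marker` to the second (hypothetical two-valued model)."""
    if not 0 < index < len(word):
        raise ValueError("break index must fall strictly inside the word")
    return word[:index] + end_marker, start_marker + word[index:]


def parse_single_string(value: str) -> tuple[str, str]:
    """Model the single-string form: text before the newline goes at the end
    of the first line, text after it at the start of the second. With no
    newline, the whole value is the end-of-line marker, as today."""
    end, _, start = value.partition("\n")
    return end, start


# Default style: marker at the end of the first line only.
print(break_word("hyphenation", 6))            # ('hyphen-', 'ation')
# Marker on both lines, as described for Polish above.
print(break_word("hyphenation", 6, "-", "-"))  # ('hyphen-', '-ation')
# hyphenate-character: "" "-" : nothing at the end, marker at the start.
print(break_word("hyphenation", 6, "", "-"))   # ('hyphen', '-ation')
print(parse_single_string("-\n-"))             # ('-', '-')
```

Either syntax reduces to the same pair of strings, so the choice between them is mostly about readability for authors.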
Here's another example of the visual marker appearing at the beginning of a line. Unicode Standard, v11, p536:
In writing Mongolian and Todo, U+1806 mongolian todo soft hyphen is used at the beginning of the second line to indicate resumption of a broken word. It functions like U+2010 hyphen, except that U+1806 appears at the beginning of a line rather than at the end.
Do you have a suggestion for a short example that we could include, for instance right after the first paragraph of 5.4? If we do add something, it would be good to keep it short, just to illustrate that hyphenation can be different from, or more complicated than, what is typical of English. But I wouldn't attempt to list too many cases. As much as I find that sort of thing interesting, css-text cannot scale to describing all the peculiarities of all the world's languages :)
(Given that the spec uses generic language that allows and expects "the right thing" for all languages, automated tests in WPT might be a more effective place to highlight the specifics of various languages.)
The line breaking / hyphenation of pinyin can be an example too, but it may be less common than the above examples (and may be more suitable for css-ruby?).
(Related clreq issue: https://github.com/w3c/clreq/issues/351)
@r12a @xfq We've added a short table of examples illustrating the spelling changes (which are normatively noted in the paragraph above) here: https://drafts.csswg.org/css-text-3/#hyphenation
If you have other examples you want to add, we can do that; but please remember we're not trying to make the spec examples exhaustive. :) It might be useful to compile your more exhaustive notes into the Typography index, though, and we can link there if you want.
We also clarified the spec to say that hyphenation character changes must, and spelling changes should, apply. (It is a SHOULD because, if the spelling differs between the hyphenated and unhyphenated forms, then depending on where the author inserted the hyphenation opportunity, the UA might not be able to match the author's chosen hyphenation opportunity against its hyphenation dictionary.)
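The reasoning behind the SHOULD can be sketched in a few lines of Python. This assumes a toy hyphenation dictionary (the `DICTIONARY` table and `hyphenate_at` function are hypothetical, not taken from the spec or any UA) that records, for each break position in the unhyphenated word, the text ending the first line and the text starting the next. The sample entry uses the pre-1996 German rule in which "ck" was respelled "k-k" at a break (Zucker becomes Zuk-ker). When the author's break position is in the dictionary, the spelling change can be applied; when it is not, the UA has nothing to match against and can only fall back to a plain hyphen.

```python
# Hypothetical dictionary: word -> {break index in the unhyphenated word:
#                                   (text ending line 1, text starting line 2)}
DICTIONARY: dict[str, dict[int, tuple[str, str]]] = {
    # Pre-1996 German orthography: "ck" is respelled "k-k" at a line break.
    "Zucker": {3: ("Zuk-", "ker")},
}


def hyphenate_at(word: str, index: int, hyphen: str = "-") -> tuple[str, str]:
    """Break `word` at an author-chosen `index`.

    If the dictionary knows this break, apply its spelling change;
    otherwise fall back to inserting a plain hyphen with no respelling.
    """
    entry = DICTIONARY.get(word, {})
    if index in entry:
        return entry[index]
    return word[:index] + hyphen, word[index:]


print(hyphenate_at("Zucker", 3))  # ('Zuk-', 'ker')  matched: spelling change applied
print(hyphenate_at("Zucker", 2))  # ('Zu-', 'cker')  no entry: plain hyphen only
```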
We did not make any changes for WBR; see @frivoal's comments in https://github.com/w3c/csswg-drafts/issues/5972#issuecomment-826582035 and https://github.com/whatwg/html/issues/6326#issuecomment-826595860. Note that if HTML does introduce a way to mark up explicit hyphenation opportunities in the future, the spec is already written generically enough to cover such mechanisms.
Agenda+ for CSSWG review.
The CSS Working Group just discussed "Better describe the likely outcomes of hyphenation (editorial)", and agreed to the following:
RESOLVED: Accept changes
[1] The Uighur example is missing the 'hyphen'. It should be a short baseline extension, separated from the last letter by a small space. Here's an example. It's not entirely clear how the line should be produced. Some say that the font should automatically drop and lengthen a normal hyphen, but others say you should use ـ U+0640 ARABIC TATWEEL. In the meantime, perhaps an SVG image would be better here.
[2] Although the introductory text mentions that symbols other than a hyphen may be used, the list of examples doesn't back that up convincingly: it only shows hyphens. I can provide one extra example for you, but how would you like it? I can provide text, but some readers may not be able to see it; or I could provide an SVG image that could be displayed at approximately normal text size.
@r12a The backing store is actually using U+0640 but it looks like it needs some kind of thin space to create the visual separation. What do you recommend here?
How about using these images:
If that works for you, I can provide another set to show another non-hyphen-based hyphenation in a different script.
PS: If you like, you can also use those images plus the following 2 for Example 18, which looks a little ragged as a bitmap.
@r12a I've updated the spec to use the images you provided.
As for this:
Although the introductory text mentions that other symbols may be used, rather than a hyphen, the list of examples doesn't back that up convincingly - it only shows hyphens. I can provide one extra example for you, but how would you like it? I can provide text, but others may not be able to see the text, or I could provide an SVG image which could be displayed at approximately normal text size.
Should we consider that the Uyghur example is using a U+0640 ARABIC TATWEEL and call it done, or do you want to supply some alternative example? If you want to offer something else, SVG is indeed good, as that provides reliable rendering.
I don't think we need to worry (for this context) about which character is used if we use the images.
The answer to the question about which character should be used – for implementers of Uighur hyphenation – is not clear, afaict even among the Unicode folks, and needs further discussion. My personal preference is to use tatweel, fwiw.
At this point, I am not sure what the request is on the spec. Do we consider the examples already present good enough to show some diversity, or not?
@frivoal I think we're almost done, but here are some final suggestions:
SVG images for Cree example:
Done (https://github.com/w3c/csswg-drafts/commit/af3f01ae51186efac25ca428d63b73c02ff080b1). Also added a test in WPT for Cree (https://github.com/web-platform-tests/wpt/pull/42523). Thanks for supplying this example.
8 Breaking Within Words (https://drafts.csswg.org/css-text-4/#hyphenation, https://drafts.csswg.org/css-text-3/#hyphenation)
I think it would be worthwhile to add a note explaining that hyphenation should produce a number of effects, depending on the language in question, and giving examples, to remind implementers to build a solution that is open to cultural adaptation. These examples include:
It should also be made clear that such effects are triggered not only by browser code applying algorithms, but also by &shy; or wbr (see https://github.com/w3c/csswg-drafts/issues/5972) when they fall within a range to which the hyphens property has been applied (with relevant values), and that &shy; should only produce a glyph that looks like a hyphen at the end of a line if that is appropriate for the language in question.
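As a rough illustration of that last point, the Python sketch below breaks text at one of its soft hyphens (U+00AD) and picks the visible marks from a per-language table. The table and the name `render_soft_hyphen_break` are hypothetical and deliberately incomplete; the entries only restate cases mentioned in this thread: a hyphen at the end of the line for English, and U+1806 MONGOLIAN TODO SOFT HYPHEN at the start of the next line for Mongolian, per the Unicode Standard text quoted above. Soft hyphens that are not used as the break point render nothing.

```python
SOFT_HYPHEN = "\u00ad"

# Hypothetical per-language (end-of-line mark, start-of-line mark) table;
# a real UA would need far more data than this.
MARKS: dict[str, tuple[str, str]] = {
    "en": ("-", ""),       # English: hyphen at the end of the first line
    "mn": ("", "\u1806"),  # Mongolian: U+1806 at the start of the next line
}


def render_soft_hyphen_break(text: str, which: int, lang: str) -> tuple[str, str]:
    """Break `text` at its `which`-th soft hyphen (0-based) and return the
    two line fragments. Soft hyphens not used as the break are dropped."""
    parts = text.split(SOFT_HYPHEN)
    if not 0 <= which < len(parts) - 1:
        raise ValueError("no soft hyphen at that position")
    end_mark, start_mark = MARKS.get(lang, ("-", ""))  # assumed fallback: English-like
    first = "".join(parts[: which + 1]) + end_mark
    second = start_mark + "".join(parts[which + 1 :])
    return first, second


print(render_soft_hyphen_break("hy\u00adphen\u00adation", 1, "en"))  # ('hyphen-', 'ation')
# Latin placeholder letters stand in for Mongolian text here.
print(render_soft_hyphen_break("abc\u00addef", 0, "mn"))
```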