Open s10wen opened 5 years ago
There are already controls for widows and orphan lines and page/column breaks in https://drafts.csswg.org/css-break/#widows-orphans.
A control for widowed words on the last line could be useful, but it doesn't exist yet. However, I suspect it needs to be paired with a better line breaking algorithm than the current greedy one to achieve good results. If all the lines of the paragraph are re-balanced to push the needed word(s) to the last line, all may be fine, but just pulling from the line before last is probably going to lead to sub-optimal results. Also, we'd need to figure out what this means in languages that do not separate words with spaces, such as Japanese or Chinese.
I suppose there's prior art in other software, and we should have a look at what they do there. InDesign maybe?
We have previously talked about the idea of being able to specify a minimum last line length, but in characters or a percentage of width, not in the number of words:
@s10wen chiming in on current workarounds. If you're willing to put presentational matters in HTML, like your demo's use of <br>, it's better to place a &nbsp; character between the last two words, so that they don't have to start a new line if they fit together on the current line.
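For anyone applying the non-breaking-space workaround at build time rather than by hand, here is a minimal sketch in Python. The helper name `bind_last_words` and its `count` parameter are hypothetical, made up for illustration; it assumes words are separated by single ASCII spaces and it does nothing useful for scripts that don't separate words with spaces.

```python
NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE, the character that &nbsp; stands for in HTML


def bind_last_words(text, count=2):
    """Join the last `count` words with no-break spaces so they wrap as a unit."""
    if count < 2:
        return text
    body = text.rstrip()
    trailing = text[len(body):]  # keep any trailing whitespace untouched
    # Split off the last (count - 1) spaces only; the rest of the text stays as-is.
    parts = body.rsplit(" ", count - 1)
    return NBSP.join(parts) + trailing


print(repr(bind_last_words("one two three four")))
```

Unlike a hard <br>, this only keeps the final words together; if everything fits on one line, no break is forced.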
Wouldn't a <span> with white-space: nowrap be a cleaner solution?
Hey all, thanks for the conversation around this.
https://drafts.csswg.org/css-break/#widows-orphans seems to be most likely what I'd like to see. Is there anywhere I can follow the progress of this being implemented in browsers, and test it?
I've just changed the title of this issue to make it clearer that we're talking about orphaned words on a line, not orphaned lines on the top/bottom of a page or column (which is what the widows and orphans properties are about).
I came here because of this thread by the CSS-Tricks team about workarounds to avoid bad-looking breaks. This is clearly a common situation where people are modifying their markup to get typographically pleasant results, and that is really a problem that CSS should try to solve.
CSS Text 4 has a heading for this topic ("Last Line Minimum Length"), with an issue summary but no solution, with a cross-reference to a 2015 mailing list discussion. Copying over the current issue text from the spec:
Issue is about requiring a minimum length for lines. Common measures seem to be
- At least as long as the text-indent.
- At least X characters.
- Percentage-based.
Suggestion for value space is
''match-indent | <length> | <percentage>''
(with Xch given as an example to make that use case clear). Alternately <integer> could actually count the characters. It’s unclear how this would interact with text balancing (above); one earlier proposal had them be the same property (with 100% meaning full balancing).
People have requested word-based limits, but since this is really dependent on the length of the word, character-based is better.
My own opinions to get the discussion started:
I think the property should support a minimum number of characters (as an alternative to minimum % of inline size or minimum line length) for the final line. That covers most typographic style guides while still avoiding any discussion about what is or isn't a word across different languages, or when words are broken by hyphenation.
Since we're assuming that most implementations will be using a greedy line-breaking algorithm, maybe the property could accept a second value that would be the minimum number of characters for previous lines, at which point no more attempt at removing the widow should be made. (If the second value isn't specified, an auto behavior applies: don't make the second-to-last line shorter than the last line while trying to make the last line longer! In fact, this should apply regardless of whether you also specify a minimum length for the second-to-last line!) If a more complicated text-wrap justification algorithm applies, the rules about "second-to-last line" apply to all previous lines in the block.
min-last-line: 8 / 20; /* make sure the last line is at least 8 characters long,
unless doing so would make the previous line less than 20 characters long */
min-last-line: 3em; /* make sure the last line is at least 3em long,
unless doing so would make the second-to-last line shorter */
Or maybe it would be better to have two properties: min-line-last, which is specifically about avoiding the orphaned short line at the end of the block, versus min-line, which is a hint about the ideal minimum length for any lines except the last. In addition to avoiding over-compensation when padding the last line, it could define a trigger at which point smarter/more expensive text-wrap and hyphenation strategies should be employed.
PS, I think it would also be helpful if there were some figures in the spec about widows and orphans to make it clear that those properties aren't about bad line breaking, but about bad block breaking. If anyone wants to create that, I'm sure a PR would be welcome!
I want to add my two cents to push this conversation further. The following HTML after being "prettified" will break:
<blockquote>
Lorem ipsum then prettify will push closing tag to a new line
</blockquote>
And in CSS, this quotation mark might appear on a new line, and there is no way around it, except to disable prettifying code and making sure blockquote ends on the same HTML line of code. I wish there was a way to target :last-line the way we target :first-line.
blockquote:after {
content: '"'
}
There's a similar looking problem around too-short first lines, if the in-line alignment does not match the reading direction.
See the BBC Subtitle Guidelines section on Breaks in justified subtitles for example:
That only happens with explicit breaks tho, correct? We'll otherwise always fill the first line approximately the same as subsequent lines.
Further to my comments in the above thread, and following @kojiishi's response, I talked at length with other designers at Clearleft and it was surprisingly difficult to come to a definitive conclusion, particularly around the exceptions.
By which I mean: put simply, one doesn't want just a single word on the final line of a block, but what's the effect of bringing down a word from the previous line in order to address that? If you were fixing this manually, there might be a ripple effect back up the paragraph until the best overall text shape is achieved. I doubt that's something a browser could afford to do, given the (understandable) reluctance to implement any justification routines beyond the crudest greedy method.
The best conclusion we could come up with was something similar to the solution proposed by @AmeliaBR. Set a minimum character length for the final line along with a maximum number of characters to bring down from the previous line. This would be conceptually similar to the hyphenate-limit-chars property.
min-last-line: 12 6
where 12 is the minimum line length in characters, and 6 is the maximum number of characters that can be brought down from the previous line to make that so. If the 6 is omitted, it would be assumed to be equal to the 12.
It might be useful to some people for the same approach to be expressed as percentages of box width instead:
min-last-line: 20% 10%
where 20% is the minimum length of the final line in terms of percentage of box width, and 10% is the maximum length that can be removed from the previous line.
It might be that the two methods (chars and %) could be mixed.
I've put together a (very) rough-and-ready proof of concept here.
The idea is to have something to test out the concept of a minimum final line length and maximum amount of text that can be brought down from the line above to address that.
Please feel free to have a play, copy, adapt and generally improve. Comments very welcome, here preferably.
A question on an edge case came up in mind: what to do with a paragraph with "[short word] [long word] [short word]"? An example (there might be better examples but...):
It's uncopyrightable,
no?
or
It's
uncopyrightable, no?
The former is better, no?
Agreed, the former is better. This would be handled by the min-last-line: 20% 10% rule which says that 10% is the maximum length that can be brought down from the penultimate line.
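To make the proposed behavior concrete, here is a rough Python sketch of the character-count form of the rule (min-last-line: 12 6) layered on top of greedy line breaking. All function names (`greedy_wrap`, `fix_last_line`, `wrap_with_min_last`) are hypothetical, and this is not any browser's algorithm. It assumes space-separated words, counts characters with `len()` (code points, not rendered width), and also applies the "don't make the second-to-last line shorter than the last line" rule suggested earlier in the thread. The percentage form would need measured text widths instead.

```python
def greedy_wrap(words, width):
    """Greedy line breaking: put as many words on each line as will fit."""
    lines, current = [], []
    for word in words:
        candidate = current + [word]
        if current and len(" ".join(candidate)) > width:
            lines.append(current)
            current = [word]
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


def fix_last_line(lines, width, min_last, max_pull=None):
    """Pull whole words down from the penultimate line while the last line is too short."""
    if max_pull is None:
        max_pull = min_last  # omitted second value defaults to the first
    if len(lines) < 2:
        return lines
    prev, last = list(lines[-2]), list(lines[-1])
    pulled = 0  # characters moved so far (spaces not counted)
    while len(" ".join(last)) < min_last and len(prev) > 1:
        word = prev[-1]
        new_last = " ".join([word] + last)
        if len(new_last) > width:
            break  # the word would not fit on the last line
        if pulled + len(word) > max_pull:
            break  # would bring down more than the allowed amount
        if len(" ".join(prev[:-1])) < len(new_last):
            break  # would leave the penultimate line shorter than the last
        prev.pop()
        last.insert(0, word)
        pulled += len(word)
    return lines[:-2] + [prev, last]


def wrap_with_min_last(text, width, min_last, max_pull=None):
    lines = fix_last_line(greedy_wrap(text.split(), width), width, min_last, max_pull)
    return [" ".join(line) for line in lines]


print(wrap_with_min_last("The quick brown fox jumps over the lazy dog", 20, 6, 6))
print(wrap_with_min_last("It's uncopyrightable, no?", 22, 12, 6))
```

In the second call, the long word exceeds the pull-down limit of 6, so the paragraph is left as "It's uncopyrightable," / "no?", which is the outcome preferred above.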
I dup'ed https://github.com/w3c/csswg-drafts/issues/2396 to this issue.
From that other issue, I said:
I actually implemented this years ago (named -apple-trailing-word: -apple-partially-balanced) as a nonstandard property because we got some internal requests for this. (I then removed support once our internal teams stopped using it). And now we're getting more internal requests for this.
It's not just internal requests, though:
- https://stackoverflow.com/questions/4823722/how-can-i-avoid-one-word-on-the-last-line-with-css
- https://stackoverflow.com/questions/31974448/how-can-i-prevent-having-just-one-hanging-word-on-a-new-line-in-an-html-element/31974553
This is something we'd like to see added to CSS.
We could either do it the way I did years ago (a simple on/off switch) or we could mirror the design of orphans and widows and have it take an integer value.
We are now getting even more requests for this. (Every time this comes up, I always go searching for which property controls this, only to be surprised yet again that there is no way to do this and it's impossible.)
It's probably also worth noting that the requests we have for this feature are not about the number of "words" on that last line, as that doesn't necessarily solve the visual problem when the last n words are short (or when you're writing in Chinese and the last few words are each just single characters). The request, instead, is to say "the last line is at least x% of the width of the block container."
I suppose there's prior art in other software, and we should have a look at what they do there. InDesign maybe?
In InDesign, because what you see is what you get, adjusting some inconspicuous gaps between characters in the penultimate line should work.
Another approach is applying a GREP style, indicating that the last few characters/words in a paragraph cannot be broken into two lines.
Wouldn't a <span> with white-space: nowrap be a cleaner solution
If there are not many paragraphs, using <span>s with white-space: nowrap or Zero Width Joiners (for writing systems like Chinese, Japanese, Batak, Tai Le, Khmer, Thai, etc.) or non-breaking spaces between the last few words probably works fine (although ZWJs might have an impact on glyph rendering), but if we want this for the entire document, then it is too much trouble.
See also Handling of Widows and Orphans in clreq and Widow Adjustment of Paragraphs in jlreq.
The CSS Working Group just discussed [css-text] Preventing too-short final lines of blocks (Last Line Minimum Length), and agreed to the following:
RESOLVED: Add a control that is either a property or a value that causes UAs to make the last line longer than it would've originally done unless that was a bad idea
Just an observation on the question of how to specify the length of the last line (and possibly also the gap on the previous line). It seems to me that using line length percentages, based on the rendered text, is better than counting characters.
Counting characters is problematic in a large number of non-Latin languages because they use (often multiple) combining marks, which are combined into the same 2-dimensional space as a base character. For example, 10 characters in some languages can be very short, compared to 10 characters in English, e.g. أَنْتُنَّ contains 9 characters, but is only about 3-4 Latin characters in width. Similarly, an emoji such as 👨👩👧👦 contains 7 characters in about the width of a couple of English letters.
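The mismatch between character counts and visual length is easy to check. The Python sketch below spells out both examples as explicit code points (my reading of the two strings, so treat the escape sequences as an assumption) and compares the raw count with the number of non-combining characters. The helper name `base_characters` is hypothetical, and neither number is a rendered width, which only a layout engine can measure.

```python
import unicodedata

# Assumed code points for the Arabic example: four base letters plus five combining marks.
arabic = "\u0623\u064E\u0646\u0652\u062A\u064F\u0646\u0651\u064E"

# Assumed code points for the family emoji: four emoji joined by three ZERO WIDTH JOINERs.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"


def base_characters(text):
    """Count code points that are not combining marks (canonical combining class 0)."""
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


print(len(arabic), base_characters(arabic))  # code points vs. base letters
print(len(family))                           # code points in a single visible emoji
```

A character-based minimum would therefore treat these strings as much longer than they look, which is the argument for a width-based measure.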
It seems to me that using line length percentages, based on the rendered text, is better than counting characters.
Agreed. See prior comments: https://github.com/w3c/csswg-drafts/issues/3473#issuecomment-1474837736
Yep.
The request, instead, is to say "the last line is at least x% of the width of the block container."
Hi folks. Writing with my TAG 🎩 on.
Given that Chromium is very keen to ship text-wrap: pretty and orphan control is one of the main heuristics employed in it (in fact, it was the only one when it was first submitted for TAG review), it would be good to finalize the name of this property plus a way to use it to avoid orphans (even if it gets more syntax in the future), to prevent text-wrap: pretty being evangelized further as a way to avoid orphans.
Switching to my CSS WG 🎩 now to discuss specific syntax:
What about breaking text-wrap into longhands to allow customizing some or all aspects of line breaking, and have keywords like balance or pretty correspond to certain values for these longhands? Orphan control could then be achieved via text-wrap-orphans, with high level values like avoid and normal for an MVP, while we debate syntax for giving authors more control. text-wrap pretty would then correspond to that semi-magical value.
I am unconvinced about decomposing balance or pretty into a bunch of individual knobs.
The interaction between these knobs is about as interesting as the knobs themselves: if pretty implies "avoid orphans" and "avoid rivers" and "avoid several hyphenated lines in a row", do we then need not just these three, but also a choice between "avoid orphans unless it would create rivers" vs "avoid orphans even if it creates rivers"? "how about avoid orphans even if it creates rivers or consecutive lines with hyphenations, as long as it doesn't create both, but either way, don't be more than 250% slower than brute force line breaking"? A good algorithm for pretty needs to balance a whole bunch of tradeoffs.
Yes, there's some measure of subjectivity in those tradeoffs, so providing author control is tempting, but:
The interaction between these knobs is about as interesting as the knobs themselves: if pretty implies "avoid orphans" and "avoid rivers" and "avoid several hyphenated lines in a row"
especially as there is already - in theory - control for limiting consecutive hyphens with hyphenate-limit-lines
https://www.w3.org/TR/css-text-4/#propdef-hyphenate-limit-lines
RESOLVED: Add a control that is either a property or a value that causes UAs to make the last line longer than it would've originally done unless that was a bad idea
Like text-wrap-style: pretty, this opts into a different line breaking algorithm aimed to make things look better, for some definition of better. And arguably, since the definition of pretty is quite open ended, a user agent could choose to implement pretty in just the right way to make nice last lines. But they might also do a whole lot more, and nothing requires that they pay special attention to the last line, and the performance profile between something that just cares about the last line and something that cares about the whole text is likely different. So we're basically looking at another variant of pretty but with a different bias in terms of what tradeoffs to make.
For the sake of the argument, let's call that text-wrap-style: pretty-last-line.
Both pretty
and pretty-last-line
:
auto
pretty
:
pretty-last-line
:
If I may leave a note as an author: we noticed text-wrap: pretty had unexpectedly shipped in Chrome/Chromium, including the versions we had installed, and, excited by the prospect of real Knuth-Plass layout in >60% of browsers (according to CanIUse), took a look at how it affects our website.
Unfortunately, pretty in Chrome seems to not really do K-P as one would expect, and to be highly opinionated. While we like how it handles orphaned-words to bump them from 1 to 2, usually, we do not like the extremely aggressive removal of hyphenation, which results in drastic s t r e t c h e d lines with our justified text*. (Here are 34 before-after pairs on Windows Chrome 123 on my site: https://share.obormot.net/temp/text-wrap_screenshots.zip Most of them show this behavior. This is on a wide desktop screen; presumably, if we checked thoroughly on very narrow screens like mobile, the stretching behavior would be far worse, because it usually is and that is why we had to add Hyphenopoly to fix bad browser hyphenation.)
If we could opt into the orphaned-words bit, we would, and it's not obvious to me why bumping a word to the next line would have to change all the hyphenations & multiple lines as its tradeoff, but pretty seems to be a package deal right now, so we can't use it. I also can't seem to find any CSS property, under any name, which would achieve the same effect: the CSS widows/orphans line properties do totally different things, most people just burble enthusiastically about how pretty solves all your word-orphan problems (which it may, but with little mention of drawbacks), and the suggestions after that seem to be either 'manually insert no-break space at the end of paragraphs everywhere forever' or 'run some wacky JS'.
The opinionated package deal is also worrisome because given how aggressive the treatment of hyphenation is, and how this is not apparently part of any standard, why expect other browsers to adopt this opinion? We definitely wouldn't want to adopt it for Chrome, and then 5 years later discover (or worse, not discover) that it looks terrible on Firefox or Safari after they ship their particular flavor...
(Looking into this was not assisted by the very confusing terminology. For example the official Chrome blog, which is some of the only documentation on what the shipped pretty does, mentions that its treatment of orphan-words differs from the CSS orphans... which isn't about words at all, it's about lines.)
* almost all of the discussion & design doc seems to assume left-justification only, and ignore center, right, & fully-justified text, so perhaps this was just an oversight in the tuning?
Thanks for all the examples, @gwern. I’m not sure I agree they are all failures of the current implementation. Most of the after results appear better or at least as good as the befores to me. For instance, 49/50 does have wider spacing but I think it’s arguably better (spacing is still consistent across the paragraph and a 2-hyphen ladder is removed). And 53/54 does have wider spacing but it might be a case where justification in general is causing problems (there is an unaffected line starting with “reader mode” that looks bad both before and after).
The ones I do see a problem with are:
- 65/66 is a bad result, I agree
- 69/70 the sidenote is worse, but the main column change looks OK
- 71/72 is worse, likely because the current implementation isn’t looking back enough lines
- 81/82 sidenote 31 is worse, but the rest of the changes look OK
- 89/90 is worse, but the narrow columns make this a hard case. Perhaps the weighting against a short last line should not have resulted in any change for the first two paragraphs, but the third seems fine
I didn't say the current Chrome implementation of pretty had failed - just that it was opinionated, and we had other opinions. (You and the Chrome dev Ishii perhaps have more tolerance of spacing and more dislike of hyphens than we do, and there's no arguing taste there; but we've been burned by some extremely bad looking lines on mobile when line-stretching happens, so we worry about it a lot, while hyphens are so ordinary as to not be a big deal or worth incurring costs like stretched lines to minimize.)
But if a lot of the instances aren't clearly better even by your assessment, you can see why we wouldn't be too eager to go out of our way to add & debug fairly exotic new CSS to opt into this brand new, possibly buggy, non-standardized-cross-browser, opinionated package of changes, when there's just one part we are sure we want.
And so I am simply mentioning, in the context of the discussion of whether to add knobs, that we would like a knob for that part, and would not like to have to use pretty as a take-it-or-leave-it deal, because as it stands now - we would have to leave it.
One further thing I would mention after looking at the mobile screenshot pairs too, which look better than I expected: https://share.obormot.net/temp/text-wrap_screenshots_mobile.zip It's hard to predict what Chrome pretty currently does!
I've stared at many of these, but still can't look at the 'before' and predict what will happen in the 'after', particularly how the word-orphans are treated. As best as I can tell, word-orphan fixes are strictly subordinate to the hyphenation changes: it doesn't seem like the word-orphans ever get changed unless there is a previous hyphenation change already being made, regardless of how easy & straightforward a word-orphan change might be. It seems like pretty treats word-orphans as an add-on or afterthought, to be modified only if it's already doing a change, otherwise, not modified at all...?
This is confusing, and I don't think users understand it - I don't recall any of the people advertising pretty as 'solving your word-orphan problems' on Twitter/StackOverflow/Reddit including the caveat 'but only if that word-orphan is part of a paragraph whose hyphenation is being changed already, otherwise all your word-orphans are still there'. And I don't think anyone would request a feature like "fix my word-orphans but only some of the time, dependent on a usually unrelated problem being fixed". ("What Would Knuth Do?" Probably not that.)
Agenda+ to give the WG a heads up that I've implemented the resolution in https://github.com/w3c/csswg-drafts/issues/3473#issuecomment-1646267487 by adding text-wrap-style: avoid-orphans.
CodePen example: https://codepen.io/s10wen/pen/GPWWyP?editors=1100#0
Tweet + replies: https://twitter.com/s10wen/status/1076079575506083840
Wikipedia explanation: https://en.wikipedia.org/wiki/Widows_and_orphans
CSS Text 3 w3 Spec: https://www.w3.org/TR/css-text-3/
The above links have led me here, to further pursue this. I'm wondering if anything currently exists, or could be implemented, to handle this. My idea is that orphan: 2 would always leave two strings of text together; please see the CodePen for an example. Or, it could be that orphan: true would mean that orphans always had at least 2 words.