w3c / csswg-drafts

CSS Working Group Editor Drafts
https://drafts.csswg.org/
Other
4.5k stars 661 forks source link

[css-text-4] Suggest algorithm for multi-line #803

Open frivoal opened 7 years ago

frivoal commented 7 years ago

css-text-4 introduces text-wrap: multi-line

As discussed in https://github.com/w3c/csswg-drafts/issues/672#issuecomment-264016117, It could be useful to give a suggestion as to one possible algorithm. This would not prevent implementors from trying to come up with something better, but for those who are willing to go ahead an implement this without themselves being experts at line breaking algos, giving a reasonably non controversial baseline approach would be useful.

fantasai commented 6 years ago

Agree 100% with doing this. Just need @liamquin to give me a reference I can point at...

liamquin commented 6 years ago

An algorithm i've used in the past is

  1. put as much text as will fit on the line as possible, using a preferred (not minimum) space between words;
  2. if the line is tighter (less space left over) than the previous line, move the first word or word fragment on this line back to the end of the previous line, as long as that won't make the spaces on the previous line smaller than than the minimum or the spaces on this line larger than the maximum. Generalizing this to n lines brings an incremental improvement at the cost of not being able to render until you've got n lines; i've usually stuck with n being 2 or 3, but i've heard of systems going as high as nine. Note that as long as you only ever move words backwards, the algorithm is stable and is much faster than Knuth-Plass. It does assume a model of absolute-min/preferred/max-preferred word spaces. Two further refinements seen in many for-paper systems is (1) an option to use the preferred space and set the line unjustified if the word space exeeds the maximium, and (2) semi-justiication, in which you only justify lines whose length with preferred space is within a set distance (usually 25%) of the line length (and where doing so doesn't exceed the max preferred space) and others are set unjustified (e.g. left-aligned, or right-aligned for Hebrew, or whatever). Again, these two options work fine with an n-line window.

The Knuth-Plass algorithm tends to have some nasty edge cases (e.g. when TeX says "overfull hbox" at you) making it less than ideal for formatting where the author isn't reviewing the results. E.g. sometimes it decides to put only two words or word fragments on a line, one at each end with a huge gap, rather than have several hyphens in a row. It's also NP-complete in performance (on the number of potential word fragements after hyphenation in th paragraph). where the "move lumps back" compromise is either linear or at worst n^2 if you do it recursively, with n being the number of lines.