Knuth & Plass line breaking algorithm

basil commented 11 months ago

See Breaking Paragraphs into Lines by Donald Knuth & Michael Plass (1981). Serious typesetting programs that deal with justification, including TeX and Adobe InDesign, use a sophisticated algorithm for breaking paragraphs into lines. In the abovementioned paper, Knuth & Plass present a model of boxes, glue, and penalty specifications to efficiently compute the optimal set of line breaks. The Adobe Paragraph Composer works in much the same way. This avoids excessive hyphenation and rivers of white space in the justified output.

I have never seen anyone attempt to apply this concept to Western notation. About ten years ago I was using Finale, and I had to manually break the score into lines to get good output. I have never used Dorico and have no idea if it is any better.

Applying Knuth & Plass line breaking to Western notation music seems fairly difficult due to the geometric placement of notes on the Western staff, but it seems plausible that this algorithm could be applied to Byzantine notation music. The goal would be to avoid the bad line and page breaks that plague poorly typeset music. For example, most sticheraric and slow heirmologic pieces have a duple rhythm of 4-beat or 6-beat measures. Naïvely breaking the line between an odd-numbered beat and an even-numbered beat is undesirable, so this could be given a worse penalty specification in Knuth & Plass's algorithm. Similarly, a gorgon applies to the neume it is placed on as well as the preceding neume, so a line or page break between these two neumes is also undesirable.

basil commented 8 months ago

The very clever Romanian printer who typeset Anton Pann's books in the 19th century would occasionally use a slightly shorter ison or oligon in order to compress a line rather than expanding it (with the result of ugly rivers of white).

basil commented 8 months ago

Another thing this very clever Romanian printer does is this: when he is forced to justify a line with a lot of whitespace, he adds more space before and after a martyria than he does before and after the other notes of the line. This allows him to justify the line while still keeping the spacing between neumes more consistent with other lines.

basil commented 8 months ago

Both techniques in full display in this September 14 Megalynarion, from Anton Pann's book of Cherubic Hymns and Communion Hymns:

September_14_Megalynarion

basil commented 8 months ago

To translate this into Knuth & Plass' paradigm, the glue (whitespace) around a martyria could have greater stretchability than the glue around a regular neume.

Allowing for a shorter oligon or ison to be substituted during the breakpoint search could have highly beneficial effects with regard to computing the optimal breakpoints, but it would be tough to model in Knuth & Plass' paradigm and might also look inconsistent in a modern digitally-typeset score.

basil commented 1 month ago

I spent some time this week cleaning up my patches from this past December and am getting close to a point where I can submit this feature as a PR. I still have a few bugs left to fix and some areas to clean up, but I have tested the code on all my scores and I am quite pleased with the results. I believe they meet and exceed the quality of the 19th-century classical publications.

@danielgarthur I wanted to gather feedback on two areas as I get ready to submit this feature.

My code at https://github.com/basil/neanes/tree/knuth-plass replaces the first-fit algorithm with the optimum-fit algorithm from the paper by Knuth and Plass. There is no way to restore the old algorithm. My experience is that the new algorithm produces superior results in every case I have tested. I am OK with such a breaking change, since we don't currently make any commitment that old scores will look exactly the same in newer versions of the software. If you are OK with this breaking change, I would prefer to keep the code simple by not trying to support both algorithms. But if you find any examples where the new algorithm is worse than the old, then I would like to hear about them.
I have written a rather large essay about this change at https://github.com/basil/neanes/blob/knuth-plass/notes/knuth_plass.md, which I am sharing now in advance of the final PR because this is a lot of material to read and digest. I am interested in any early feedback on the essay and am happy to incorporate any high-level suggestions as I get closer to submitting a PR.

danielgarthur commented 1 month ago

I have read the essay and it sounds good. I have only just begun to look through the code, but I will continue to do so.

basil commented 1 week ago

I updated my branch tonight fixing most known bugs, but one bug remains. The Gourlay-style spacers are working well within a line, but not at the beginning and ends of lines. I need to split them apart into two halves, one per lyric, and model that in the box-glue algebra. I think I even wrote about it that way in the design document, but never actually implemented it back in December. That's the last portion I know of that needs to be finished.

basil commented 1 week ago

This has proven more difficult than expected, because modeling lyric spacing in the Knuth-Plass algebra requires the use of negative widths and/or negative stretchability, which the library I am using (https://github.com/robertknight/tex-linebreak) does not support. And Knuth does not really describe how to implement it in his paper, although it is implemented in TeX—though the TeX implementation is even more unreadable than the paper due to an even greater level of optimization. I guess I will start by trying to write a test case for negative stretchability in tex-linebreak, based on the centered text example in Knuth's paper. Then I will try to figure out how to enhance tex-linebreak to support this use case. My sense is that the restriction on negative widths is meant to support some of the optimizations in Knuth's paper (which, however, are not as deep as the optimizations in TeX), but also these optimizations might not be needed on modern hardware. So perhaps tex-linebreak can be simplified by rolling back some of these optimizations, at which point it might also be able to support negative widths, after which I can get on with modeling lyric spacing with it.

neanes / neanes

Knuth & Plass line breaking algorithm #90