sile-typesetter / sile

The SILE Typesetter — Simon’s Improved Layout Engine
https://sile-typesetter.org
MIT License

Native Unicode U+00A0 (nbsp) support #1918

Omikhleia closed 9 months ago

Omikhleia commented 10 months ago

This was mentioned in discussion https://github.com/sile-typesetter/sile/discussions/1716#discussioncomment-4998549, and in PR https://github.com/sile-typesetter/sile/pull/1860#issuecomment-1722185483

It tangentially relates to #1889. Through SILE's Unicode node maker, this PR adds native support for U+00A0 (non-breaking space) in input files: it is handled as a stretchable and shrinkable space for justification purposes, as per Unicode UAX #14.
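As a rough illustration of the behaviour described above, here is a minimal Python sketch of how U+00A0 could become a glue node that stretches and shrinks like an ordinary space while offering no break opportunity. This is not SILE's actual node maker (which is written in Lua); the names `Box`, `Glue`, `make_nodes` and the width values are hypothetical.

```python
from dataclasses import dataclass

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE


@dataclass
class Box:
    text: str


@dataclass
class Glue:
    width: float
    stretch: float
    shrink: float
    breakable: bool


def make_nodes(text, space=(3.0, 1.5, 1.0)):
    """Split text into boxes and glue.

    `space` is a hypothetical (width, stretch, shrink) triple.
    Both U+0020 and U+00A0 become glue with the same dimensions;
    only U+0020 is a legal break point.
    """
    width, stretch, shrink = space
    nodes, word = [], ""
    for ch in text:
        if ch in (" ", NBSP):
            if word:
                nodes.append(Box(word))
                word = ""
            nodes.append(Glue(width, stretch, shrink, breakable=(ch == " ")))
        else:
            word += ch
    if word:
        nodes.append(Box(word))
    return nodes


nodes = make_nodes("verse\u00a012 begins here")
```

In this toy model the glue after `verse` has the same stretch and shrink as the other spaces, so it takes part in justification, but a line breaker that only breaks at `breakable` glue would never split `verse` from `12`.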

Besides the topics mentioned above, while experimenting with the French SBL Bible in USX in https://github.com/Freely-Given-org/BibleTypesetter/pull/3#issuecomment-1835647363, I found that it uses non-breaking spaces, which defeats SILE's special French punctuation rules. I had run into the same issue before with text copied from LibreOffice (which inserts non-breaking spaces), though of course I just had to remove them from my input file (... that is, when I noticed them!). So the French case is handled in this PR too.

One test was failing (sura-2) because it contains non-breaking spaces, so I added a setting to disable the feature (languages.fixedNbsp), as I don't know what the expected output is here and cannot read the script. French still gets correct punctuation spaces in that case, because doing otherwise would be dumb anyway. N.B. I haven't documented the setting in the manual (not sure it even should be).

Before: image

After: image

alerque commented 9 months ago

I checked the Sura test and believe the flexible spaces give a better display and are the more expected output. The point of the non-breaking spaces is to make sure the marker numbers don't end up at the end of a line, disconnected from the start of the related text. Making the spaces flexible balances those spaces with the rest of the line.
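To make the balancing effect concrete, here is a small Python sketch under simplifying assumptions: leftover line width is shared among spaces in proportion to their stretch, with no shrinking or badness limits. The function name `justified_spaces` and all numbers are hypothetical and are not taken from SILE.

```python
def justified_spaces(boxes_width, glues, line_width):
    """Return the final width of each space on a justified line.

    `glues` is a list of (natural_width, stretch) pairs. Leftover
    space is distributed in proportion to each glue's stretch.
    """
    natural = boxes_width + sum(w for w, _ in glues)
    extra = line_width - natural
    total_stretch = sum(s for _, s in glues)
    if extra <= 0 or total_stretch == 0:
        return [w for w, _ in glues]
    return [w + extra * s / total_stretch for w, s in glues]


# One line with three spaces; the first is the nbsp after a verse marker.
flexible = justified_spaces(80.0, [(3.0, 1.5), (3.0, 1.5), (3.0, 1.5)], 95.0)
fixed = justified_spaces(80.0, [(3.0, 0.0), (3.0, 1.5), (3.0, 1.5)], 95.0)

print(flexible)  # [5.0, 5.0, 5.0]
print(fixed)     # [3.0, 6.0, 6.0]
```

With a fixed-width nbsp, the other two spaces have to absorb all of the extra width, so the space after the marker looks noticeably tighter than its neighbours. With a flexible nbsp, all three spaces end up the same width.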