Closed: Omikhleia closed this issue 1 month ago.
Issue
Relates to #2017 regarding the hard-coded `minWord = 5` value, but this is a different type of issue.

The logic is not UTF-8 compliant:
https://github.com/sile-typesetter/sile/blob/b2cc0841ff603abc335c5e66d8cc3c64b65365eb/core/hyphenator-liang.lua#L58-L63

- `string.len` is not UTF-8-safe. It counts bytes, not characters, so the `minWord` value is likely not honored as it ought to be (cf. `hyphenationmin`).
- `text:lower()` is not UTF-8-safe either. `string.lower` works byte by byte, so non-ASCII capitals such as "É" are not lowercased. The same applies to the `string.lower` call a bit later in a `SU.map` (... and aren't we performing the lowercase operation again? ... somehow acceptable).

Proofs / Minimal examples

The second case here, with `minWord` at 6, would be expected not to hyphenate "léris":

```
> SILE.showHyphenationPoints("léris", "fr")
lé-ris
> SILE._hyphenators["fr"].minWord
5
> SILE._hyphenators["fr"].minWord = 6
> SILE.showHyphenationPoints("léris", "fr")
lé-ris
> -- OOPS. "léris" is 5 characters long (but 6 bytes long)
> SILE._hyphenators["fr"].minWord = 7
> SILE.showHyphenationPoints("léris", "fr")
léris
```

Below, we override a pattern with an exception, but the override does not work with uppercase input (the exception is bypassed).

```
> SILE.call("hyphenator:add-exceptions", { lang="fr" }, { "légè-rement" }) -- Override as exception
> SILE.showHyphenationPoints("légèrement", "fr")
légè-rement
> SILE.showHyphenationPoints("LÉGÈREMENT", "fr")
LÉGÈ-RE-MENT
> -- OOPS, expected "LÉGÈ-REMENT"
```
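A minimal sketch of what UTF-8-aware checks for the two problems above could look like, assuming Lua 5.3+ and its standard `utf8` library. The helpers `utf8_length`, `too_short` and `latin1_lower` are hypothetical and not part of SILE's API. `latin1_lower` only handles ASCII and the Latin-1 Supplement capitals, which covers the French examples here. A real fix would presumably need full Unicode case mapping, for example from a dedicated UTF-8 library.

```lua
-- Minimal sketch, assuming Lua 5.3+ (standard `utf8` library).
-- utf8_length, too_short and latin1_lower are hypothetical helpers,
-- not part of SILE's API.

-- Length in characters (code points) rather than bytes.
local function utf8_length(text)
  -- utf8.len returns nil (plus an error position) on invalid UTF-8;
  -- fall back to the byte length in that case.
  return utf8.len(text) or #text
end

-- UTF-8-aware version of a "word too short to hyphenate" guard.
local function too_short(text, minWord)
  return utf8_length(text) < minWord
end

-- Lowercase ASCII letters and the Latin-1 Supplement capitals
-- U+00C0..U+00DE (except U+00D7, the multiplication sign), whose
-- lowercase forms sit at cp + 0x20. Not a full Unicode case mapping.
local function latin1_lower(text)
  local out = {}
  for _, cp in utf8.codes(text) do
    if (cp >= 0x41 and cp <= 0x5A)
      or (cp >= 0xC0 and cp <= 0xDE and cp ~= 0xD7) then
      cp = cp + 0x20
    end
    out[#out + 1] = utf8.char(cp)
  end
  return table.concat(out)
end

local word = "léris"
print(string.len(word), utf8_length(word)) -- bytes (6) vs characters (5)
print(too_short(word, 6))                  -- true: 5 characters < 6
print(latin1_lower("LÉGÈREMENT"))          -- légèrement
```

With a character-based length, `too_short("léris", 6)` holds, which matches the expectation in the first example. Lowercasing "LÉGÈREMENT" to "légèrement" before the lookup would let it match the "légè-rement" exception.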