Open Omikhleia opened 6 months ago
Linking to #1994 and #1631 -- I do think this should be part of the same "language refactoring". Perhaps we should have these in a dedicated "project"?
So we might need additional per-language typography tuning files too? E.g. for French:
{
lefthyphenmin = 2,
righthyphenmin = 3,
identfirst = false,
}
(For the last one, see https://github.com/sile-typesetter/sile/pull/1991#issuecomment-2096988578)
The user should of course remain able to override them.
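To make the idea concrete, here is a minimal sketch of per-language defaults with user overrides. Everything in it is an assumption for illustration: `languageDefaults`, `fallback` and `resolveHyphenMinima` are hypothetical names, not SILE's actual API, and the values only repeat the ones quoted in this thread.

```lua
-- Hypothetical sketch, not SILE's actual API.
-- Fallback mirrors the (2, 2) initialization of the Liang hyphenator.
local fallback = { lefthyphenmin = 2, righthyphenmin = 2 }

-- Per-language defaults (values taken from this thread only).
local languageDefaults = {
  en = { lefthyphenmin = 2, righthyphenmin = 3 },
  fr = { lefthyphenmin = 2, righthyphenmin = 3 },
  fi = { lefthyphenmin = 2, righthyphenmin = 2 },
}

-- Resolve the effective minima: a user override wins over the language
-- default, which wins over the global fallback.
local function resolveHyphenMinima (lang, overrides)
  local defaults = languageDefaults[lang] or fallback
  overrides = overrides or {}
  return {
    lefthyphenmin = overrides.lefthyphenmin or defaults.lefthyphenmin,
    righthyphenmin = overrides.righthyphenmin or defaults.righthyphenmin,
  }
end

local french = resolveHyphenMinima("fr")
print(french.lefthyphenmin, french.righthyphenmin)

local custom = resolveHyphenMinima("fr", { righthyphenmin = 5 })
print(custom.lefthyphenmin, custom.righthyphenmin)
```

The lookup is done field by field so that a user can override one minimum and keep the language default for the other.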
Yes, this setting should be tunable per language.
And yes, the language-related code issues are so intertwined that they are hard to track and work on. It's hard to sit down and get my head around the problem, or to know when an individual issue is actionable. Grouping them all in a "project" sounds like a good idea.
Linking to https://github.com/sile-typesetter/sile/issues/2001#issuecomment-2152703523 - We are not at (2, 2) by default but probably at (2, 3), apparently due to another bug.
Issue
The Knuth-Liang hyphenation always defaults to (2, 2) for left hyphen and right hyphen minima, upon initialization:
https://github.com/sile-typesetter/sile/blob/b2cc0841ff603abc335c5e66d8cc3c64b65365eb/core/hyphenator-liang.lua#L105
These are quite sane defaults for the algorithm... but most languages would beg to differ and use different values...
Typically, for instance, English would likely prefer (2, 3), as Babel (LaTeX) implements it:
https://github.com/latex3/babel/blob/d4d55826cd264220b7a8d92b453748564affea54/locale/en/babel-en-GB.ini#L152-L153
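For readers less familiar with these minima, the sketch below shows the effect of switching from (2, 2) to (2, 3). It is not the code from `hyphenator-liang.lua`: `filterBreaks` is a hypothetical helper, the candidate positions are made up for illustration rather than produced by real patterns, and the length is counted in bytes, so it only holds for ASCII words.

```lua
-- Hypothetical helper, not SILE's implementation.
-- `breaks` lists positions i meaning "a break is allowed after the i-th
-- character". A break is kept only if at least `leftmin` characters come
-- before it and at least `rightmin` characters come after it.
local function filterBreaks (word, breaks, leftmin, rightmin)
  local len = #word -- byte length: assumes an ASCII-only word
  local kept = {}
  for _, pos in ipairs(breaks) do
    if pos >= leftmin and (len - pos) >= rightmin then
      kept[#kept + 1] = pos
    end
  end
  return kept
end

-- Illustrative candidates only (not real pattern output).
local candidates = { 1, 2, 6, 9 }

print(table.concat(filterBreaks("hyphenation", candidates, 2, 2), " ")) -- 2 6 9
print(table.concat(filterBreaks("hyphenation", candidates, 2, 3), " ")) -- 2 6
```

With (2, 3) the candidate after the ninth character is dropped, since it would leave only two letters on the next line.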
Besides segmentation rules and patterns, SILE should likely implement such "per-language" default preferences:
In Babel some are at (2, 2) (e.g. Finnish), most at (2, 3), some at (1,1), etc.
ini files for all supported languages....
Workaround
(Not a general solution)
Further thought
This was probably overlooked (due to other issues), but (language-specific / custom) left/right hyphen minima were actually mentioned in an existing issue (now closed): #308, with rather extreme values in the LaTeX example (3, 5).
AFAIK, Typst (hypher) seems to implement these left/right minima per language (in one big file): https://github.com/typst/hypher/blob/6b40344866f2d7b2e156db93e91cf105cb75f7a2/src/lang.rs#L201-L205C1.
While at it, the current Knuth-Plass line breaker uses a single hyphenPenalty (probably as TeX does), but we could use variable penalties depending on the lengths of the initial and final segments. In other words, rather than lagging behind LaTeX (and/or TeX, as we do here), there is room to improve on them.
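As a rough illustration of what a variable penalty could look like, here is a minimal sketch. The function name, the `comfortable` and `surcharge` parameters and the linear formula are all assumptions made up for this example; nothing like it exists in SILE's line breaker today, and the actual shape of the penalty would need experimentation.

```lua
-- Hypothetical sketch: the formula and parameter names are illustrative only.
-- `left` and `right` are the lengths of the segments before and after the
-- hyphenation point. Segments shorter than `comfortable` characters add a
-- surcharge per missing character on top of the base penalty.
local function variableHyphenPenalty (base, left, right, opts)
  opts = opts or {}
  local comfortable = opts.comfortable or 4
  local surcharge = opts.surcharge or 25
  local shortfall = math.max(0, comfortable - left)
    + math.max(0, comfortable - right)
  return base + surcharge * shortfall
end

-- 50 is TeX's default \hyphenpenalty, used here as the base value.
print(variableHyphenPenalty(50, 5, 6)) -- 50: both segments are long enough
print(variableHyphenPenalty(50, 2, 3)) -- 125: 50 + 25 * (2 + 1)
```

Under such a scheme the minima would remain hard limits, while breaks just above them would merely be discouraged rather than treated the same as breaks in the middle of a long word.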