sile-typesetter / sile

The SILE Typesetter — Simon’s Improved Layout Engine
https://sile-typesetter.org
MIT License

Hyphenation minimum left/right constraints should be language-specific #2017

Open Omikhleia opened 2 months ago

Omikhleia commented 2 months ago

Issue

The Knuth-Liang hyphenator always defaults to (2, 2) for the left and right hyphenation minima upon initialization:

https://github.com/sile-typesetter/sile/blob/b2cc0841ff603abc335c5e66d8cc3c64b65365eb/core/hyphenator-liang.lua#L105

These are quite sane defaults for the algorithm, but most languages would beg to differ and use different values.

For instance, English would likely prefer (2, 3), as Babel (LaTeX) implements it:

https://github.com/latex3/babel/blob/d4d55826cd264220b7a8d92b453748564affea54/locale/en/babel-en-GB.ini#L152-L153
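To make the effect of these minima concrete, here is a minimal sketch of how they filter candidate break points. The helper `allowedBreaks` is hypothetical and not part of SILE; the candidate positions are chosen for illustration rather than taken from real patterns, and the code assumes single-byte (ASCII) characters.

```lua
-- Hypothetical helper (not SILE's API): keep only the break positions
-- allowed by the left/right minima. A break at position i means the word
-- is split after its i-th character. Assumes ASCII input, so #word is
-- the number of characters.
local function allowedBreaks (word, candidates, leftmin, rightmin)
  local kept = {}
  for _, pos in ipairs(candidates) do
    -- At least leftmin characters before the break, rightmin after it.
    if pos >= leftmin and (#word - pos) >= rightmin then
      kept[#kept + 1] = pos
    end
  end
  return kept
end

-- Illustrative candidates for "hyphenated": hy-phen-at-ed
local candidates = { 2, 6, 8 }
print(table.concat(allowedBreaks("hyphenated", candidates, 2, 2), " ")) -- 2 6 8
print(table.concat(allowedBreaks("hyphenated", candidates, 2, 3), " ")) -- 2 6
```

With (2, 3), the break that would leave only "ed" on the next line is rejected.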

Besides segmentation rules and patterns, SILE should likely implement such "per-language" default preferences:

In Babel, some are at (2, 2) (e.g. Finnish), most at (2, 3), some at (1, 1), etc.
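A rough sketch of what such per-language defaults could look like, assuming a simple lookup table with a fallback. The table name, the `defaultMinima` function and the `leftmin` field name are assumptions for illustration (only `rightmin` appears in the workaround below); only the values mentioned in this issue are filled in.

```lua
-- Hypothetical table of per-language defaults. Values are the ones quoted
-- in this issue from Babel: English (2, 3), Finnish (2, 2).
local hyphenmins = {
  en = { leftmin = 2, rightmin = 3 },
  fi = { leftmin = 2, rightmin = 2 },
}

-- Fallback for languages without an entry: the algorithm's (2, 2) defaults.
local fallback = { leftmin = 2, rightmin = 2 }

-- Return the left and right minima for a language code.
local function defaultMinima (lang)
  local mins = hyphenmins[lang] or fallback
  return mins.leftmin, mins.rightmin
end

print(defaultMinima("en")) -- 2 3
print(defaultMinima("xx")) -- 2 2 (unknown language, fallback)
```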

Workaround

(Not a general solution)

\lua{
-- To do after having switched to English language i.e. the "en" hyphenator got instantiated
SILE._hyphenators['en'].rightmin = 3
}

Further thought

Omikhleia commented 2 months ago

Linking to #1994 and #1631 -- I do think this should be part of the same "language refactoring". Perhaps we should have these in a dedicated "project"?

Omikhleia commented 2 months ago

So we might need additional per-language typography tuning files too? E.g. for French:

{
   lefthyphenmin = 2,
   righthyphenmin = 3,
   identfirst = false,
}

(For the last one, see https://github.com/sile-typesetter/sile/pull/1991#issuecomment-2096988578)

Users should of course still be able to override these defaults if they want to.
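As a sketch of how overriding could work, assuming the tuning file is a plain Lua table as above: user-provided values are layered on top of the language defaults without modifying them. The `resolve` function is hypothetical, not an existing SILE API.

```lua
-- Hypothetical merge (not an existing SILE function): copy the language
-- defaults, then let any user-provided value take precedence.
local function resolve (defaults, overrides)
  local result = {}
  for key, value in pairs(defaults) do
    result[key] = value
  end
  for key, value in pairs(overrides or {}) do
    result[key] = value
  end
  return result
end

-- Language defaults, as in the French example above.
local french = {
  lefthyphenmin = 2,
  righthyphenmin = 3,
  identfirst = false,
}

-- The user only overrides the right minimum; everything else is inherited.
local effective = resolve(french, { righthyphenmin = 2 })
print(effective.lefthyphenmin, effective.righthyphenmin) -- 2 2
```

Copying into a fresh table keeps the language defaults intact, so a later document or language switch still sees the original values.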

alerque commented 1 month ago

Yes, this setting should be tunable per language.

And yes, the issues related to the language code are so intertwined that they are hard to track and work on. It's hard to sit down and get my head around the problem or know when an individual issue is actionable. Grouping them all in a "project" sounds like a good idea.

Omikhleia commented 1 month ago

Linking to https://github.com/sile-typesetter/sile/issues/2001#issuecomment-2152703523 - It seems we are not at (2, 2) by default but likely at (2, 3), due to another bug.