hyphenation / tex-hyphen

Hyphenation patterns for TeX

Hyphenation of the Estonian word "näidislahendused" incorrect #60

Open piiskop opened 8 months ago

piiskop commented 8 months ago

I was looking for a way to display hyphenation points and found the following program.

The input:

\documentclass{report}
\usepackage[estonian]{babel}
\def\rehbox{\unskip\unpenalty\setbox2\lastbox\ifhbox2
    \setbox0\hbox{\hbox{\unhbox2} \unhbox0}\expandafter\rehbox\fi}
\newcommand\printhyphens[1]{%
    \setbox0\vbox{{\setbox0\hbox{}%
            \pretolerance-1\hsize=0pt\hfuzz=\maxdimen
            \noindent\hspace*{0pt}#1\par\rehbox\unhbox0}\par}%
    \unvbox 0
}

\begin{document}
    \printhyphens{näidislahendused} 
\end{document}

The output:

näi- dis- la- hen- dus- ed

The expected output:

näi- dis- la- hen- du- sed

Please follow the rule:

  1. A single consonant between vowels belongs to the following syllable: ko-ju, du-ši, Lii-na.

mnater commented 8 months ago

I'm not sure, but this seems to be an issue with babel: the Estonian patterns are computed with righthyphenmin = 3 (https://github.com/hyphenation/tex-hyphen/blob/ecf976ab6995acb653d38ab1af0b9b9829ec0c77/hyph-utf8/tex/generic/hyph-utf8/patterns/tex/hyph-et.tex#L49), but babel uses righthyphenmin = 2 (https://github.com/latex3/babel/blob/b488d60c6b12eefc664077666b4e207f90b63889/locale/et/babel-et.ini#L152).

If righthyphenmin were set to 3 (as the patterns request), the word would hyphenate as näi•dis•la•hen•dused, which misses the last hyphenation opportunity but is correct.
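
One way to check this is to rerun the test document from the report with \righthyphenmin raised by hand. The following is only a sketch: it reuses the \printhyphens macro exactly as posted and sets the TeX primitive \righthyphenmin directly in the document body, after babel has selected the language. It assumes the original example already compiles in your setup, and it is a local test, not a suggestion for how babel or the patterns should be changed.

```latex
% Sketch: the test document from the report, with \righthyphenmin
% set to 3 before the call. Nothing else is changed.
\documentclass{report}
\usepackage[estonian]{babel}
\def\rehbox{\unskip\unpenalty\setbox2\lastbox\ifhbox2
    \setbox0\hbox{\hbox{\unhbox2} \unhbox0}\expandafter\rehbox\fi}
\newcommand\printhyphens[1]{%
    \setbox0\vbox{{\setbox0\hbox{}%
            \pretolerance-1\hsize=0pt\hfuzz=\maxdimen
            \noindent\hspace*{0pt}#1\par\rehbox\unhbox0}\par}%
    \unvbox 0
}

\begin{document}
    % Assumption: 3 is the value declared in hyph-et.tex (see link above).
    % Set after \begin{document} so babel's language setup does not reset it.
    \righthyphenmin=3 % at least three letters must remain after the last break
    \printhyphens{näidislahendused}
\end{document}
```

If the diagnosis is right, the break before the last two letters should no longer be offered. The setting only holds until the next language switch, because babel reapplies its own values whenever a language is selected.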

jbezos commented 8 months ago

@mnater The ‘hyphenmins’ in the hyphenation files don’t necessarily reflect the values set in patgen to generate them. This is particularly true of rule-based patterns (i.e., those created without patgen), like the ones for Spanish, which set the hyphenmins to 2/2, the ‘typographical’ limit, as opposed to the ‘technical’ limit (which is 1/1). Please also note that the original babel style and the patterns are the work of one person, so it’s doubtful there is a mistake here (btw, polyglossia also sets 2/2).