latex3 / babel

The babel system for LaTeX, LuaLaTeX and XeLaTeX
LaTeX Project Public License v1.3c

doc: Bad German example for `\babelposthyphenation` #229

Closed lemzwerg closed 1 year ago

lemzwerg commented 1 year ago

On page https://latex3.github.io/babel/guides/non-standard-hyphenation-with-luatex.html you can find the following example.

\documentclass{article}

\usepackage[ngerman]{babel}

\babelposthyphenation{ngerman}{([fmtrp]) | {1}}{
  { no = {1}, pre = {1}{1}- },
  remove,
  {}
}

\begin{document}

\rightskip5cm

Auffrisierende Auffrisierendem Auffrisierenden Auffrisierender
Auffrisierendes Auffrisierst Auffrisiert Auffrisierte Auffrisiertem
Auffrisierten Auffrisierter Auffrisiertes Auffrisiertest Auffrisiertet
Auffrisst Auffuhr Aufführbar Aufführbare Aufführbarem Aufführbaren
Aufführbarer Aufführbares Aufführe Auffuhren Aufführen Aufführend
Aufführende Aufführendem Aufführenden Aufführender Aufführendes

\end{document}

It creates the following output (ASCIIfied, with arrows added by me to mark the important spots):

Auffrisierende Auffrisierendem Auffrisieren-
den Auffrisierender Auffrisierendes Auffrisierst
Auffrisiert Auffrisierte Auffrisiertem Auffrisier-
ten Auffrisierter Auffrisiertes Auffrisiertest Auff-    <---
frisiertet Auffrisst Auffuhr Aufführbar Aufführ-
bare Aufführbarem Aufführbaren Aufführbarer
Aufführbares Aufführe Auffuhren Aufführen Auff-    <---
führend Aufführende Aufführendem Aufführen-
den Aufführender Aufführendes

Two problems.

  1. The result is not shown on the webpage, making it hard to understand what's actually going on.
  2. The example is plain wrong for German:
     a. Rules that add or modify letters at a line break exist only in the old German orthography, so ngerman must be replaced with german.
     b. The 'ff → ff-f' and 'ck → k-k' rules cannot be covered with simple patterns. You actually need a list of words for that.

You can download the wortliste file that the current German hyphenation patterns are based on; there you can find entries like ab<blo{ck/k-k}e or Schi{ff/ff=f}ahrt, which you could use to construct a useful, real-world example. In particular, none of the words given in the Babel webpage example is ever written with 'ff-f' in the old orthography.

Generic handling of item 2b would require special hyphenation patterns that cover just the cases marked with {...} in wortliste. Then, and only then, could simple rules such as the one currently shown in the Babel documentation be used.

jbezos commented 1 year ago

Sure, and the example for Spanish is not quite correct, either. My purpose was not to show the actual rules of any language, but just simple examples to start with. With your pointer, the ‘real’ thing can be predefined in babel (currently there are no predefined transforms for german). The situation is similar to that described here for Norwegian. Anyway, I’ll provide more realistic examples.

jbezos commented 1 year ago

@lemzwerg I’ve revised the article. By the way, where can I find an explanation of the syntax followed in wortliste?

lemzwerg commented 1 year ago

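Thanks. Some things that I noted:

* full o realistic → full or realistic
* more complete → more detailed
* maybe it makes sense to mention that the shown 'ck' rule for German doesn't work for compound words, for example "Druck=ein-stel-lung" – this can really only be handled with special hyphenation patterns (or corresponding word lists)

The syntax used in wortliste is explained here (only in German, sorry).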

@AlMa0r, please have a look, too!

jbezos commented 1 year ago

Thanks (no problem, I speak a little German).

As to compound words, hyphenation points are taken into account, which means the rule isn’t applied in words having a discretionary before and/or after the ‘ck’ group. So, \showhyphens{Druckeinstellung Trockenerzeugnis} shows:

Underfull \hbox (badness 10000) in paragraph at lines 15--15
[] \TU/lmr/m/n/10 Druck-ein-stel-lung Trok-ken-erzeug-nis

Explaining this explicitly would be useful, because very often what we need is a combination of hyphenation patterns with post-hyphenation rules.
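
To make the interplay concrete, here is a minimal test file, offered as a sketch rather than as the rule from the revised article. It assumes LuaLaTeX, the old-orthography `german` patterns, and that those patterns leave a discretionary between 'c' and 'k' wherever a break inside 'ck' is allowed. The `c | k` pattern and its replacement list are illustrative, modeled on the syntax of the example at the top of this issue.

```latex
% Compile with lualatex; \babelposthyphenation requires luatex.
\documentclass{article}

\usepackage[german]{babel} % old orthography, as requested in item 2a

% Illustrative rule (an assumption, not necessarily the one in the guide):
% where the patterns put a discretionary between c and k, typeset "ck"
% when the line is not broken and "k-" + "k" when it is.
\babelposthyphenation{german}{ c | k }{
  { no = c, pre = k- }, % replace 'c' with a discretionary
  remove,               % drop the automatic discretionary
  {}                    % keep 'k' untouched
}

\begin{document}

% Writes the hyphenation points found for both words to the log.
\showhyphens{Druckeinstellung Trockenerzeugnis}

\end{document}
```

Because `|` only matches where the patterns have already placed a discretionary, a compound whose break falls after the whole 'ck' group is left alone, while a word with a break inside 'ck' gets the 'k-' form.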

lemzwerg commented 1 year ago

Aaah! Indeed, I missed that, and it should definitely be added to the explanations :-)

jbezos commented 1 year ago

I’m closing this issue because the main problem has been (I hope) solved. Now the examples for German are more realistic.

ghost commented 1 year ago

> Thanks. Some things that I noted:
>
> * full o realistic → full or realistic
> * more complete → more detailed
> * maybe it makes sense to mention that the shown 'ck' rule for German doesn't work for compound words, for example "Druck=ein-stel-lung" – this can really only be handled with special hyphenation patterns (or corresponding word lists)
>
> The syntax used in wortliste is explained here (only in German, sorry).
>
> @AlMa0r, please have a look, too!

@lemzwerg I took a look at the updated documentation. The first issue I saw (after the author adapted the documentation) is explained in great detail in http://github.com/latex3/babel/issues/230#issuecomment-1502499688 . I stopped reading at the first issue I found.

As for the small differences in the values after penalty= in the third argument of \babelposthyphenation: after the recent update of TeX Live, they do have a nonrandom (and often expected) influence on the output. The results of my experiments therefore contradict the view of David Carlisle, who thought that small differences of 1 or 2 from the standard penalty of 50 should matter rarely or not at all, and who advised differences on the order of twenty. In a book of 474 pages with 109 instances of \babelposthyphenation{…}{…}{…} in the LaTeX input, the actual hyphenation changed at 27 positions in the output.