latex3 / babel

The babel system for LaTeX, LuaLaTeX and XeLaTeX
LaTeX Project Public License v1.3c

[british]babel should spit out a warning of a noticeable kind if hyphen-english is missing #290

Closed: AlMa1r closed this issue 3 months ago

AlMa1r commented 3 months ago

Running latex from TeX Live 2024 on

\documentclass[british]{article}
\usepackage[british]{babel}
\begin{document}
\showhyphens{theorem theorems}
\end{document}

yields

the-o-rem the-o-rems

on the console, which is the US-English hyphenation according to http://www.merriam-webster.com/dictionary/theorem. To the best of my knowledge, the actual UK-English hyphenation of the singular noun “theorem” is “the-orem” (don't ask me about the plural), according to The Oxford Spelling Dictionary (Robert Edward Allen, 1986, p. 264), the New Oxford Spelling Dictionary (Maurice Waite, 3rd edition, 2005, p. 526), and the Oxford Advanced Learner's Dictionary of Current English (Albert Sydney Hornby, 9th edition, 2015, p. 1567).

Looking into the log, we discover

Package babel Info: Hyphen rules for 'british' set to \l@english
(babel)             (\language0). Reported on input line 82.
Package babel Info: Hyphen rules for 'UKenglish' set to \l@english
(babel)             (\language0). Reported on input line 83.

It turned out that babel-english was installed but hyphen-english was not, and there was no clear warning or error about this in the log or on the console, beyond the aforementioned mappings to \language0. This fallback to US-English hyphenation for british can easily go unnoticed in huge logs, and the visual output of American and British English is often very similar (if only a few UK-English words are scattered through a long text in another language, we might not even notice the wrong hyphenation). Could we get at least a warning in the log or on the console, containing a word one would naturally grep for (Warning, error, undefined, missing, failed, unknown, …, capitalized if necessary), whenever the document requests British hyphenation but it fails or is likely to fail in some way (here, by falling back to the default)?
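Until babel emits such a warning, the Info lines quoted above can be turned into one after the fact by a build script. A minimal sketch, assuming the message wording shown in this log (babel 2024/02/07 v24.2); the function name `check_babel_hyphen_fallback` is hypothetical and not part of babel or TeX Live:

```sh
#!/bin/sh
# Hypothetical post-run check (not part of babel): scan a LaTeX log for the
# babel Info lines saying that a British variant was mapped to \l@english,
# i.e. to the US-English patterns in \language0, and report that as a warning.
# Assumes the Info wording seen in babel v24.2 logs.
check_babel_hyphen_fallback() {
  log=$1
  if [ ! -r "$log" ]; then
    printf 'Error: cannot read log file %s\n' "$log" >&2
    return 2
  fi
  # -F matches fixed strings, so the backslash in \l@english is literal.
  # Inside double quotes, "\\" becomes a single backslash.
  if grep -q -F \
       -e "Hyphen rules for 'british' set to \\l@english" \
       -e "Hyphen rules for 'UKenglish' set to \\l@english" "$log"
  then
    printf 'Warning: British English falls back to US-English hyphenation in %s\n' \
      "$log" >&2
    return 1
  fi
  return 0
}
```

For example, `check_babel_hyphen_fallback mwe.log || exit 1` in a build script would stop on the log above, while a mapping such as 'australian' to \l@ukenglish passes. Because it matches message text, it may need adjusting if babel rewords these Info lines.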

jbezos commented 3 months ago

I’ll reformulate the issue in more general terms:

Detect a misconfigured language.dat

The short answer is that language.dat may be modified locally, and babel cannot know what the user wants.

This message does not necessarily mean something went wrong. For example, with the following ‘info’ the setting is (very likely) correct:

Package babel Info: Hyphen rules for 'australian' set to \l@ukenglish
(babel)             (\language23). Reported on input line 105.

How hyphenation patterns are assigned depends not only on what the corresponding ldf file does, but also on the local configuration in language.dat, which by default should provide sensible settings. But since language.dat may be changed locally, babel cannot know whether the result is the intended one. So a noisy warning could, in fact, be misleading and cumbersome. It’s up to the user to decide whether things are right or wrong.

IMO, here the real problem is this one:

It turned out that babel-english was installed but hyphen-english was not

This is a dependency which has to be properly managed by the installer.
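On the installer side, a build script can at least check the dependency itself before relying on British patterns. A sketch assuming `tlmgr show` prints an `installed:` field, as in the `tlmgr show` output quoted in this thread (`installed:   No`); both function names are hypothetical:

```sh
#!/bin/sh
# Hypothetical pre-flight check (not part of TeX Live): make sure a TeX Live
# package such as hyphen-english is installed before relying on it.

# Reads `tlmgr show <pkg>` output on stdin; succeeds only if it contains an
# "installed:" field whose value is "Yes".
tl_installed_field_is_yes() {
  awk '$1 == "installed:" { found = 1; ok = ($2 == "Yes") }
       END { exit !(found && ok) }'
}

# Warns on stderr and returns 1 if the package is missing (or tlmgr fails).
require_tl_package() {
  pkg=$1
  if ! tlmgr show "$pkg" 2>/dev/null | tl_installed_field_is_yes; then
    printf 'Warning: TeX Live package %s is not installed\n' "$pkg" >&2
    return 1
  fi
  return 0
}
```

A script could then run `require_tl_package hyphen-english || tlmgr install hyphen-english`. Splitting out the parser keeps the check testable without a TeX Live installation.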

AlMa1r commented 3 months ago

@jbezos Thank you for looking into this. Following your remark that this “has to be properly managed by the installer”, I submitted a report to the TeX Live mailing list at TUG.

jbezos commented 3 months ago

@AlMa1r Anyway, I've added it to the list of enhancement requests.

AlMa1r commented 3 months ago


My post to the tex-live list at TUG came out mis-formatted. Here it is, properly formatted:


When babel-english gets installed but hyphen-english is not installed (neither beforehand nor along with it), we don't get any warning. We also don't get any warning later, when compiling a LaTeX document that requests British hyphenation and getting US hyphenation instead. Therefore, we kindly ask that the user be warned explicitly (or that the British hyphenation patterns be installed automatically whenever babel-english is installed). The babel maintainer said in https://github.com/latex3/babel/issues/290#issuecomment-2023121848 that this is the task of the installer. Our test:

$ tlmgr show babel-english hyphen-english | grep installed
installed:   No
installed:   No
$ tlmgr install babel-english
tlmgr: package repository https://ctan.space-pro.be/tex-archive/systems/texlive/tlnet (verified)
[1/1, ??:??/??:??] install: babel-english [137k]
running mktexlsr ...
done running mktexlsr.
tlmgr: package log updated: /home/username/usr/local/texlive/2024/texmf-var/web2c/tlmgr.log
tlmgr: command log updated: /home/username/usr/local/texlive/2024/texmf-var/web2c/tlmgr-commands.log
$ cat > mwe.tex
\documentclass[british]{article}
\usepackage[british]{babel}
\begin{document}
\showhyphens{theorem theorems}
\end{document}
$ latex mwe
This is pdfTeX, Version 3.141592653-2.6-1.40.26 (TeX Live 2024) (preloaded format=latex)
 restricted \write18 enabled.
entering extended mode
(./mwe.tex
LaTeX2e <2023-11-01> patch level 1
L3 programming layer <2024-03-14>
(/home/username/usr/local/texlive/2024/texmf-dist/tex/latex/base/article.cls
Document Class: article 2023/05/17 v1.4n Standard LaTeX document class
(/home/username/usr/local/texlive/2024/texmf-dist/tex/latex/base/size10.clo))
(/home/username/usr/local/texlive/2024/texmf-dist/tex/generic/babel/babel.sty
(/home/username/usr/local/texlive/2024/texmf-dist/tex/generic/babel/txtbabel.de
f)
(/home/username/usr/local/texlive/2024/texmf-dist/tex/generic/babel-english/bri
tish.ldf
(/home/username/usr/local/texlive/2024/texmf-dist/tex/generic/babel-english/eng
lish.ldf)))
(/home/username/usr/local/texlive/2024/texmf-dist/tex/generic/babel/locale/en/b
abel-british.tex)
(/home/username/usr/local/texlive/2024/texmf-dist/tex/latex/l3backend/l3backend
-dvips.def)
No file mwe.aux.
Underfull \hbox (badness 10000) in paragraph at lines 4--4
[] \OT1/cmr/m/n/10 the-o-rem the-o-rems
(./mwe.aux) )
(see the transcript file for additional information)
No pages of output.
Transcript written on mwe.log.
$ grep babel mwe.log 
(/home/username/usr/local/texlive/2024/texmf-dist/tex/generic/babel/babel.sty
Package: babel 2024/02/07 v24.2 The Babel package
\babel@savecnt=\count196
(/home/username/usr/local/texlive/2024/texmf-dist/tex/generic/babel/txtbabel.de
(/home/username/usr/local/texlive/2024/texmf-dist/tex/generic/babel-english/bri
Language: british 2017/06/06 v3.3r English support from the babel system
(/home/username/usr/local/texlive/2024/texmf-dist/tex/generic/babel-english/eng
Language: english 2017/06/06 v3.3r English support from the babel system
Package babel Info: Hyphen rules for 'british' set to \l@english
(babel)             (\language0). Reported on input line 82.
Package babel Info: Hyphen rules for 'UKenglish' set to \l@english
(babel)             (\language0). Reported on input line 83.
Package babel Info: Hyphen rules for 'canadian' set to \l@english
(babel)             (\language0). Reported on input line 102.
Package babel Info: Hyphen rules for 'australian' set to \l@english
(babel)             (\language0). Reported on input line 105.
Package babel Info: Hyphen rules for 'newzealand' set to \l@english
(babel)             (\language0). Reported on input line 108.
(/home/username/usr/local/texlive/2024/texmf-dist/tex/generic/babel/locale/en/b
Package babel Info: Importing font and identification data for british
(babel)             from babel-en-GB.ini. Reported on input line 11.
$ egrep -i "warn|err|fail|miss|unknown|not known|undef|not def|ill|wrong" mwe.log
$ 

“the-o-rem” is the US-English hyphenation, whereas we should have obtained the British-English hyphenation “the-orem”.

AlMa1r commented 3 months ago

@jbezos Out of curiosity, is \language0 ever something other than US English?

jbezos commented 3 months ago

Never. It’s explained in language.dat:

% We must keep english as the default (first) here, and let it refer to
% hyphen.tex (not anything else), and do not change the hyphen.tex file,
% or name some other file hyphen.tex.  In other words, hyphen.tex must
% remain the original file from Knuth, and it must be \language0.

AlMa1r commented 3 months ago

Thanks! In that case, whenever UKenglish or british is specified as a babel option, would it be technically possible to check in english.ldf whether using this language amounts to using \language0 and, if so, warn the user? You wouldn't catch the many other errors resulting from users' alterations of language.dat, but you'd catch this specific one (and if users redefine \language0, they have only themselves to blame, if I understand your post correctly). Apart from TeX Live (whose maintainer said yesterday on the mailing list that he has introduced a dependency that should appear soon), there are also MiKTeX and MacTeX …
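Outside TeX, a rough analogue of this check is to ask whether language.dat gives british an entry of its own at all; if it does not, babel can only fall back to other patterns, as in the log above. A heuristic sketch, assuming the usual language.dat syntax (a name and a pattern file per line, `=name` for synonyms, `%` for comments); `dat_has_language` is a hypothetical helper, not part of babel or TeX Live:

```sh
#!/bin/sh
# Hypothetical helper: does language.dat give LANG its own entry, either as a
# primary name ("ukenglish loadhyph-en-gb.tex") or as a synonym ("=british")?
# Returns 0 if an entry exists, 1 otherwise.
dat_has_language() {
  lang=$1
  dat=$2
  awk -v lang="$lang" '
    { sub(/%.*/, "") }              # strip TeX comments
    NF == 0 { next }                # skip blank and comment-only lines
    {
      name = $1
      sub(/^=/, "", name)           # "=british" is a synonym entry
      if (name == lang) { found = 1; exit }
    }
    END { exit !found }
  ' "$dat"
}
```

It could be run as `dat_has_language british "$(kpsewhich language.dat)"`. As jbezos points out, language.dat may be edited locally, so a missing entry signals a fallback, not necessarily a mistake.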

jbezos commented 3 months ago

\language0 is always US English, but this doesn’t mean these hyphenation rules cannot be assigned to other languages.

AlMa1r commented 3 months ago

[…] this doesn’t mean these hyphenation rules cannot be assigned to other languages.

Is there a non-obsolete use case? In our context, that would be an intentional assignment of the US-English hyphenation rules to UK English. I can imagine this being useful decades ago, when UK-English patterns were unavailable, or for nerds and testers trying things out, but otherwise my imagination fails me here …