I’ll reformulate the issue in more general terms: Detect a misconfigured language.dat.
The short answer is that language.dat may be modified locally, and babel cannot know what the user wants.
This message does not necessarily mean something went wrong. For example, with the following ‘info’ the setting is very likely correct:
Package babel Info: Hyphen rules for 'australian' set to \l@ukenglish
(babel) (\language23). Reported on input line 105.
How hyphenation patterns are assigned depends not only on what the corresponding ldf does, but also on the local configuration in language.dat, which should by default provide sensible settings. But since language.dat may be changed locally, babel cannot know whether the result is the intended one. So a noisy warning can, in fact, be misleading and cumbersome. It’s up to the user to decide whether things are right or wrong.
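A user who wants to make that decision can print the \language number each English variant ended up with. This is a minimal sketch, assuming (as the Info messages above suggest) that after loading babel the control sequences \l@british and \l@english hold those numbers; a value of 0 means the variant shares Knuth’s US patterns from hyphen.tex.

```latex
\documentclass{article}
\usepackage[british]{babel}
\begin{document}
\makeatletter
% Write the pattern register used by each variant to the console and log.
% \language0 is always the US English patterns from hyphen.tex.
\typeout{british -> \string\language\the\l@british}
\typeout{english -> \string\language\the\l@english}
\makeatother
\showhyphens{theorem}
\end{document}
```

If both lines report \language0, British hyphenation is not actually available in the format, whatever the document requests.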
IMO, here the real problem is this one:
It turned out that babel-english was installed but hyphen-english was not
This is a dependency which has to be properly managed by the installer.
@jbezos Thank you for looking into this. Following your “which has to be properly managed by the installer”, I submitted a report to the TeX Live mailing list at TUG.
@AlMa1r Anyway, I've added it to the list of enhancement requests.
My post to the tex-live list at TUG was mis-formatted. Here it is, properly formatted:
When babel-english gets installed but hyphen-english has not been installed and does not get installed along with it, we get no warning. Nor do we get any warning later, when compiling a LaTeX document that is meant to use British hyphenation and getting US hyphenation instead. Therefore, we kindly ask that the user be warned explicitly (or that installing babel-english also force installing the British hyphenation patterns). The babel maintainer said in https://github.com/latex3/babel/issues/290#issuecomment-2023121848 that this is the installer’s task. Our test:
$ tlmgr show babel-english hyphen-english | grep installed
installed: No
installed: No
$ tlmgr install babel-english
tlmgr: package repository https://ctan.space-pro.be/tex-archive/systems/texlive/tlnet (verified)
[1/1, ??:??/??:??] install: babel-english [137k]
running mktexlsr ...
done running mktexlsr.
tlmgr: package log updated: /home/username/usr/local/texlive/2024/texmf-var/web2c/tlmgr.log
tlmgr: command log updated: /home/username/usr/local/texlive/2024/texmf-var/web2c/tlmgr-commands.log
$ cat > mwe.tex
\documentclass[british]{article}
\usepackage[british]{babel}
\begin{document}
\showhyphens{theorem theorems}
\end{document}
$ latex mwe
This is pdfTeX, Version 3.141592653-2.6-1.40.26 (TeX Live 2024) (preloaded format=latex)
restricted \write18 enabled.
entering extended mode
(./mwe.tex
LaTeX2e <2023-11-01> patch level 1
L3 programming layer <2024-03-14>
(/home/username/usr/local/texlive/2024/texmf-dist/tex/latex/base/article.cls
Document Class: article 2023/05/17 v1.4n Standard LaTeX document class
(/home/username/usr/local/texlive/2024/texmf-dist/tex/latex/base/size10.clo))
(/home/username/usr/local/texlive/2024/texmf-dist/tex/generic/babel/babel.sty
(/home/username/usr/local/texlive/2024/texmf-dist/tex/generic/babel/txtbabel.de
f)
(/home/username/usr/local/texlive/2024/texmf-dist/tex/generic/babel-english/bri
tish.ldf
(/home/username/usr/local/texlive/2024/texmf-dist/tex/generic/babel-english/eng
lish.ldf)))
(/home/username/usr/local/texlive/2024/texmf-dist/tex/generic/babel/locale/en/b
abel-british.tex)
(/home/username/usr/local/texlive/2024/texmf-dist/tex/latex/l3backend/l3backend
-dvips.def)
No file mwe.aux.
Underfull \hbox (badness 10000) in paragraph at lines 4--4
[] \OT1/cmr/m/n/10 the-o-rem the-o-rems
(./mwe.aux) )
(see the transcript file for additional information)
No pages of output.
Transcript written on mwe.log.
$ grep babel mwe.log
(/home/username/usr/local/texlive/2024/texmf-dist/tex/generic/babel/babel.sty
Package: babel 2024/02/07 v24.2 The Babel package
\babel@savecnt=\count196
(/home/username/usr/local/texlive/2024/texmf-dist/tex/generic/babel/txtbabel.de
(/home/username/usr/local/texlive/2024/texmf-dist/tex/generic/babel-english/bri
Language: british 2017/06/06 v3.3r English support from the babel system
(/home/username/usr/local/texlive/2024/texmf-dist/tex/generic/babel-english/eng
Language: english 2017/06/06 v3.3r English support from the babel system
Package babel Info: Hyphen rules for 'british' set to \l@english
(babel) (\language0). Reported on input line 82.
Package babel Info: Hyphen rules for 'UKenglish' set to \l@english
(babel) (\language0). Reported on input line 83.
Package babel Info: Hyphen rules for 'canadian' set to \l@english
(babel) (\language0). Reported on input line 102.
Package babel Info: Hyphen rules for 'australian' set to \l@english
(babel) (\language0). Reported on input line 105.
Package babel Info: Hyphen rules for 'newzealand' set to \l@english
(babel) (\language0). Reported on input line 108.
(/home/username/usr/local/texlive/2024/texmf-dist/tex/generic/babel/locale/en/b
Package babel Info: Importing font and identification data for british
(babel) from babel-en-GB.ini. Reported on input line 11.
$ egrep -i "warn|err|fail|miss|unknown|not known|undef|not def|ill|wrong" mwe.log
$
“the-o-rem” is the US-English hyphenation, whereas we should have obtained the British-English hyphenation “the-orem”.
@jbezos Out of curiosity, is \language0 ever something other than US English?
Never. It’s explained in language.dat:
% We must keep english as the default (first) here, and let it refer to
% hyphen.tex (not anything else), and do not change the hyphen.tex file,
% or name some other file hyphen.tex. In other words, hyphen.tex must
% remain the original file from Knuth, and it must be \language0.
Thanks! In that case, whenever UKenglish or british is specified as a babel option, would it be technically possible to check in english.ldf whether using this language amounts to using \language0, and if so, warn the user? You wouldn’t catch many other errors resulting from users’ alterations of language.dat, but you’d catch this specific one (and if the user redefines \language0, he/she is to blame, if I read your post correctly). Apart from TeX Live (whose maintainer said yesterday on the mailing list that he has introduced a dependency that will appear soon), there are also MiKTeX and MacTeX …
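Such a check might look roughly like the following. It is a minimal sketch placed in a document preamble rather than in english.ldf, assuming that babel leaves \l@british defined with the \language number it was mapped to (as the Info messages in the log indicate). The package name british-check is hypothetical and only labels the warning.

```latex
\documentclass{article}
\usepackage[british]{babel}
\makeatletter
% Hypothetical user-level check, not part of babel or english.ldf:
% warn if 'british' was mapped onto \language0 (Knuth's US patterns).
\ifnum\l@british=\z@
  \PackageWarningNoLine{british-check}{%
    No British hyphenation patterns found;\MessageBreak
    'british' uses US English patterns (\string\language0)}%
\fi
\makeatother
\begin{document}
\showhyphens{theorem theorems}
\end{document}
```

Because the message goes through \PackageWarningNoLine, it contains the word “Warning” and would be caught by a grep such as the one in the test above.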
\language0 is always US English, but this doesn’t mean these hyphenation rules cannot be assigned to other languages.
[…] this doesn’t mean these hyphenation rules cannot be assigned to other languages.
Is there a non-obsolete use case? In our context, would anyone intentionally assign the US-English hyphenation rules to UK English? I can imagine this being useful decades ago, when UK-English patterns were absent, or nerds and testers trying it out, but otherwise my imagination fails me here …
Running latex from TeX Live 2024 on a minimal example such as mwe.tex above yields “the-o-rem” on the console, which is the US-English hyphenation according to http://www.merriam-webster.com/dictionary/theorem . The actual UK-English hyphenation of the singular noun “theorem” is, to the best of my knowledge, “the-orem” (don’t ask me about the plural), according to The Oxford Spelling Dictionary (Robert Edward Allen, 1986, p. 264), the New Oxford Spelling Dictionary (Maurice Waite, 3rd edition, 2005, p. 526), and the Oxford Advanced Learner’s Dictionary of Current English (Albert Sydney Hornby, 9th edition, 2015, p. 1567).
Looking into the log, we discover the Info messages mapping british (and the other English variants) to \l@english, i.e. \language0. It turned out that babel-english was installed but hyphen-english was not, and apart from these mappings there was no clear warning or error about this in the log or on the console. This information about US-English hyphenation being used for british could easily go unnoticed in huge logs, and the visual outputs of American English and British English are often too close to tell apart (so if only a few UK-English words are scattered through a large foreign-language text, we might not even notice the wrong hyphenation). Could we get at least a Warning (or error, undefined, missing, failed, unknown, …, capitalized if necessary) in the log or on the console if British hyphenation is requested by the LaTeX document but fails, or is likely to fail, in some way (here, by falling back to the default)?