latex3 / babel

The babel system for LaTeX, LuaLaTeX and XeLaTeX
LaTeX Project Public License v1.3c

catcodes of languages loaded on-the-fly #105

Closed u-fischer closed 3 years ago

u-fischer commented 3 years ago

babel now allows selecting languages in the document that haven't been preloaded. Their .ini file is then loaded on the next compilation, from the aux file, after the catcodes of the shorthand characters have been changed.

This means that these .ini files are read with rather random (and changing) catcode settings.
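
To illustrate why the catcode regime at read time matters, here is a minimal sketch that is independent of babel. The macro names \readasother and \readasactive are hypothetical and only serve the illustration. The same character is stored twice, once before and once after its catcode is changed, and the two macros no longer compare equal:

```latex
\documentclass{article}
\begin{document}
% Hypothetical macro names, for illustration only.
\def\readasother{-}     % - is tokenized with catcode 12 (other)
\catcode`\-=\active     % roughly what activating a shorthand does
\def\readasactive{-}    % the same character is now tokenized as active
\catcode`\-=12          % restore the original catcode
\ifx\readasother\readasactive
  \typeout{same tokens}
\else
  \typeout{different tokens}
\fi
\end{document}
```

The log shows "different tokens": the catcode is fixed when the character is read, so a value read from an .ini file while a shorthand is active keeps the active token.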

The following example demonstrates the problem. I was trying to split the tag.bcp47 value at the hyphen (I know I could use language.tag.bcp47, but the code was supposed to work with polyglossia and "manual" settings too). This suddenly failed when I added czech as a language:

\documentclass{article}
\usepackage[
  czech,
  french]
  {babel}
\ExplSyntaxOn
\cs_new_protected:Npn \splitlocale #1 {\seq_set_split:NnV\l_tmpa_seq {-} #1 \seq_show:N\l_tmpa_seq }
\ExplSyntaxOff
\begin{document}
\selectlanguage{brazilian} 
\getlocaleproperty\test{brazilian}{identification/tag.bcp47}
\splitlocale\test
\ExplSyntaxOn
\tl_analysis_show:N\test
\ExplSyntaxOff

\end{document}

Without czech everything is fine:

The sequence \l_tmpa_seq contains the items (without outer braces):
>  {pt}
>  {BR}.

With czech, however, it doesn't split:

The sequence \l_tmpa_seq contains the items (without outer braces):
>  {pt-BR}.
<recently read> }

l.34 \splitlocale\test

? 
The token list \test contains the tokens:
>  p (the letter p)
>  t (the letter t)
>  - (active character=macro:->\active@prefix -\normal@char- )
>  B (the letter B)
>  R (the letter R).
<recently read> }

u-fischer commented 3 years ago

Side remark: I found this while trying to switch from \localeinfo to \getlocaleproperty, as discussed in issue #102. \localeinfo{tag.bcp47} doesn't suffer from the catcode problem: the hyphen is not active with it.
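
One possible user-side workaround, as a sketch only and not a fix in babel itself: convert the retrieved value to a string before splitting, so that the catcode the hyphen had when the .ini file was read no longer matters. The name \splitlocalestr is hypothetical. The definition is assumed to sit in the preamble, where the hyphen in the delimiter is still read with catcode 12, as in the example in the report.

```latex
\ExplSyntaxOn
% Hypothetical helper; assumes it is defined in the preamble,
% before any shorthand makes - active.
\cs_new_protected:Npn \splitlocalestr #1
  {
    % \tl_to_str:N turns every token of #1 into a catcode-12 character
    % (spaces get catcode 10), so an active - becomes an ordinary one.
    \tl_set:Nx \l_tmpa_tl { \tl_to_str:N #1 }
    % Split the sanitized copy at the (catcode-12) hyphen.
    \seq_set_split:NnV \l_tmpa_seq { - } \l_tmpa_tl
    \seq_show:N \l_tmpa_seq
  }
\ExplSyntaxOff
```

In the example above, \splitlocalestr\test would be used in place of \splitlocale\test. The resulting items are strings, so the letters also have catcode 12; that is harmless for comparisons on string level but may matter if the items are compared token by token with ordinary letters.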