Biblatex 3.14 issue with \mknormrange & \mkcomprange in postnotes

dfussner commented 4 years ago

When you use babel and the standard LaTeX engine (TeXLive 2020), if the "french" option is given, either as main or as a secondary language, then biblatex truncates a postnote field that contains a semicolon dividing two ranges. Biber seems to do the right thing with the same data in a pages field.

Here's a MWE:

\documentclass[12pt]{article}
\usepackage{csquotes}
\usepackage[french,american]{babel}
\usepackage[style=verbose-inote,backend=biber]{biblatex}
\begin{filecontents}[overwrite]{\jobname.bib}
@book{knuth:ct:a,
  author       = {Knuth, Donald E.},
  title        = {The {\TeX} book},
  date         = {1984},
  maintitle    = {Computers \& Typesetting},
  volume       = {A},
  pages        = {132--148; 156--165},
  publisher    = {Addison-Wesley},
  location     = {Reading, Mass.}
}
\end{filecontents}
\addbibresource{\jobname.bib}
\begin{document}
\thispagestyle{empty}

\noindent\cite[132--148; 156--165]{knuth:ct:a}.

\printbibliography
\end{document}

And here's the output on my system: [image: mwe9]

The start of the problem seems to be \initiate@active@char{;} in french.ldf. The Breton ldf file does the same thing with the same results. If you comment out that line and run LaTeX again the problem disappears; \bbl@deactivate{;} doesn't make a difference, or perhaps I'm not using it correctly.
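
A quick way to confirm that ; really is active in the document body is to write its category code to the log. This is only a minimal sketch, assuming pdfLaTeX and the same babel options as the MWE; if the diagnosis above is right, it should report 13 (active) rather than 12 (other).

```latex
\documentclass{article}
\usepackage[french,american]{babel}
\begin{document}
% Category code 12 = "other" (ordinary punctuation), 13 = active.
% \catcode`\; looks up the code currently assigned to the semicolon.
\typeout{Category code of ; in the document body: \the\catcode`\;}
Some text.
\end{document}
```

Running the same file with the french option removed gives a baseline to compare against.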

With \tracingmacros=2, in my log file I get this notice when ; is activated:

\blx@range@chunk@semcol #1;#2&->\notblank {#1} {\blx@range@chunk@comma #1,&} {}
\notblank {#2} {\notblank {#1}{\blx@range@out@delim {\bibrangessep }}{}\blx@ran
ge@chunk@semcol #2&} {}
#1<-132--148; 156--165
#2<-

When ; has never been activated:

\blx@range@chunk@semcol #1;#2&->\notblank {#1} {\blx@range@chunk@comma #1,&} {}
\notblank {#2} {\notblank {#1}{\blx@range@out@delim {\bibrangessep }}{}\blx@ran
ge@chunk@semcol #2&} {}
#1<-132--148
#2<- 156--165;
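
The two traces can be imitated in miniature without biblatex or babel. The sketch below is an illustration only: \splitsemi, \plainnote and \activenote are hypothetical names, with \splitsemi standing in for \blx@range@chunk@semcol. It shows that a macro whose delimiter is a catcode-12 semicolon does not see an active semicolon as a match.

```latex
\documentclass{article}
% \splitsemi is a hypothetical stand-in for \blx@range@chunk@semcol.
% It is defined here, while ; still has category code 12, so its
% delimiter is a catcode-12 semicolon.
\def\splitsemi#1;#2&{first=[\detokenize{#1}] rest=[\detokenize{#2}]}

\begin{document}
% Case 1: the postnote is tokenised with ; as catcode 12,
% so the split happens at the semicolon inside the note.
\def\plainnote{132--148; 156--165}
\expandafter\splitsemi\plainnote;&

% Case 2: the postnote is tokenised while ; is active (catcode 13).
\begingroup
\catcode`\;=\active
\gdef\activenote{132--148; 156--165}
\endgroup
% The trailing ; below is catcode 12 again, so it is the first
% semicolon that matches the delimiter; the whole note lands in #1.
\expandafter\splitsemi\activenote;&
\end{document}
```

\detokenize is used so that the active semicolon is printed as a plain character instead of being expanded.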

The delimited parameter list in \blx@range@chunk@semcol is thrown by the activated {;}, I guess, and further along it results in the loss of some of the field data. What works is to use \edef instead of \def when defining \abx@field@postnote in biblatex.sty, but then something as simple as \textbf in a postnote field needs to have \noexpand. So, in \long\def\blx@defcitecmd@v I tried a test like:

\ifboolexpr{%
   test {\ifpages{##2}}%    
   and  
   test {\ifnumequal{\catcode`;}{13}}%
 }%
  {\edef\abx@field@postnote{##2}}%
  {\def\abx@field@postnote{##2}}%

in place of:

\def\abx@field@postnote{##2}

This seems to work fine, but I haven't tested it to destruction or anything. It's probable that something using \scantokens elsewhere in the code would be safer and more elegant, but I couldn't get it right. Any solution in this location should also be included in \citename, \citelist, and \citefield, as well as (possibly?) in \blx@defvolcitepostnote.

Using polyglossia and XeLaTeX works fine, and I would guess babel with the same engine (or LuaTeX) would also work fine, as active characters are unnecessary there, if I'm reading the code correctly.

I hope my attempts at diagnosis and cure are some help, and many thanks.

moewew commented 4 years ago

Thank you very much for reporting this issue and also for the thorough investigation.

This catcode business reminds me of the terrible things I did in https://github.com/plk/biblatex/commit/9ff2cd0eed4591b449043c426f9d4e77e81321f8 to try and fix option processing under a different catcode regime:

https://github.com/plk/biblatex/blob/7dd553f9c2b9a4474ed03f63266abbd1949d2041/tex/latex/biblatex/biblatex.sty#L7812-L7834

Here I'm actually inclined to include the catcode test somewhere in \blx@range@chunk@semcol and just give up if ; does not have the expected catcode. Retokenizing arbitrary text always feels risky to me.

dfussner commented 4 years ago

I agree -- almost anything can and does appear in a postnote, and working around different catcode regimes is, as your code proves, unpleasant at best. Your solution would at least keep the data intact, and there are several workarounds for users writing in French (or Breton):

Use LuaTeX or XeTeX.
Use commas in the postnote field, and set \bibrangessep to whatever value you need.
Use \bibrangessep in the postnote field, and set it to whatever value you need.

Light testing suggests all of these work fine.
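
For reference, the second and third workarounds might look like the sketch below. It reuses the MWE's preamble with a shortened bib entry, and it assumes that a semicolon is the separator wanted in the output; the \renewcommand line is that assumption, not something biblatex requires.

```latex
\documentclass[12pt]{article}
\usepackage{csquotes}
\usepackage[french,american]{babel}
\usepackage[style=verbose-inote,backend=biber]{biblatex}
% Assumption: a semicolon is the wanted separator between ranges.
\renewcommand*{\bibrangessep}{\addsemicolon\space}
\begin{filecontents}[overwrite]{\jobname.bib}
@book{knuth:ct:a,
  author    = {Knuth, Donald E.},
  title     = {The {\TeX} book},
  date      = {1984},
  publisher = {Addison-Wesley},
  location  = {Reading, Mass.}
}
\end{filecontents}
\addbibresource{\jobname.bib}
\begin{document}
% Workaround 2: commas in the postnote, separator set via \bibrangessep.
\noindent\cite[132--148, 156--165]{knuth:ct:a}.

% Workaround 3: \bibrangessep written directly in the postnote.
\noindent\cite[132--148\bibrangessep 156--165]{knuth:ct:a}.
\end{document}
```

Neither postnote contains a literal semicolon, so the active ; never reaches the range-splitting code.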

moewew commented 4 years ago

Turns out this is trickier than I thought. If we can't split at ; then the code that comes later to normalise the range will not work as expected and just drop stuff. It would be a bit much to completely disable the whole feature when ; has the wrong catcode, because then even people who never use it at all will not get the desired macro behaviour.

etoolbox's \DeclareListParser does not manage to split at the ; if it has the wrong catcode. But expl3 can do it, apparently. The xparse documentation for \SplitList says:

"If only a single character <token> is used for the split, any category code 13 (active) character matching the <token> will be replaced before the split takes place. Spaces are trimmed at each end of each item parsed."

\documentclass[french]{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{babel}
\usepackage{xparse}

\NewDocumentCommand \foo { >{\SplitList{;}} m } { \ProcessList {#1} { \foobar } }

\begin{document}
\newcommand*{\foobar}[1]{A#1B}
\foo{1;2;3 ; 4 ; 5; 6}

\foo{1}
\end{document}

dfussner commented 4 years ago

I have the impression, perhaps mistaken, that you have tried to avoid requiring xparse when developing biblatex. It's a big hammer for a small nut, but I've been exploring Stack Exchange and CTAN and haven't found anything that you hadn't already considered and rejected. It looks like \SplitList would actually do the trick, but I'm sorry to admit that I haven't tried coding it up to prove it ...

plk commented 4 years ago

It's inevitable - I have heavily used xparse in the multiscript branch and there is no way around this I can see without horribly complicating things.

josephwright commented 4 years ago

@plk The @latex3 team will be loading xparse (or essentially all of it) as part of the LaTeX2e format from the autumn: I really would not worry overly.

moewew commented 4 years ago

The new case changing code (#1005) will use expl3 and xparse (and as PLK mentioned, the multiscript proof-of-concept also makes heavy use of xparse), so I think at some point we are going to move more stuff to LaTeX3. At the moment I'm trying to avoid expl3 if possible and try to separate expl3 code out, but we may find that this is not an option any more and will move more and more stuff from LaTeX2 to expl3.

What I wouldn't find too great is if we end up with a weird mixture of all sorts of languages (LaTeX2e, expl3) and coding styles in the biblatex core...

plk / biblatex

Biblatex 3.14 issue with \mknormrange & \mkcomprange in postnotes #1013