latex3 / babel

The multilingual framework to localize LaTeX, LuaLaTeX and XeLaTeX
https://latex3.github.io/babel/
LaTeX Project Public License v1.3c

Error when listing is on a page break and lang is cs #132

Closed r0polach closed 3 years ago

r0polach commented 3 years ago

Hi, I am not sure whether this is the right place to file this report, but my report at https://github.com/jgm/pandoc/issues/7137 was rejected as a pandoc issue with the comment that the problem is in babel.

Please note that I do not know much about LaTeX, babel, etc. I am just a pandoc user, and the example for this issue is a test file written in Markdown syntax.


So here is the description:

pandoc 2.11.4 for windows (x86_64), miktex 21.1 (x64)

When I run pandoc test_input.txt -o test_output.pdf -f markdown --listings on the attached file test_input.txt, I get the following error:

Error producing PDF.
! Incomplete \iffalse; all text was ignored after line 129.
<inserted text>
                \fi
l.129 ON

test_input.txt

It is caused by the combination of two facts: the document sets lang: "cs", and a listing lands on a page break.

If I remove lang: "cs", or if the text flow does not push the listing onto a page boundary, then everything is OK.

r0polach commented 3 years ago

Here I am attaching the intermediate result of pandoc -- in latex format

test_intermediate.tex.txt

jbezos commented 3 years ago

Very likely the problem is the hyphen in the \hypertarget or in the \label. See https://tex.stackexchange.com/questions/490069/incomplete-iffalse-all-text-was-ignored/583191#583191 . So you must somehow reconfigure how pandoc manages the languages. I think the following can help: https://tex.stackexchange.com/questions/505036/obtaining-babel-french-automatic-spaces-before-punctuation-when-using-pandoc/505068#505068 .

u-fischer commented 3 years ago

Can you create a full LaTeX document? Your intermediate file is only the document body.

u-fischer commented 3 years ago

I could reproduce the problem, and as Javier said, the cause is that Czech makes the hyphen active and gives it a rather fragile and complicated definition.

The best fix is to add \shorthandoff{-} to your preamble (after loading babel), or at least before such listings.

FrankMittelbach commented 3 years ago

If pandoc is used with a reasonably new LaTeX (2020-10-01, i.e. the current one, to be exact :-)) then you could try adding

\AddToHook{env/lstlisting/before}{\shorthandoff{-}}
\AddToHook{env/lstlisting/after}{\shorthandon{-}}

if you need the cz definition of the - char in normal text. This way you don't have to disable it in front of every listing.

r0polach commented 3 years ago

Hi thanks a lot.

Only one solution is working for me -- adding this code to pandoc latex header (-H) file:

\AtBeginDocument{\shorthandoff{-}}

I am not sure yet whether it has any unwanted side effects, but none of the other solutions worked for me:

\shorthandoff{-}

nor

\AddToHook{env/lstlisting/before}{\shorthandoff{-}}
\AddToHook{env/lstlisting/after}{\shorthandon{-}}

FrankMittelbach commented 3 years ago
\shorthandoff{-}

I guess cz activates it only at begin document (unconditionally), which is why you then also have to use \AtBeginDocument to deactivate it even later.

nor

\AddToHook{env/lstlisting/before}{\shorthandoff{-}}
\AddToHook{env/lstlisting/after}{\shorthandon{-}}

What goes wrong? Any error messages? As I had no test document, I just wrote that off the top of my head; something along those lines should work if you have a current LaTeX.

r0polach commented 3 years ago

nor

\AddToHook{env/lstlisting/before}{\shorthandoff{-}}
\AddToHook{env/lstlisting/after}{\shorthandon{-}}

What goes wrong? Any error messages? As I had no test document, I just wrote that off the top of my head; something along those lines should work if you have a current LaTeX.

It gives the error

Error producing PDF.
! Incomplete \iffalse; all text was ignored after line 131.
<inserted text>
                \fi
l.131 ON

-- so, basically the same as without a fix, but with 131 instead of 129...

FrankMittelbach commented 3 years ago

Is there a complete document with preamble anywhere for download? I don't really want to try and generate it via pandoc but I'm curious why it doesn't work.

r0polach commented 3 years ago

fix.tex.txt

Here it is -- a one-line file, which is referenced in the command

pandoc test_input.txt -o test_output.pdf -f markdown --listings -H fix.tex.txt

r0polach commented 3 years ago

...and for the non-working example: fix2.tex.txt

pandoc test_input.txt -o test_output.pdf -f markdown --listings -H fix2.tex.txt

FrankMittelbach commented 3 years ago

Well, it works for me. Run it with --verbose and check which LaTeX format is used. Mine says

...
[makePDF] Run #1
This is pdfTeX, Version 3.141592653-2.6-1.40.22 (TeX Live 2021) (preloaded format=pdflatex)
 restricted \write18 enabled.
entering extended mode
(./tex2pdf.-c33fa176037637b6/input.tex
LaTeX2e <2020-10-01> patch level 4
L3 programming layer <2021-05-11>
...

somewhere in the middle, and that is what you need to make \AddToHook work. If your LaTeX is older than that, it can't work. And if -H adds its stuff where it seems to add it (i.e. before loading babel), then it is no surprise that you needed \AtBeginDocument: at that point babel isn't loaded, so \shorthandoff should even give you an error.

The -H content comes at an odd place, so it looks difficult to use it to correct loaded packages other than by using \AtBeginDocument.

r0polach commented 3 years ago

Yes, it seems to be older:

[makePDF] Run #1
This is pdfTeX, Version 3.14159265-2.6-1.40.21 (MiKTeX 21.1)
entering extended mode
(C:/Users/Roman/AppData/Local/Temp/tex2pdf.-31b895c11dfa8c6e/input.tex
LaTeX2e <2020-10-01> patch level 4
L3 programming layer <2021-01-09> xparse <2020-03-03>

r0polach commented 3 years ago

Since my -H file should be multi-language aware, I ended up with:

\usepackage{ifthen}
\ifthenelse{\equal{\language}{czech}}{\AtBeginDocument{\shorthandoff{-}}}{}%
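
A caveat on the test above: \language is TeX's numeric hyphenation-pattern register, not babel's language name, so comparing it with the string czech may not select what is intended. A hedged alternative, assuming the iflang package is installed and that pandoc maps lang: "cs" to the babel name czech, is to defer the test to \AtBeginDocument (as the working fix earlier in this thread does) and use \IfLanguageName. This is an untested sketch, not a confirmed fix:

```latex
% Hypothetical variant of the header file passed with -H (untested sketch).
\usepackage{iflang}
\AtBeginDocument{%
  % \IfLanguageName compares its argument with babel's current language name.
  \IfLanguageName{czech}{\shorthandoff{-}}{}%
}
```

For any other language the third argument is empty, so nothing is switched off.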

jgm commented 3 years ago

(Pandoc dev here.) We have code now in our default template that is intended to turn off all language-specific shorthands, so I was surprised that this \shorthandoff{-} had an effect. Obviously, I'm not correct in thinking that our code disables all the shorthands. If any of you babel experts have advice, I'd be glad to hear it. Here's the code:

% get rid of language-specific shorthands (see #6817):
\let\LanguageShortHands\languageshorthands
\def\languageshorthands#1{}

u-fischer commented 3 years ago

@jgm to disable shorthands use

\usepackage[shorthands=off]{babel}

jgm commented 3 years ago

We found that shorthands=off doesn't disable everything. See https://tex.stackexchange.com/questions/443385/babel-decimal-separator-is-missing-when-shorthands-off-is-set and https://github.com/jgm/pandoc/issues/6817 . I had thought that the code above would disable everything -- but maybe we need both that and shorthands=off?

jgm commented 3 years ago

Update: actually, it's a very frustrating situation. If we just use

% get rid of language-specific shorthands (see #6817):
\let\LanguageShortHands\languageshorthands
\def\languageshorthands#1{}

then the problem of jgm/pandoc#6817 (decimal point disappearing with es) is solved, but the problem of jgm/pandoc#7137 (listings with cs) persists. If we just use

shorthands=off

in loading babel, then 7137 is fixed but 6817 is broken. My thought was to use both, but that doesn't work: when we use both, 6817 is still broken.

Any advice?

u-fischer commented 3 years ago

well the disappearing decimal point is obviously a bug, which got forgotten. So I would suggest to open a bug report. I wouldn't use the \def\languageshorthands#1{} code, that smells like a bad hack.

jgm commented 3 years ago

well the disappearing decimal point is obviously a bug, which got forgotten. So I would suggest to open a bug report. I wouldn't use the \def\languageshorthands#1{} code, that smells like a bad hack.

Agreed, I'd be glad to be rid of it. Is this the right place to open a bug report?

jgm commented 3 years ago

And is there any way to work around these problems, solving both at the same time, with current babel?

FrankMittelbach commented 3 years ago

Yes, it seems to be older:

[makePDF] Run #1
This is pdfTeX, Version 3.14159265-2.6-1.40.21 (MiKTeX 21.1)
entering extended mode
(C:/Users/Roman/AppData/Local/Temp/tex2pdf.-31b895c11dfa8c6e/input.tex
LaTeX2e <2020-10-01> patch level 4
L3 programming layer <2021-01-09> xparse <2020-03-03>

That's strange. I don't think the L3 programming layer should make a difference (unless you see earlier errors when using --verbose). But perhaps you should first make sure that all packages in your MiKTeX distribution are up to date and see if that helps, but my bet is that our pandoc templates differ.

This is all a bit hard to debug if we don't get hold of the .log file and the full generated TeX file (it may look different from the one I get generated by pandoc, because I may not have the latest version of pandoc, as I normally don't use it).

jbezos commented 3 years ago

@jgm Shorthands can be activated selectively. So instead of shorthands=off you can say, for example, shorthands=.?!;: (which also allows, if necessary, a few characters for French). As for the bug with Spanish, Ulrike is right and the bug just got forgotten, even though there is already an issue open ( https://github.com/jbezos/babel-spanish/issues/1 ).
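
As a minimal sketch of the selective activation described above (the language option czech is an assumption taken from the original report, and the fragment is untested against pandoc's template):

```latex
% Sketch only: activate just the listed shorthand characters.
% The hyphen is not in the list, so it should stay an ordinary character.
\usepackage[czech,shorthands=.?!;:]{babel}
```

With such a setup a later \shorthandoff{-} should be unnecessary, and may even be rejected, because - is then not a shorthand at all.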

jgm commented 3 years ago

@jbezos I want to turn off ALL shorthands, not keep some active. The problem is just that shorthands=off doesn't seem to do that. Glad to see the bug has already been reported.

FrankMittelbach commented 3 years ago

@jbezos I want to turn off ALL shorthands, not keep some active.

@jgm but that would be wrong, wouldn't it? Some languages use shorthands to handle special typographical conventions, e.g. adding some extra space before a : (not sure if French still does that) or doing special handling of hyphenation characters (isn't that what Czech is doing?) and similar. In that case it is not a shorthand for easier input, like e.g. the German "a for ä (or, when it was devised, for \"{a}), but just ordinary input that gets some typographical treatment --- which you kill off by turning the shorthand off.

jbezos commented 3 years ago

I was thinking about a more general question. @jgm What are you expecting from a localization system? I've taken some steps for babel to work better with automatically generated documents, and I’d like to better understand your needs. See for example https://github.com/latex3/babel/blob/master/news-guides/news/whats-new-in-babel-3.39.md#locale-loading-on-the-fly and https://github.com/latex3/babel/blob/master/news-guides/news/whats-new-in-babel-3.43.md#autoloading-based-on-bcp-47-codes . As explained by @FrankMittelbach, switching off all shorthands can produce typographically incorrect documents (except, of course, if pandoc does some preprocessing).

FrankMittelbach commented 3 years ago

I've done a bit of analysing of the issue and I think I start to understand what is happening. In a nutshell it looks to be like this:

This then explains why turning off the - in front of lstlisting has no effect: the listing doesn't contain the offending -, which sits inside the \write that happened earlier.

In other words

Wider question: should active chars remain active inside a \write, or should they become normal chars there?

For example, if \protected@write did not use \@unexpandable@protect, the problem would go away, i.e.,

\long\def \protected@write#1#2#3{%
      \begingroup
       \let\thepage\relax
       #2%
%       \let\protect\@unexpandable@protect
       \def\protect##1{\detokenize{##1}}%
       \edef\reserved@a{\write#1{#3}}%
       \reserved@a
      \endgroup
      \if@nobreak\ifvmode\nobreak\fi\fi
}

but I haven't thought through if there are cases where the active status would need to be retained.

FrankMittelbach commented 3 years ago

Addendum: the suggested change in \protected@write wouldn't quite work for two-level writing, e.g. -> aux -> toc, because we have some fragile stuff that gets explicit protection, and that wouldn't remain in the second step. So maybe something like

       \def\protect##1{\noexpand\protect\detokenize{##1}}%

is needed instead.
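
To illustrate the difference between keeping a character active and detokenizing it, here is a minimal standalone sketch. It is not taken from the LaTeX kernel or from babel; the macro names \kept and \harmless and the definition given to the active hyphen are invented for the illustration.

```latex
\documentclass{article}
\begin{document}

% Make the hyphen active and give it an invented definition (illustration only).
\catcode`\-=\active
\def-{[active hyphen]}

% Kept as an active character: its meaning is looked up when \kept is used.
\edef\kept{\noexpand-}

% Detokenized: an ordinary character (category code 12) from here on.
\edef\harmless{\detokenize{-}}

% Restore the normal category code for the rest of the document.
\catcode`\-=12

\kept\ versus \harmless

\end{document}
```

This should typeset "[active hyphen] versus -": \kept picks up whatever the active hyphen means at the point of use, which is the situation that goes wrong when another package redefines the character, while \harmless is a plain hyphen regardless.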

u-fischer commented 3 years ago

that hyphen thus ends in a \write as an active character without being further expanded due to the way babel defines it

Yes. For some unknown reason I thought that would happen only for a few shorthands like the hyphen of Czech, but I now see that it affects all of them.

FrankMittelbach commented 3 years ago

that hyphen thus ends in a \write as an active character without being further expanded due to the way babel defines it

Yes. For some unknown reason I thought that would happen only for a few shorthands like the hyphen of Czech, but I now see that it affects all of them.

\protect is essentially \noexpand inside a write, so the character retains its catcode; thus my suggestion to change that, as it doesn't seem necessary (after all, all you want is for it to end up in a file).

jgm commented 3 years ago

@jbezos I'm glad to see the BCP47 based loading option. Once this becomes widespread in LaTeX installations, it could allow us to simplify some code that translates between BCP47 and babel language names.

@jbezos @FrankMittelbach We've run into enough inadvertent triggering of babel shorthands that we just decided to disable them all. But you are right that turning off all shorthands will produce some incorrect typography. Perhaps shorthands=.?!;: would be safe enough. However:

  1. It would be problematic if the shorthands were applied to content in math mode (as we saw with the . in es). But, looking at the babel documentation, I see there's an option math=normal that isn't the default. (I wonder if that would fix the issue we had with . in es, which was occurring in math contexts?) [EDIT: I tried adding math=normal, but then pdflatex just hangs and I have to ^C. babel 2020/03/22 3.42]

  2. The shorthands would also need to be deactivated in verbatim contexts. Pandoc uses a variety of different verbatim contexts:

    • listings (if --listings is used)
    • fancyvrb (for highlighted code, if --listings isn't used)
    • regular verbatim (for non-highlighted code blocks)
    • texttt (for non-highlighted inline code)

    Does babel deactivate the shorthands in these contexts? [EDIT: based on experimentation, it looks like it does.]

jbezos commented 3 years ago

@FrankMittelbach I think the macro to be modified is in babel (\active@prefix). But I'd like to be sure.

jbezos commented 3 years ago

@jgm As I'm the maintainer of the Spanish style, I'll update it in a few days (it's about time!). You may still want to configure some languages to your needs, though.

FrankMittelbach commented 3 years ago

@FrankMittelbach I think the macro to be modified is in babel (\active@prefix). But I'd like to be sure.

how would you want to alter that? I don't think you can unless you enhance it so that it doesn't put an active character in, in certain situations (but that is rather fragile). What it does currently is correct. It keeps the active char but prevents expansion where it shouldn't expand.

The error is really in listings, which gives the active character a new meaning without any protection, so that any such char sitting and waiting in a \write will bomb out in the OR (output routine) if a listing environment is active when the OR starts to make the page. So it should really be corrected there. On the other hand, there is no need to keep it active in a \write, so my proposal was that core LaTeX takes care of that, and then the problem goes away as well. However, that fix will only come with the Fall release of LaTeX, not with the immediate one coming up.

Short term, for the problem here I think the answer is: do not use - in labels if you also use listings.

u-fischer commented 3 years ago

The error is really in listings which gives the active character a new meaning without any protection

How can such a protection be done? For example in this example the redefined " bombs, how can one protect the code in the output routine here?

\documentclass{article}
\usepackage[ngerman]{babel}
\usepackage{lipsum}
\begin{document}

\section{abc}\label{abc"blub}

{
\def"{\ERROR}

\lipsum

}

\end{document}

FrankMittelbach commented 3 years ago

In the same way as babel does it (or utf8 chars do it), basically through a two-level process

In babel \MakeSureImFine = \active@char (or similarly named).

Bottom line the babel shorthand mechanism doesn't belong into babel but LaTeX should offer a standard mechanism for all packages, that can then be used by babel and/or listings or ... but we know that. As long as that is not there, protection of the above sort is needed by each and every package that activates chars, as it will possibly conflict with babel doing that too.

u-fischer commented 3 years ago

In the same way as babel does it (or utf8 chars do it), basically through a two-level process

ah, I thought you had some general protection command for the OR in mind. Instead you mean that all active char definitions should be safe, and so the \def"{\ERROR} should be changed. That makes sense.

Bottom line the babel shorthand mechanism doesn't belong into babel but LaTeX should offer a standard mechanism for all packages

Yes, that would be good.

FrankMittelbach commented 3 years ago

ah, I thought you had some general protection command for the OR in mind. Instead you mean that all active char definitions should be safe, and so the \def"{\ERROR} should be changed. That makes sense.

The OR approach that is feasible for now (in my opinion) is to render all active chars harmless in a \write; at least I can't think of a scenario where expansion of such characters is wanted during a file write (expansion of other stuff, yes). But ultimately (and regardless of that) my proposal is to provide a standard handling for all active chars in LaTeX to be used by packages, i.e. lifting it or a variation from babel and surrounding it with interfaces that allow coexistence. But this goes far beyond this bug report.

r0polach commented 3 years ago

Yes, it seems to be older:

[makePDF] Run #1
This is pdfTeX, Version 3.14159265-2.6-1.40.21 (MiKTeX 21.1)
entering extended mode
(C:/Users/Roman/AppData/Local/Temp/tex2pdf.-31b895c11dfa8c6e/input.tex
LaTeX2e <2020-10-01> patch level 4
L3 programming layer <2021-01-09> xparse <2020-03-03>

That's strange. I don't think the L3 programming layer should make a difference (unless you see earlier errors when using --verbose). But perhaps you should first make sure that all packages in your MiKTeX distribution are up to date and see if that helps, but my bet is that our pandoc templates differ.

This is all a bit hard to debug if we don't get hold of the .log file and the full generated TeX file (it may look different from the one I get generated by pandoc, because I may not have the latest version of pandoc, as I normally don't use it).

There is no earlier error. I am attaching the full log (with some environment variables stripped)... pandoc_verbose_eoutlog.txt

I don't know how to make sure the MiKTeX packages are up to date...

Some TeX code (more complete than the intermediate output I can get from pandoc) is included in the log. I do not know how else to get anything more complete.

u-fischer commented 3 years ago

I don't know how to make sure the MiKTeX packages are up to date...

Your system is fine. The \AddToHook idea isn't the right solution, so we are discussing other options now. Use \AtBeginDocument{\shorthandoff{-}} for now.
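
For trying the stopgap outside pandoc, a small standalone document along the following lines may help. It is an untested sketch: the section title, the label and the listing content are invented, and it does not by itself force the page break that triggers the original error.

```latex
\documentclass{article}
\usepackage[czech]{babel}
\usepackage{listings}

% Workaround quoted in this thread: switch off the "-" shorthand once the
% document begins, i.e. after the Czech language setup has taken effect.
\AtBeginDocument{\shorthandoff{-}}

\begin{document}

% A label containing a hyphen, of the kind pandoc derives from headings.
\section{Test section}\label{test-section}

\begin{lstlisting}
echo ON
\end{lstlisting}

\end{document}
```

To approach the reported failure, one would remove the \AtBeginDocument line and add enough text before the listing to push it across a page boundary.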

FrankMittelbach commented 3 years ago

There is no earlier error. I am attaching the full log (with some environment variables stripped)... pandoc_verbose_eoutlog.txt

I don't know how to make sure the MiKTeX packages are up to date...

Some TeX code (more complete than the intermediate output I can get from pandoc) is included in the log. I do not know how else to get anything more complete.

As explained in the later analysis above, the problem at your end really comes from the use of - in the heading \label and its interference with the listing environment at a page break. If you (can) avoid such hyphens there, the problem should vanish, as it does if you disable the - shorthand altogether.

jbezos commented 3 years ago

@FrankMittelbach For some reason, after reading \active@prefix I was under the impression there was something wrong, but clearly it's not actually the case.

Bottom line the babel shorthand mechanism doesn't belong into babel but LaTeX should offer a standard mechanism for all packages

Definitely.

jbezos commented 3 years ago

@jgm The bug in Spanish is now fixed.

jbezos commented 3 years ago

I'm closing this issue because the bug in Spanish has been fixed and the issue is mainly related to the pandoc configuration.

r0polach commented 3 years ago

I do not understand the technical details, but is the problem also fixed for lang: "cs" (Czech language), as originally reported?

jbezos commented 3 years ago

Doesn't adding \shorthandoff{-} to the pandoc template work for you (as suggested in https://github.com/latex3/babel/issues/132#issuecomment-845278409)?

r0polach commented 3 years ago

I used the solution from https://github.com/latex3/babel/issues/132#issuecomment-845982924 but from https://github.com/latex3/babel/issues/132#issuecomment-846865676 I understand that this is only a temporary workaround?

jbezos commented 3 years ago

The current behavior isn’t incorrect (maybe questionable) and https://github.com/latex3/babel/issues/132#issuecomment-846874930 applies. Of course, like any program, its behavior might change in the future, although I don't think it will happen anytime soon.