Closed r0polach closed 3 years ago
Here I am attaching the intermediate result of pandoc -- in latex format
Very likely the problem is the hyphen in the \hypertarget or in the \label. See https://tex.stackexchange.com/questions/490069/incomplete-iffalse-all-text-was-ignored/583191#583191 . So you must somehow reconfigure how pandoc manages the languages. I think the following can help: https://tex.stackexchange.com/questions/505036/obtaining-babel-french-automatic-spaces-before-punctuation-when-using-pandoc/505068#505068 .
Can you create a full LaTeX document? Your intermediate file is only the document body.
I could reproduce the problem, and as Javier said, the problem is that czech makes the hyphen active and gives it a rather fragile and difficult definition.
The best is to add \shorthandoff{-} to your preamble (after loading babel) or at least before such listings.
If pandoc is used with a reasonably new LaTeX (2020-10-01, i.e. the current one, to be exact :-)) then you could try adding
\AddToHook{env/lstlisting/before}{\shorthandoff{-}}
\AddToHook{env/lstlisting/after}{\shorthandon{-}}
if you need the cz definition of the - char in normal text. This way you don't have to disable it in front of every listing.
Hi thanks a lot.
Only one solution is working for me -- adding this code to the pandoc LaTeX header (-H) file:
\AtBeginDocument{\shorthandoff{-}}
I am not sure if it has some unwanted side effects yet, but all other solutions didn't work for me:
\shorthandoff{-}
nor
\AddToHook{env/lstlisting/before}{\shorthandoff{-}}
\AddToHook{env/lstlisting/after}{\shorthandon{-}}
\shorthandoff{-}
I guess cz activates it only at begin document (unconditionally), which is why you then also have to use \AtBeginDocument to deactivate it even later.
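For reference, the workaround reported as working above can be written out as a complete document. This is only a minimal sketch: it assumes babel is loaded with the czech option and that listings is used in the same way as with pandoc's --listings; the label name and the listing content are made up for illustration.

```latex
\documentclass{article}
\usepackage[czech]{babel}
\usepackage{listings}

% Workaround from this thread: switch the Czech hyphen shorthand off,
% but defer it to \begin{document}, because doing it directly in the
% preamble was reported not to be enough.
\AtBeginDocument{\shorthandoff{-}}

\begin{document}

% Hypothetical heading; pandoc derives labels with hyphens from headings.
\section{Test section}\label{test-section}

\begin{lstlisting}
echo ON
\end{lstlisting}

\end{document}
```

With this in place the hyphen is an ordinary character everywhere in the document, so any special Czech treatment of hyphens in running text is lost as well.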
nor
\AddToHook{env/lstlisting/before}{\shorthandoff{-}} \AddToHook{env/lstlisting/after}{\shorthandon{-}}
What goes wrong? Any error messages? As I had no test document, I just wrote that off the top of my head; something along those lines should work if you have a current LaTeX.
It gives the error
Error producing PDF.
! Incomplete \iffalse; all text was ignored after line 131.
<inserted text>
\fi
l.131 ON
-- so, basically the same as without a fix, but with 131 instead of 129
...
Is there a complete document with preamble anywhere for download? I don't really want to try and generate it via pandoc but I'm curious why it doesn't work.
Here it is -- just a one-line file, which is referenced in the command
pandoc test_input.txt -o test_output.pdf -f markdown --listings -H fix.tex.txt
...and for not-working example: fix2.tex.txt
pandoc test_input.txt -o test_output.pdf -f markdown --listings -H fix2.tex.txt
Well, it works for me. Run it with --verbose and check what LaTeX format is used. Mine says
...
[makePDF] Run #1
This is pdfTeX, Version 3.141592653-2.6-1.40.22 (TeX Live 2021) (preloaded format=pdflatex)
restricted \write18 enabled.
entering extended mode
(./tex2pdf.-c33fa176037637b6/input.tex
LaTeX2e <2020-10-01> patch level 4
L3 programming layer <2021-05-11>
...
somewhere in the middle, and that is what you need to make \AddToHook work. If your LaTeX is older than that, it can't work. And if that -H adds its stuff where it seems to add it (i.e. before loading babel), then it is no surprise that you needed \AtBeginDocument, because at that point babel isn't loaded, so \shorthandoff should even give you an error.
The -H comes at an odd place, so it looks difficult to use it to correct used packages other than by using \AtBeginDocument.
Yes, it seems to be older:
[makePDF] Run #1
This is pdfTeX, Version 3.14159265-2.6-1.40.21 (MiKTeX 21.1)
entering extended mode
(C:/Users/Roman/AppData/Local/Temp/tex2pdf.-31b895c11dfa8c6e/input.tex
LaTeX2e <2020-10-01> patch level 4
L3 programming layer <2021-01-09> xparse <2020-03-03>
Since my -H file should be multi-language aware, I ended up with:
\usepackage{ifthen}
\ifthenelse{\equal{\language}{czech}}{\AtBeginDocument{\shorthandoff{-}}}{}%
(Pandoc dev here.) We now have code in our default template that is intended to turn off all language-specific shorthands, so I was surprised that this \shorthandoff{-} had an effect. Obviously, I'm not correct in thinking that our code disables all the shorthands. If any of you babel experts have advice, I'd be glad to hear it. Here's the code:
% get rid of language-specific shorthands (see #6817):
\let\LanguageShortHands\languageshorthands
\def\languageshorthands#1{}
@jgm to disable shorthands use
\usepackage[shorthands=off]{babel}
We found that shorthands=off doesn't disable everything.
See https://tex.stackexchange.com/questions/443385/babel-decimal-separator-is-missing-when-shorthands-off-is-set and https://github.com/jgm/pandoc/issues/6817
I had thought that the code above would disable everything -- but maybe we need both that and shorthands=off?
Update: actually, it's a very frustrating situation. If we just use
% get rid of language-specific shorthands (see #6817):
\let\LanguageShortHands\languageshorthands
\def\languageshorthands#1{}
then the problem of jgm/pandoc#6817 (decimal point disappearing with es) is solved, but the problem of jgm/pandoc#7137 (listings with cs) persists. If we just use shorthands=off in loading babel, then 7137 is fixed but 6817 is broken. My thought was to use both, but that doesn't work: when we use both, 6817 is still broken.
Any advice?
Well, the disappearing decimal point is obviously a bug which got forgotten. So I would suggest opening a bug report. I wouldn't use the \def\languageshorthands#1{} code; that smells like a bad hack.
Agreed, I'd be glad to be rid of it. Is this the right place to open a bug report?
And is there any way to work around these problems, solving both at the same time, with current babel?
Yes, it seems to be older:
[makePDF] Run #1 This is pdfTeX, Version 3.14159265-2.6-1.40.21 (MiKTeX 21.1) entering extended mode (C:/Users/Roman/AppData/Local/Temp/tex2pdf.-31b895c11dfa8c6e/input.tex LaTeX2e <2020-10-01> patch level 4 L3 programming layer <2021-01-09> xparse <2020-03-03>
That's strange. I don't think the L3 programming layer should make a difference (unless you see earlier errors when using --verbose). But perhaps you should first make sure that all packages in your MiKTeX distribution are up to date and see if that helps, but my bet is that our pandoc templates differ.
This is all a bit hard to debug if we don't get hold of the .log file and the full generated TeX file (it may look different from the one I get generated by pandoc, because I may not have the latest version of pandoc as I normally don't use it).
@jgm Shorthands can be activated selectively. So instead of shorthands=off you can say, for example, shorthands=.?!;: (which allows, if necessary, a few characters for french, too). As to the bug with spanish, Ulrike is right and the bug just got forgotten, even if there is already an issue open ( https://github.com/jbezos/babel-spanish/issues/1 ).
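As a small illustration of the selective activation mentioned above, the option can be passed when loading babel. This is only a sketch using the character list quoted in this thread; whether that list is the right one for a given set of languages is an assumption that would need checking.

```latex
\documentclass{article}

% Only the characters listed after shorthands= may act as language
% shorthands; the Czech hyphen is not in the list, so it stays an
% ordinary character.
% Use shorthands=off instead to request that no language shorthands
% are activated at all.
\usepackage[czech,shorthands=.?!;:]{babel}

\begin{document}
Text with a hyphenated word: modro-zelený.
\end{document}
```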
@jbezos I want to turn off ALL shorthands, not keep some active. The problem is just that shorthands=off doesn't seem to do that. Glad to see the bug has already been reported.
@jbezos I want to turn off ALL shorthands, not keep some active.
@jgm but that would be wrong, wouldn't it? Some languages use shorthands to handle special typographical conventions, e.g. adding some extra space before a : (not sure if french still does that) or doing special handling of hyphenation characters (isn't that what czech is doing?) and similar. In that case it is not a shorthand for easier input, like e.g. the german "a for ä (or, when it was devised, for \"{a}), but just ordinary input that gets some typographical treatment --- which you kill off by turning the shorthand off.
I was thinking about a more general question. @jgm What are you expecting from a localization system? I've taken some steps for babel to work better with automatically generated documents, and I'd like to better understand your needs. See for example https://github.com/latex3/babel/blob/master/news-guides/news/whats-new-in-babel-3.39.md#locale-loading-on-the-fly and https://github.com/latex3/babel/blob/master/news-guides/news/whats-new-in-babel-3.43.md#autoloading-based-on-bcp-47-codes . As explained by @FrankMittelbach, switching off all shorthands can produce typographically incorrect documents (except, of course, if pandoc does some preprocessing).
I've done a bit of analysing of the issue and I think I start to understand what is happening. In a nutshell it looks to be like this:
1. babel makes - active (when czech is used)
2. the generated document has a \label containing a -
3. that hyphen thus ends in a \write as an active character without being further expanded due to the way babel defines it
4. the \write is executed when the page is shipped out and again it remains active but otherwise unchanged because of its top-level definition
5. if a listing is active at that point, listings has given - a new definition, but this definition is not safe in a \write
6. so the \write expands that - and boom
This then explains why turning off the - in front of lstlisting has no effect, since that doesn't contain the offending -: it is inside the \write that happened earlier.
In other words
Wider question: should active chars remain active inside a \write or should they become normal chars there?
For example, if \protected@write did not use \@unexpandable@protect the problem would go away, i.e.,
\long\def \protected@write#1#2#3{%
\begingroup
\let\thepage\relax
#2%
% \let\protect\@unexpandable@protect
\def\protect##1{\detokenize{##1}}%
\edef\reserved@a{\write#1{#3}}%
\reserved@a
\endgroup
\if@nobreak\ifvmode\nobreak\fi\fi
}
but I haven't thought through if there are cases where the active status would need to be retained.
Addendum: the suggested change in \protected@write would not quite work for 2-level writing, e.g. -> aux -> toc, because we have some fragile stuff that gets explicit protection, and that wouldn't remain in the second step, so maybe something like
\def\protect##1{\noexpand\protect\detokenize{##1}}%
is needed instead.
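Putting the redefinition and the addendum together would give something like the following. This is only a sketch that merges the two snippets above as they stand; it is an experiment for a test document, not a vetted kernel change, and as noted it has not been thought through for cases where the active status needs to be retained.

```latex
\makeatletter
% Experimental variant of \protected@write: material after \protect is
% written as plain characters (\detokenize), but the \protect itself is
% kept so that a second writing step (e.g. aux -> toc) still sees it.
\long\def\protected@write#1#2#3{%
  \begingroup
    \let\thepage\relax
    #2%
    % \let\protect\@unexpandable@protect   % original kernel line
    \def\protect##1{\noexpand\protect\detokenize{##1}}%
    \edef\reserved@a{\write#1{#3}}%
    \reserved@a
  \endgroup
  \if@nobreak\ifvmode\nobreak\fi\fi
}
\makeatother
```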
that hyphen thus ends in a \write as an active character without being further expanded due to the way babel defines it
Yes. For some unknown reason I thought that would happen only for a few shorthands like the hyphen of czech, but I now see that it affects all of them.
\protect is essentially \noexpand inside a write, so it retains its catcode; thus my suggestion to change that, as it doesn't seem necessary (after all, all you want is for it to end up in a file).
@jbezos I'm glad to see the BCP47 based loading option. Once this becomes widespread in LaTeX installations, it could allow us to simplify some code that translates between BCP47 and babel language names.
@jbezos @FrankMittelbach We've run into enough inadvertent triggering of babel shorthands that we just decided to disable them all. But you are right that turning off all shorthands will produce some incorrect typography. Perhaps shorthands=.?!;: would be safe enough. However:
It would be problematic if the shorthands were applied to content in math mode (as we saw with the . in es). But, looking at the babel documentation, I see there's an option math=normal that isn't the default. (I wonder if that would fix the issue we had with . in es, which was occurring in math contexts?) [EDIT: I tried adding math=normal, but then pdflatex just hangs and I have to ^C. babel 2020/03/22 3.42]
The shorthands would also need to be deactivated in verbatim contexts. Pandoc uses a variety of different verbatim contexts:
[...] (when --listings is used)
[...] (when --listings isn't used)
Does babel deactivate the shorthands in these contexts? [EDIT: based on experimentation, it looks like it does.]
@FrankMittelbach I think the macro to be modified is in babel (\active@prefix). But I'd like to be sure.
@jgm As I'm the maintainer of the spanish style, I'll update it in a few days (it's about time!). You may still want to configure some languages to your needs, though.
@FrankMittelbach I think the macro to be modified is in babel (\active@prefix). But I'd like to be sure.
how would you want to alter that? I don't think you can unless you enhance it so that it doesn't put an active character in, in certain situations (but that is rather fragile). What it does currently is correct. It keeps the active char but prevents expansion where it shouldn't expand.
The error is really in listings, which gives the active character a new meaning without any protection, so that any char sitting and waiting in a \write will bomb out in the OR (output routine) if a listing env is active when the OR starts to make the page. So it should really be corrected there. On the other hand, there is no need to keep it active in \write, so my proposal was that core LaTeX takes care of that, and then the problem goes away as well. However, that fix will only come with the Fall release of LaTeX, not with the immediate one coming up.
Short term, for the problem here I think the answer is: do not use - in labels if you also use listings.
The error is really in listings which gives the active character a new meaning without any protection
How can such a protection be done? For example in this example the redefined " bombs, how can one protect the code in the output routine here?
\documentclass{article}
\usepackage[ngerman]{babel}
\usepackage{lipsum}
\begin{document}
\section{abc}\label{abc"blub}
{
\def"{\ERROR}
\lipsum
}
\end{document}
In the same way as babel does it (or utf8 chars do it), basically through a two-level process
\MakeSureImFine -{payload}
\MakeSureImFine checks the context and either does something like \protect#1 or executes #2.
In babel \MakeSureImFine = \active@char (or similarly named).
Bottom line the babel shorthand mechanism doesn't belong into babel but LaTeX should offer a standard mechanism for all packages, that can then be used by babel and or listings or ... but we know that. As long as that is not there protection of the above sort is needed by each and every package that activates chars as it will possibly conflict with babel doing that too.
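The two-level protection described above could be sketched roughly as follows, applied to the " example from earlier in the thread. Everything here is an assumption for illustration: \MakeSureImFine is the placeholder name used above (not a babel or kernel macro), its definition is only modelled on the description, and \MyQuotePayload is a made-up payload.

```latex
\documentclass{article}
\usepackage[ngerman]{babel}
\usepackage{lipsum}

\makeatletter
% Hypothetical helper: #1 is the active character, #2 its payload.
\def\MakeSureImFine#1#2{%
  \ifx\protect\@typeset@protect
    #2% typesetting: execute the payload
  \else
    % any other context (e.g. a \write): leave the character unexpanded
    \ifx\protect\@unexpandable@protect
      \noexpand#1%
    \else
      \protect#1%
    \fi
  \fi}
% Made-up payload, only used when " is actually typeset.
\def\MyQuotePayload{\textquotedblright}
\makeatother

\begin{document}
\section{abc}\label{abc"blub}
{
  % Local redefinition of the active ", but safe inside a \write.
  \def"{\MakeSureImFine"{\MyQuotePayload}}
  \lipsum
}
\end{document}
```

The point of the design is that the redefined character never runs its payload while a pending \write is being expanded at shipout; it only reproduces itself there.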
In the same way as babel does it (or utf8 chars do it), basically through a two-level process
Ah, I thought you had some general protection command for the OR in mind. Instead you mean that all active char definitions should be safe, and so the \def"{\ERROR} should be changed. That makes sense.
Bottom line the babel shorthand mechanism doesn't belong into babel but LaTeX should offer a standard mechanism for all packages
Yes, that would be good.
Ah, I thought you had some general protection command for the OR in mind. Instead you mean that all active char definitions should be safe, and so the \def"{\ERROR} should be changed. That makes sense.
The OR approach that is feasible for now (in my opinion) is to render all active chars harmless in a \write; at least I can't think of a scenario where expansion of such characters is wanted during file write (expansion of other stuff, yes). But ultimately (and regardless of that) my proposal is to provide a standard handling for all active chars in LaTeX to be used by packages, i.e. lifting it or a variation from babel and surrounding it with interfaces that allow coexistence. But this goes far beyond this bug report.
Yes, it seems to be older:
[makePDF] Run #1 This is pdfTeX, Version 3.14159265-2.6-1.40.21 (MiKTeX 21.1) entering extended mode (C:/Users/Roman/AppData/Local/Temp/tex2pdf.-31b895c11dfa8c6e/input.tex LaTeX2e <2020-10-01> patch level 4 L3 programming layer <2021-01-09> xparse <2020-03-03>
That's strange. I don't think the L3 programming layer should make a difference (unless you see earlier errors when using --verbose). But perhaps you should first make sure that all packages in your MiKTeX distribution are up to date and see if that helps, but my bet is that our pandoc templates differ.
This is all a bit hard to debug if we don't get hold of the .log file and the full generated TeX file (it may look different from the one I get generated by pandoc, because I may not have the latest version of pandoc as I normally don't use it).
There is no error before. Attaching full log (with some environment variables stripped)... pandoc_verbose_eoutlog.txt
I don't know how to make sure MiKTeX packages are up to date...
Some TeX code (more complete than the intermediate file I can get as pandoc output) is included in the log. I do not know how to get anything more complete otherwise.
I don't know how to make sure MikTeX packages are up-to-date...
Your system is fine. The \AddToHook idea isn't the right solution, so we are discussing other options now. Use \AtBeginDocument{\shorthandoff{-}} for now.
There is no error before. Attaching full log (with some environment variables stripped)... pandoc_verbose_eoutlog.txt
I don't know how to make sure MiKTeX packages are up to date...
Some TeX code (more complete than the intermediate file I can get as pandoc output) is included in the log. I do not know how to get anything more complete otherwise.
As explained in the later analysis above, the problem at your end really comes from the use of - in the heading \label and its interference with the listing environment at a page break. If you (can) avoid such hyphens there, the problem should vanish, or if you disable the shorthand of - altogether.
@FrankMittelbach For some reason, after reading \active@prefix I was under the impression there was something wrong, but clearly that's not actually the case.
Bottom line the babel shorthand mechanism doesn't belong into babel but LaTeX should offer a standard mechanism for all packages
Definitely.
@jgm The bug in spanish is now fixed.
I'm closing this issue because the bug in spanish has been fixed and it is mainly related to the pandoc configuration.
I do not understand the technical details, but is the problem also fixed for lang: "cs" (czech language) as originally reported?
Doesn't adding \shorthandoff{-} to the pandoc template work for you (as suggested in https://github.com/latex3/babel/issues/132#issuecomment-845278409)?
I used the solution in https://github.com/latex3/babel/issues/132#issuecomment-845982924 but according to https://github.com/latex3/babel/issues/132#issuecomment-846865676 I understand this is only a temporary workaround?
The current behavior isn’t incorrect (maybe questionable) and https://github.com/latex3/babel/issues/132#issuecomment-846874930 applies. Of course, like any program, its behavior might change in the future, although I don't think it will happen anytime soon.
Hi, I am not sure if this is the right place to file this report, but the report at https://github.com/jgm/pandoc/issues/7137 received the comment that the issue is in babel, so it was rejected as a pandoc issue report.
Please note that I do not know much about latex, babel, etc.; I am just a user of pandoc, and the example of this issue comes as a test file with markdown syntax text.
So here is the description:
pandoc 2.11.4 for windows (x86_64), miktex 21.1 (x64)
When I run
pandoc test_input.txt -o test_output.pdf -f markdown --listings
on the attached file test_input.txt, I got the following error:
test_input.txt
It is caused by a combination of two facts:
the text flow pushes the listing onto the edge of the page
lang is cs.
If I remove lang: "cs" or if the text flow does not push the listing onto the edge of the page, then everything is OK.