jgm / pandoc

Universal markup converter
https://pandoc.org

Handle \def better in LaTeX #2888

Closed ickc closed 7 years ago

ickc commented 8 years ago

This problem involves the following LaTeX macros, which are placed in the Markdown source (I'm also attaching them below as files):

LaTeX macros in markdown source

\def\BDpos{}
\def\BDneg{-}
\def\BDplus{+}
\def\BDminus{-}
\def\thetasigmamuthetadagger{\theta\sigma^\mu\theta^\dagger}
\def\thetasigmamuloweredthetadagger{\theta\sigma_\mu\theta^\dagger}
\newcommand{\dagg}[1]{#1^\dagger}
\newcommand{\smallnegspacedagger}{\hspace{-0.1pt}}
\newcommand{\thdthd}{\theta^\dagger\hspace{-1pt}\theta^\dagger}
\newcommand{\nablasubmu}{\nabla\hspace{-2pt}{}_\mu}
\def\beq{\begin{align}}
\def\eeq{\end{align}}
\def\bea{\begin{align*}}
\def\eea{\end{align*}}
\def\Baryon{{\rm B}}
\def\Lepton{{\rm L}}
\def\sbar{\overline}
\def\stilde{\widetilde}
\def\sst{\scriptscriptstyle}
\def\vac{|0\rangle}
\def\antivac{\langle 0|}
\def\G{\stilde G}
\def\Wmess{W_{\rm mess}}
\def\NI{\stilde N_1}
\def\nmess{N_5}
\def\lagr{{\cal L}}
\def\drbar{\overline{\rm DR}}
\def\msbar{\overline{\rm MS}}
\def\conj{{{\rm c.c.}}}
\def\Et{{\slashchar{E}_T}}
\def\Etot{{\slashchar{E}}}
\def\MPlanck{M_{\rm P}}
\def\cbeta{c_{\beta}}
\def\sbeta{s_{\beta}}
\def\cW{c_{W}}
\def\sW{s_{W}}
\def\deltaeps{\delta}
\def\sigmabar{\overline\sigma}
\def\epsilonbar{\overline\epsilon}
\def\half{{1\over 2}}
\def\FX{F}
\def\Branching{{\rm Br}}
\def\Splus{S_+}
\def\Sminus{S_-}
\def\mAMSB{F_\phi}
\def\Dcon{\overline D}
\def\centeron#1#2{{\setbox0=\hbox{#1}\setbox1=\hbox{#2}\ifdim
\wd1>\wd0\kern.5\wd1\kern-.5\wd0\fi
\copy0\kern-.5\wd0\kern-.5\wd1\copy1\ifdim\wd0>\wd1
\kern.5\wd0\kern-.5\wd1\fi}}
\def\ltap{\;\centeron{\raise.35ex\hbox{$<$}}{\lower.65ex\hbox{$\sim$}}\;}
\def\gtap{\;\centeron{\raise.35ex\hbox{$>$}}{\lower.65ex\hbox{$\sim$}}\;}
\def\gsim{\mathrel{\gtap}}
\def\lsim{\mathrel{\ltap}}

Test

As a test, I created test.md and ran it with pandoc -s -o test.tex test.md.

In the output, I spotted that at least two of these LaTeX commands are being parsed and escaped:

Before

...
\newcommand{\dagg}[1]{#1^\dagger}
...
\def\centeron#1#2{{\setbox0=\hbox{#1}\setbox1=\hbox{#2}\ifdim
\wd1>\wd0\kern.5\wd1\kern-.5\wd0\fi
\copy0\kern-.5\wd0\kern-.5\wd1\copy1\ifdim\wd0>\wd1
\kern.5\wd0\kern-.5\wd1\fi}}
...

After

...
\newcommand{\dagg}{[}1{]}\{\#1\^{}\dagger\}
...
\def\centeron\#1\#2\{\{\setbox0=\hbox{#1}\setbox1=\hbox{#2}\ifdim
\wd1\textgreater{}\wd0\kern.5\wd1\kern-.5\wd0\fi
\copy0\kern-.5\wd0\kern-.5\wd1\copy1\ifdim\wd0\textgreater{}\wd1
\kern.5\wd0\kern-.5\wd1\fi\}\}
...

Another small problem I noticed: comparing the Markdown source with the generated TeX, the line breaks and spacing have changed. Why aren't they left as is?

By the way, the \def definitions were not written by me.

mb21 commented 8 years ago

you should probably put such definitions in the template instead...
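
One way to keep such definitions out of the Markdown source, close in spirit to the template suggestion, is to put them in a separate LaTeX file and include it in the header with pandoc's -H (--include-in-header) option, for example pandoc -s -H macros.tex -o test.tex test.md. The file name macros.tex is hypothetical, and the sketch below copies only a few of the definitions from the issue.

```latex
% macros.tex (hypothetical file name): pandoc copies this file verbatim into
% the LaTeX preamble, so the Markdown reader never has to parse the \def lines.
\def\half{{1\over 2}}
\def\lagr{{\cal L}}
\def\centeron#1#2{{\setbox0=\hbox{#1}\setbox1=\hbox{#2}\ifdim
\wd1>\wd0\kern.5\wd1\kern-.5\wd0\fi
\copy0\kern-.5\wd0\kern-.5\wd1\copy1\ifdim\wd0>\wd1
\kern.5\wd0\kern-.5\wd1\fi}}
\def\ltap{\;\centeron{\raise.35ex\hbox{$<$}}{\lower.65ex\hbox{$\sim$}}\;}
\def\lsim{\mathrel{\ltap}}
```

This only helps for LaTeX (and PDF) output; the macros are not visible to other output formats such as HTML.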

ickc commented 8 years ago

What you suggest is a workaround, and I do have other workarounds. But I think this is supposed to work, so isn't it a bug?

Edit: note that among that sea of definitions, only the two I show under Before and After have problems.

jgm commented 8 years ago

Pandoc will parse pretty standard LaTeX macro definitions with \newcommand. Once you start using TeX primitives like \def, all bets are off. So I'd suggest putting these in a template, or using another workaround.

I was surprised by what you reported with the \newcommand, though. I get a different result:

pandoc -t latex
\newcommand{\dagg}[1]{#1^\dagger}
^D
\newcommand{\dagg}[1]{#1^\dagger}

Most likely either you're using an older pandoc, or the parser got mixed up with some of the primitive TeX definitions that come before this \newcommand.

Anyway, I think this issue should just be closed.

ickc commented 8 years ago

Yes. I did try to make an MWE with those commands alone, but it didn't show the bug. The bug shows up only when a particular combination of things is put together. That's why I uploaded an MWE.

Before you close the issue, I want to emphasize that I am putting raw LaTeX code in the Markdown source and converting it to LaTeX; that's why I said in the other reply that it is supposed to work. If I were converting to HTML, it would be my fault (though, by the way, I do enclose the \def and \newcommand in $$ so that pandoc leaves them as is and MathJax can parse them successfully). But I am talking about raw LaTeX for LaTeX output, so I was hoping that embedding raw LaTeX code could be more robust, since I'm going to do a lot of it.

Edit: that's also why I mentioned the extra spaces and line breaks. I assumed they would be treated as raw LaTeX, which pandoc shouldn't touch in LaTeX output. This is related to another issue: currently there is no way (except with a filter) to mark a certain section as raw LaTeX/HTML; it is left for pandoc to decide. I actually like this better (than, say, the MultiMarkdown approach) because it keeps the source clean, but it relies heavily on whether pandoc can recognize the content as raw LaTeX.

jgm commented 8 years ago

+++ ickc [Apr 29 16 13:28 ]:

Yes. I did try to make an MWE with those commands alone, but it didn't show the bug. The bug shows up only when a particular combination of things is put together. That's why I uploaded an MWE.

Before you close the issue, I want to emphasize that I am putting raw LaTeX code in the Markdown source and converting it to LaTeX; that's why I said in the other reply that it is supposed to work. If I were converting to HTML, it would be my fault (though, by the way, I do enclose the \def and \newcommand in $$ so that pandoc leaves them as is and MathJax can parse them successfully). But I am talking about raw LaTeX for LaTeX output, so I was hoping that embedding raw LaTeX code could be more robust, since I'm going to do a lot of it.

I understand that. But passing through raw LaTeX requires identifying the bits that are LaTeX and separating them from the bits that are Markdown. That's not always easy.

If you stick to \newcommand, pandoc will do this reliably. If you use TeX primitives, especially things like

\def\centeron#1#2{{\setbox0=\hbox{#1}\setbox1=\hbox{#2}\ifdim
\wd1>\wd0\kern.5\wd1\kern-.5\wd0\fi
\copy0\kern-.5\wd0\kern-.5\wd1\copy1\ifdim\wd0>\wd1
\kern.5\wd0\kern-.5\wd1\fi}}

things are more difficult for pandoc. It will recognize \def as a LaTeX command, and \centeron, but when it hits #1#2 it treats this as regular text. (And then everything that comes after is screwed up.)

We could probably do better in this particular case, but I think there's no hope of making it always work for arbitrary TeX.

Note that you could easily rewrite the above using \newcommand.
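
A direct rewrite might look like the sketch below. It only changes the definition syntax, from \def with #1#2 to \newcommand with [2]; the body still uses the same TeX primitives. Whether a given pandoc version passes this through untouched is an assumption and has not been verified here. The surrounding document and the test formula are added for illustration only.

```latex
\documentclass{article}

% \centeron: overlay two boxes, centred on each other.
% Same body as the original \def; only the parameter declaration changes.
\newcommand{\centeron}[2]{{\setbox0=\hbox{#1}\setbox1=\hbox{#2}\ifdim
\wd1>\wd0\kern.5\wd1\kern-.5\wd0\fi
\copy0\kern-.5\wd0\kern-.5\wd1\copy1\ifdim\wd0>\wd1
\kern.5\wd0\kern-.5\wd1\fi}}

% The dependent macros take no arguments, so the conversion is mechanical.
\newcommand{\ltap}{\;\centeron{\raise.35ex\hbox{$<$}}{\lower.65ex\hbox{$\sim$}}\;}
\newcommand{\gtap}{\;\centeron{\raise.35ex\hbox{$>$}}{\lower.65ex\hbox{$\sim$}}\;}
\newcommand{\lsim}{\mathrel{\ltap}}
\newcommand{\gsim}{\mathrel{\gtap}}

\begin{document}
$a \lsim b \gsim c$
\end{document}
```

Unlike \def, \newcommand raises an error if a name is already defined, so a clash with a loaded package shows up at compile time instead of silently overriding it.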

ickc commented 8 years ago

@jgm

But passing through raw LaTeX requires identifying the bits that are LaTeX and separating them from the bits that are Markdown. That's not always easy.

Agreed. I actually ran into this problem quite a while ago but didn't file an issue because I knew \def is not supported. Later on I started thinking about how dependable it is to put raw LaTeX code in pandoc's Markdown, which is why I filed this issue using the problem I had encountered before.

I don't know exactly how this should be solved. One idea I'm brainstorming is to provide an optional way to explicitly declare a section to be raw LaTeX (or even HTML). @bpj has a filter that does something like this: Pandoc filter to insert arbitrary raw output markup as Code/CodeBlocks with an attribute raw=.. Perhaps a solution like that should make it into official pandoc to deal with this kind of situation (to guarantee that pandoc doesn't break LaTeX code in the Markdown source when writing LaTeX).

As a side note, MathJax handles those macros fine (it supports \let, \def, \(re)newcommand, and \(re)newenvironment). One problem is that MathJax requires an extra pair of math delimiters around them. I'm still thinking about how to write a Markdown source that is compatible with both HTML and LaTeX output.

ickc commented 8 years ago

Bug report for pandoc parsing LaTeX macros

Although the nature of this issue is different, the following bug is an example of the code I was originally talking about, so I'm posting it here rather than opening a new issue:

In LaTeX source:

\newcommand{\sbar}{\overline}
$-\lambda_f H \sbar f
f$

using the following pandoc command:

pandoc -s -o test.md test.tex

results in:

$-\lambda_f H {\overline}f
f$

But the expected result is

$-\lambda_f H \overline f
f$

MWE attached here.
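
One possible way to sidestep the brace-wrapping shown above, offered as an assumption rather than a confirmed fix, is to give \sbar an explicit argument so that its expansion carries its own braces. The one-argument signature and the \sbar{f} call syntax are changes to the original macro, and the behaviour has not been checked against any particular pandoc version.

```latex
\documentclass{article}

% Hypothetical variant of \sbar: the argument is taken explicitly, so the
% expansion is \overline{f} rather than a bare \overline.
\newcommand{\sbar}[1]{\overline{#1}}

\begin{document}
$-\lambda_f H \sbar{f}
f$
\end{document}
```

The cost is that every existing use of \sbar f in the source has to be changed to \sbar{f}.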

jgm commented 8 years ago

This is a known issue: see #1390

jgm commented 8 years ago

It might be worth adding code to handle \def better, even if this doesn't solve the problem in full generality.

jgm commented 7 years ago

Closed by c806ef1b150147ecaf5a4781e2ac1ce921559ca4 which adds support for simple \def macros.