jgm / pandoc

Universal markup converter
https://pandoc.org

Mediawiki generates invalid LaTeX #9296

Open pepijndevos opened 8 months ago

pepijndevos commented 8 months ago

Explain the problem.

The Wikipedia page for Maxwell's equations contains a table with equations, which throws a bunch of errors when converted to LaTeX and compiled:

! Package amsmath Error: \begin{align} allowed only in paragraph mode.

FEATURED_TITLE="Maxwell's_equations"
pandoc --from mediawiki --to latex -s -V papersize=a5 -V geometry:margin=1cm -V mainfont="Liberation Serif" "$FEATURED_TITLE.txt" -o "$FEATURED_TITLE.tex"
lualatex --interaction=nonstopmode "$FEATURED_TITLE.tex"

Maxwell's_equations.txt

There are other issues with the output but this is the most breaking one.

Pandoc version?

Arch Linux:

$ pandoc --version
pandoc 3.1.6
Features: +server +lua
Scripting engine: Lua 5.4
User data directory: /home/pepijn/.local/share/pandoc
Copyright (C) 2006-2023 John MacFarlane. Web: https://pandoc.org
This is free software; see the source for copying conditions. There is no
warranty, not even for merchantability or fitness for a particular purpose.

jgm commented 8 months ago

You cannot put an align environment inside $$..$$ for display math.
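
As a minimal illustration (the equations are placeholders, not taken from the Wikipedia page), this is the construct amsmath rejects, followed by two forms it accepts:

```latex
% Rejected: align is itself a display environment, so it cannot sit inside $$...$$.
$$\begin{align}
  a &= b \\
  c &= d
\end{align}$$

% Accepted: align on its own, in paragraph mode (each row gets a number).
\begin{align}
  a &= b \\
  c &= d
\end{align}

% Accepted: aligned nested inside display math (no per-row numbers).
\[
\begin{aligned}
  a &= b \\
  c &= d
\end{aligned}
\]
```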

Unfortunately, MathJax has muddied the issue by allowing this and treating it specially. My guess is that MediaWiki just passes the contents of a <math> element into MathJax.

For a related issue see #6703.

The best I could suggest would be using a Lua filter to change {align} to {aligned} in Math inlines. This would lose the numbering, because the aligned environment does not number its rows; it is intended to be used inside display math.
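
A rough sketch of that rewrite, written here as a pandoc JSON filter in Python rather than as a Lua filter, only so that it runs with the standard library. The names `fix_math` and `walk` and the file name `align_to_aligned.py` are made up for illustration. It assumes the usual JSON shape of a Math inline, `{"t": "Math", "c": [<math type>, <TeX string>]}`, and has not been run against the attached file.

```python
#!/usr/bin/env python3
"""Pandoc JSON filter sketch: rewrite align -> aligned inside Math inlines."""
import json
import re
import sys

# Matches \begin{align}, \end{align} and the starred forms,
# but not \begin{aligned} or \begin{alignat}.
ALIGN_ENV = re.compile(r"\\(begin|end)\{align\*?\}")


def fix_math(text):
    """Replace align/align* delimiters with aligned (which is unnumbered)."""
    return ALIGN_ENV.sub(lambda m: "\\" + m.group(1) + "{aligned}", text)


def walk(node):
    """Recursively visit the JSON AST and patch Math elements in place."""
    if isinstance(node, list):
        for child in node:
            walk(child)
    elif isinstance(node, dict):
        if node.get("t") == "Math":
            math_type, text = node["c"]
            node["c"] = [math_type, fix_math(text)]
        else:
            for child in node.values():
                walk(child)


if __name__ == "__main__":
    raw = sys.stdin.read()
    if raw.strip():  # do nothing when no document is piped in
        doc = json.loads(raw)
        walk(doc)
        json.dump(doc, sys.stdout)
```

If the script is saved as an executable `align_to_aligned.py`, it could be added to the command from the report with `--filter ./align_to_aligned.py`. As noted above, equation numbers are lost.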

In principle pandoc could just do this automatically, and that's the only modification of the MediaWiki reader that makes sense to me in response to this problem. But it would mean that results were slightly worse when converting mediawiki -> HTML (where MathJax could be used).

Another alternative would be to modify the LaTeX writer to make this transformation, or even better to excerpt align environments from Math inlines and replace them with raw LaTeX.
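
The second idea can be approximated outside pandoc with a JSON filter. This is only a sketch of what such a transformation might look like, not how the LaTeX writer would implement it. The names `to_raw_latex` and `walk` are hypothetical, and only Math inlines consisting of a single align or align* environment are touched.

```python
#!/usr/bin/env python3
"""Pandoc JSON filter sketch: turn lone align environments into raw LaTeX."""
import json
import re
import sys

# A math string that is nothing but an align/align* environment.
WHOLE_ALIGN = re.compile(r"\s*\\begin\{(align\*?)\}.*\\end\{\1\}\s*", re.DOTALL)


def to_raw_latex(node):
    """Return a RawInline for a Math node holding only an align environment;
    return any other node unchanged."""
    if isinstance(node, dict) and node.get("t") == "Math":
        text = node["c"][1]
        if WHOLE_ALIGN.fullmatch(text):
            return {"t": "RawInline", "c": ["latex", text.strip()]}
    return node


def walk(node):
    """Return a copy of the JSON AST with matching Math nodes replaced."""
    if isinstance(node, list):
        return [walk(to_raw_latex(child)) for child in node]
    if isinstance(node, dict):
        return {key: walk(value) for key, value in node.items()}
    return node


if __name__ == "__main__":
    raw = sys.stdin.read()
    if raw.strip():  # do nothing when no document is piped in
        json.dump(walk(json.loads(raw)), sys.stdout)
```

This keeps the numbering, but the raw LaTeX is only meaningful for LaTeX-based output, and whether the environment is accepted still depends on where it lands (for example, the kind of table cell it ends up in).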