jgm / pandoc

Universal markup converter
https://pandoc.org

Table rendering is broken for `--from=latex --to=markdown`. #3087

Closed alexander-matsievsky closed 8 years ago

alexander-matsievsky commented 8 years ago

Problem

Good day! I'm trying to convert a .tex document to .md. I've encountered an issue with table rendering. Don't know whether it's a bug or I'm missing something. Grepping the web for an answer showed nothing, except https://github.com/jgm/pandoc/issues/2669. Your help will be greatly appreciated.

Case 1

LyX-generated .tex file.

\begin{tabular}{|c|c|c|}
\hline 
a & b & c\tabularnewline
\hline 
\hline 
1 & 2 & 3\tabularnewline
\hline 
\end{tabular}
> cat input.txt | pandoc --from=latex --to=markdown
<span>|c|c|c|</span> a & b & c<span>\
</span> 1 & 2 & 3<span>\
</span>

Case 2

LyX-generated .tex file with the \tabularnewline command manually replaced by \\.

\begin{tabular}{|c|c|c|}
\hline 
a & b & c\\
\hline 
\hline 
1 & 2 & 3\\
\hline 
\end{tabular}
> cat input-wo-tabularnewline.txt | pandoc --from=latex --to=markdown
   a   b   c
  --- --- ---
   1   2   3

System

> sw_vers
ProductName:    Mac OS X
ProductVersion: 10.11.2
BuildVersion:   15C50
> pandoc -v
pandoc 1.17.2
Compiled with texmath 0.8.6.4, highlighting-kate 0.6.2.1.
Syntax highlighting is supported for the following languages:
    abc, actionscript, ada, agda, apache, asn1, asp, awk, bash, bibtex, boo, c,
    changelog, clojure, cmake, coffee, coldfusion, commonlisp, cpp, cs, css,
    curry, d, diff, djangotemplate, dockerfile, dot, doxygen, doxygenlua, dtd,
    eiffel, elixir, email, erlang, fasm, fortran, fsharp, gcc, glsl,
    gnuassembler, go, hamlet, haskell, haxe, html, idris, ini, isocpp, java,
    javadoc, javascript, json, jsp, julia, kotlin, latex, lex, lilypond,
    literatecurry, literatehaskell, llvm, lua, m4, makefile, mandoc, markdown,
    mathematica, matlab, maxima, mediawiki, metafont, mips, modelines, modula2,
    modula3, monobasic, nasm, noweb, objectivec, objectivecpp, ocaml, octave,
    opencl, pascal, perl, php, pike, postscript, prolog, pure, python, r,
    relaxng, relaxngcompact, rest, rhtml, roff, ruby, rust, scala, scheme, sci,
    sed, sgml, sql, sqlmysql, sqlpostgresql, tcl, tcsh, texinfo, verilog, vhdl,
    xml, xorg, xslt, xul, yacc, yaml, zsh
Default user data directory: /Users/alexander-matsievsky/.pandoc
Copyright (C) 2006-2016 John MacFarlane
Web:  http://pandoc.org
This is free software; see the source for copying conditions.
There is no warranty, not even for merchantability or fitness
for a particular purpose.
alexander-matsievsky commented 8 years ago
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% LyX specific LaTeX commands.
%% Because html converters don't know tabularnewline
\providecommand{\tabularnewline}{\\}
jgm commented 8 years ago

If you add this \providecommand line to your LaTeX source, it should work. The problem is, as it says, that pandoc doesn't know about \tabularnewline.

alexander-matsievsky commented 8 years ago

This snippet \providecommand{\tabularnewline}{\\} is present in both input.txt and input-wo-tabularnewline.txt. The .tex sources above are excerpts.

Besides that, pandoc seems to recognize this command and substitute \tabularnewline with \\ when doing --parse-raw:

> cat input.txt | pandoc --from=latex --to=markdown --parse-raw
\begin{tabular}{|c|c|c|}
\hline 
a & b & c{\\}\hline 
\hline 
1 & 2 & 3{\\}\hline 
\end{tabular}
alexander-matsievsky commented 8 years ago

I had to construct a pipeline to make it work from the original source:

> cat input.txt | pandoc --from=latex --to=markdown
<span>|c|c|c|</span> a & b & c<span>\
</span> 1 & 2 & 3<span>\
</span>
> cat input.txt | sed -e s/\\tabularnewline/\\\\/g | pandoc --from=latex --to=markdown
   a   b   c
  --- --- ---
   1   2   3

I wonder if that's the only way.
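
The sed step above can also be written as a small preprocessing function. This is only a sketch of the same workaround, not anything pandoc provides: the function name `replace_tabularnewline` is made up for illustration, and it assumes the LyX definition line appears exactly as `\providecommand{\tabularnewline}` so that line can be left untouched. LaTeX comments are not treated specially.

```python
import re

# Scan LaTeX control sequences one at a time so that a literal "\\" is
# consumed as a unit and never confused with the start of \tabularnewline.
# Group 1 is set only when the control word is exactly "tabularnewline".
_TOKEN = re.compile(r"\\(?:(tabularnewline)(?![A-Za-z])|.)", re.DOTALL)


def replace_tabularnewline(latex: str) -> str:
    """Return `latex` with each use of \\tabularnewline replaced by \\\\."""

    def swap(match):
        # Replace only the \tabularnewline control word; keep anything else.
        return r"\\" if match.group(1) else match.group(0)

    out = []
    for line in latex.splitlines(keepends=True):
        # Assumption: LyX's own definition line should stay as it is.
        if r"\providecommand{\tabularnewline}" in line:
            out.append(line)
        else:
            out.append(_TOKEN.sub(swap, line))
    return "".join(out)
```

Wrapped in a script that reads stdin and writes stdout, this would sit where sed sits in the pipeline, in front of `pandoc --from=latex --to=markdown`.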

jgm commented 8 years ago

Pandoc currently resolves macros only for math and for raw LaTeX bits. So if you use --parse-raw, then pandoc will, on realizing it can't parse the table, emit a raw LaTeX table with the macro resolved. This isn't much help for converting to markdown, though. Also, there's an issue with the current macro resolution: spurious {} are added, which breaks the table (see #1390).

It would be better to resolve macros on the raw LaTeX before doing any parsing at all, with a two-pass system. For various complicated reasons, it wasn't done this way in the first place.
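
To make the two-pass idea concrete, here is a rough sketch of what a first pass could look like when run as an external preprocessor. It is not pandoc's implementation, and the function name `expand_simple_macros` is hypothetical. It assumes only zero-argument `\providecommand` or `\newcommand` definitions whose bodies contain no braces, keeps the first definition of each name, expands uses regardless of where the definition appears, and does not rescan expanded text or handle LaTeX comments.

```python
import re

# Zero-argument definitions such as \providecommand{\name}{body}.
# Definitions with an argument count, e.g. \newcommand{\x}[1]{...}, and
# bodies containing braces do not match and are left in place.
_DEFINITION = re.compile(
    r"\\(?:provide|new)command\{\\([A-Za-z]+)\}\{([^{}]*)\}"
)

# One control sequence: a control word (group 1) or a control symbol
# such as "\\", which is consumed as a unit.
_CONTROL = re.compile(r"\\(?:([A-Za-z]+)|.)", re.DOTALL)


def expand_simple_macros(latex: str) -> str:
    """First pass: collect simple macro definitions, then expand their uses."""
    macros = {}

    def record(match):
        # Keep the first definition seen for each name (a simplification).
        macros.setdefault(match.group(1), match.group(2))
        return ""  # drop the definition from the output

    body = _DEFINITION.sub(record, latex)

    def expand(match):
        # Unknown control words and all control symbols pass through as-is.
        return macros.get(match.group(1), match.group(0))

    return _CONTROL.sub(expand, body)
```

The second pass would then be the ordinary conversion: feeding the returned text to `pandoc --from=latex --to=markdown`, at which point `\tabularnewline` has already become `\\` without the extra braces.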

alexander-matsievsky commented 8 years ago

@jgm Thank you for a detailed explanation!