jgm / pandoc

Universal markup converter
https://pandoc.org

html tables in Markdown are ignored in LaTeX #2236

Closed ghost closed 9 years ago

ghost commented 9 years ago

If you write something like:

Bla
<table>
<theader>
<tr>
<th>Component</th>
<th>Preu</th>
</tr>
</theader>
<tbody>
<tr>
<td>[Compulab Fitlet-B](http://fit-pc.com/wiki/index.php/Fit-PC_Product_Line:_fitlet)</td>
<td>239,73 €</td>
</tr>
<tr>
<td>Disc dur mSATA Kingston SSD [SMS200S3/120 GB](http://www.kingston.com/datasheets/sms200s3_en.pdf) (111.8 GiB)</td>
<td>86,95 €</td>
</tr>
<tr>
<td>Memòria RAM Kingston 8GB DDR3L 1600 CL10 240 PIN UDIMM RAM</td>
<td>69,60 €</td>
</tr>
</tbody>
<tfoot>
<tr>
<td>TOTAL</td>
<td>396,28 €</td>
</tr>
</tfoot>
</table>

in Markdown and convert it to LaTeX via:

pandoc -f markdown+raw_html+markdown_in_html_blocks+implicit_header_references+strikeout+tex_math_dollars+raw_tex+yaml_metadata_block+multiline_tables bitacora.md --biblio referencies.bib --csl amai.csl -N -s -S --toc -o bitacora.pdf

(replace bitacora.* with your file name), then you get a LaTeX document without the table in it.

Could you look into it? Thanks,

Xan

jgm commented 9 years ago

This is intended behavior. Raw HTML is raw HTML; it only appears in HTML-based output. If you want something that works across platforms, use a native pandoc Markdown table.
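As an illustration of the suggestion above, here is a minimal sketch of the same data written as a pipe table, one of pandoc's native Markdown table syntaxes. The file names `bitacora-table.md` and `bitacora-table.tex` are made up for this example, and the conversion step only runs if `pandoc` is on the PATH. Pipe tables have no separate footer, so the TOTAL row becomes an ordinary body row.

```shell
# Write the table as a pipe table (hypothetical file name, for illustration).
cat > bitacora-table.md <<'EOF'
Bla

| Component | Preu |
|:----------|-----:|
| [Compulab Fitlet-B](http://fit-pc.com/wiki/index.php/Fit-PC_Product_Line:_fitlet) | 239,73 € |
| Disc dur mSATA Kingston SSD [SMS200S3/120 GB](http://www.kingston.com/datasheets/sms200s3_en.pdf) (111.8 GiB) | 86,95 € |
| Memòria RAM Kingston 8GB DDR3L 1600 CL10 240 PIN UDIMM RAM | 69,60 € |
| TOTAL | 396,28 € |
EOF

# Convert to LaTeX source; a native table is part of pandoc's document model,
# so it is not limited to HTML-based output the way raw HTML is.
if command -v pandoc >/dev/null 2>&1; then
  pandoc -f markdown -t latex -o bitacora-table.tex bitacora-table.md
fi
```

Writing LaTeX source instead of a PDF keeps the sketch independent of a LaTeX installation; the `-o bitacora.pdf` form from the original command should work the same way once the table is native.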

KurtPfeifle commented 9 years ago

Comment deleted (and re-pasted in again further below)

Reason: originally sent by Mail, its formatting looked horrible. After correcting it, it looked fine in the Preview tab, but still horrible after saving it...

New comment is now appearing as comment #112534318.

ghost commented 9 years ago

But Markdown supports embedded raw HTML, doesn't it?

This is intended behavior. Raw HTML is raw HTML; it only appears in HTML-based output. If you want something that works across platforms, use a native pandoc Markdown table.

ghost commented 9 years ago

@KurtPfeifle Thanks a lot for your trick

jgm commented 9 years ago

+++ Xavier [Jun 16 15 00:20 ]:

But Markdown supports embedded raw HTML.

Yes. Raw HTML is passed unchanged to HTML output.
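One way to observe this behaviour is to convert the same small input to an HTML and a non-HTML target and count the lines that contain a `<table` tag. This is only a sketch: `raw.md` and the helper `count_table_tags` are names invented here, and the counts are printed only if `pandoc` is installed.

```shell
# Hypothetical sample file with a raw HTML table, for illustration only.
cat > raw.md <<'EOF'
Bla

<table>
<tr><td>TOTAL</td><td>396,28 €</td></tr>
</table>
EOF

# Count the output lines that contain a <table tag for the given target format.
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_table_tags() {
  pandoc -f markdown -t "$1" raw.md | grep -c '<table' || true
}

if command -v pandoc >/dev/null 2>&1; then
  echo "html:  $(count_table_tags html)"
  echo "latex: $(count_table_tags latex)"
fi
```

If the behaviour described in this thread holds, the tag shows up in the HTML output and is absent from the LaTeX output.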

ghost commented 9 years ago

@KurtPfeifle : your trick does not work.

KurtPfeifle commented 9 years ago

Comment deleted (and re-pasted in again further below)

Reason: originally sent by Mail, its formatting looked horrible. After correcting it, it looked fine in the Preview tab, but still horrible after saving it...

New comment is now appearing as comment #112541448 below...

KurtPfeifle commented 9 years ago

I edited my previous comments. I had sent them by mail, but that does not seem to work well with Markdown formatting or with included images.

However, while my edits now look good in the Preview tab, they still look like cr*p after clicking "Update comment".

KurtPfeifle commented 9 years ago

You could try not converting to LaTeX directly, but going Markdown (with raw HTML) => HTML => LaTeX instead.

It may even work without temporarily saving the intermediate HTML:

pandoc \
   -f markdown+raw_html+markdown_in_html_blocks+implicit_header_references+strikeout+tex_math_dollars+raw_tex+yaml_metadata_block \
   bitacora.md \
   -t html \
   -o - \
| pandoc \
   -f html \
   --biblio referencies.bib \
   --csl amai.csl \
   -N -s -S --toc \
   -o bitacora.pdf

(A few of your +extension flags are unnecessary because Pandoc enables them by default, but I do not know by heart which ones.)
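To find out which extensions are already on by default, pandoc can print them itself. The sketch below assumes a pandoc version whose `--list-extensions` option accepts a format argument and prefixes each extension with `+` (enabled by default) or `-` (disabled by default); the helper name `default_on` is invented for this example.

```shell
# Keep only the extensions enabled by default ("+" prefix) and strip the prefix.
default_on() {
  grep '^+' | sed 's/^+//'
}

# Print the default-enabled extensions of pandoc's Markdown reader.
if command -v pandoc >/dev/null 2>&1; then
  pandoc --list-extensions=markdown | default_on
fi
```

Any extension that appears in this list can be dropped from the `-f markdown+...` argument without changing the result.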

KurtPfeifle commented 9 years ago

@Xavier:

My trick works… basically!

To debug this for you, I first split my piped command chain into two separate commands. The first creates the HTML as bitacora.html. This opens in a browser and renders a correct-looking HTML table.

What does not work is converting that HTML snippet, with its HTML table code, to Markdown with, say, a grid_table:

$ cat bitacora.html
<p>Bla</p>

<table>
<theader>
    <tr><th>Component</th><th>Preu</th></tr>
</theader>
<tbody>
   <tr><td><a href="http://fit-pc.com/wiki/index.php/Fit-PC_Product_Line:_fitlet">Compulab
Fitlet-B</a></td><td>239,73 €</td></tr>
   <tr><td>Disc dur mSATA Kingston SSD <a
href="http://www.kingston.com/datasheets/sms200s3_en.pdf">SMS200S3/120
GB</a> (111.8 GiB)</td><td>86,95 €</td></tr>
   <tr><td>Memòria RAM Kingston 8GB DDR3L 1600 CL10 240 PIN UDIMM
RAM</td><td>69,60 €</td></tr>
</tbody>
<tfoot>
   <tr><td>TOTAL</td><td>396,28 €</td></tr>
</tfoot>
</table>

Then: pandoc -t markdown+grid_tables -f html bitacora.html:

Bla

Component

Preu

[Compulab
Fitlet-B](http://fit-pc.com/wiki/index.php/Fit-PC_Product_Line:_fitlet)

239,73 €

Disc dur mSATA Kingston SSD [SMS200S3/120
GB](http://www.kingston.com/datasheets/sms200s3_en.pdf) (111.8 GiB)

86,95 €

Memòria RAM Kingston 8GB DDR3L 1600 CL10 240 PIN UDIMM RAM

69,60 €

TOTAL

396,28 €

In other words: it’s not my “trick” that didn’t work, it’s that Pandoc isn’t able to convert this type of HTML table to a Markdown table.

About this, I’m not sure if it is a bug or simply a known (current) limitation. _I’m also not sure whether your HTML syntax is correct!_ Your <theader> may need to be named <thead> (I’m not fluent in HTML, and I didn’t look it up). Modifying that in bitacora.html produces this:

$ pandoc -f html bitacora.html -t markdown

Bla

  Component                                                                                                       Preu
  --------------------------------------------------------------------------------------------------------------- ----------
  [Compulab Fitlet-B](http://fit-pc.com/wiki/index.php/Fit-PC_Product_Line:_fitlet)                               239,73 €
  Disc dur mSATA Kingston SSD [SMS200S3/120 GB](http://www.kingston.com/datasheets/sms200s3_en.pdf) (111.8 GiB)   86,95 €
  Memòria RAM Kingston 8GB DDR3L 1600 CL10 240 PIN UDIMM RAM                                                      69,60 €
  TOTAL                                                                                                           396,28 €

Which to me looks correct… Making the same changes to the initial bitacora.md even makes my proposed trick work…

pandoc -f markdown+raw_html+markdown_in_html_blocks bitacora.md -t html  \
| pandoc -f html --biblio referencies.bib --csl amai.csl -N -s -S --toc  \
         -V geometry:"margin=0.5cm, paperwidth=500pt, paperheight=170pt" \
         -o bitacora.pdf

…and the bitacora.pdf is created successfully, looking how it ought to look (screenshot):

[Screenshot: bitacora.pdf]
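The rename described above can be scripted instead of done by hand. `<thead>` is the standard HTML element name for a table header group, while `<theader>` is not an HTML element. The following is a minimal sketch: the helper `fix_thead` and the output name `bitacora-fixed.md` are made up for this example.

```shell
# Rename the non-standard <theader> tags to <thead>; reads stdin, writes stdout.
fix_thead() {
  sed -e 's#<theader>#<thead>#g' -e 's#</theader>#</thead>#g'
}

# Apply it to the Markdown source from this thread, if that file exists.
if [ -f bitacora.md ]; then
  fix_thead < bitacora.md > bitacora-fixed.md
fi
```

Writing to a new file keeps the original input untouched, so the two versions can be compared before rerunning the conversion.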

_Lessons learned:_

  1. Never give up debugging too early.
  2. Never put blame on one component of your toolchain prematurely.
  3. If an end result is not correct, look at _all_ steps of your process.
  4. In most cases it is the input (or the user) which is to blame. :-)

ghost commented 9 years ago

Ok. Thanks @KurtPfeifle