jgm / pandoc

Universal markup converter
https://pandoc.org

HTML tables in markdown are not rendered into docx #7736

Closed deenine closed 2 years ago

deenine commented 2 years ago

Explain the problem. HTML tables embedded in markdown are not rendered in docx output. For example, take a markdown file with two tables, one a native markdown pipe table and one embedded HTML:

# Table test

# MD

| fruit | quantity |
|---|---|
| apples | 3 |
| pears | 2 |

# HTML

<table>
  <tr>
    <th>Fruit</th>
    <th>Quantity</th>
  </tr>
  <tr>
    <td>Oranges</td>
    <td>5</td>
  </tr>
  <tr>
    <td>Grapes</td>
    <td>35</td>
  </tr>
</table>

This renders correctly to HTML. However, when converted to docx using pandoc tab_test.md -o tab_test.docx, the HTML table is not parsed, and the result is:

[Image: Screenshot 2021-12-07 at 11 44 59]

It appears that the markdown parser is not triggering HTML parsing of the table, since the table comes out correctly if I convert markdown to HTML and then HTML to docx:

pandoc tab_test.md -o tabtest.html
pandoc tabtest.html -o tab_test.docx
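
The two-step conversion above can also be run as a single pipeline, without the intermediate file. This is only a minimal sketch, assuming pandoc is on the PATH; the function name md_to_docx_via_html is hypothetical and not part of pandoc.

```shell
#!/bin/sh
# Hypothetical helper (not part of pandoc): convert Markdown to docx via an
# intermediate HTML pass, so embedded HTML tables go through the HTML reader.
md_to_docx_via_html() {
    if [ "$#" -ne 2 ]; then
        echo "usage: md_to_docx_via_html INPUT.md OUTPUT.docx" >&2
        return 2
    fi
    # Step 1: Markdown -> HTML on stdout; raw HTML blocks pass through as is.
    # Step 2: HTML read from stdin -> docx; the output format is inferred
    # from the .docx extension of the output file.
    pandoc --from markdown --to html "$1" | pandoc --from html --output "$2"
}
```

With the file names used in this report, it would be called as md_to_docx_via_html tab_test.md tab_test.docx. Anything the HTML round trip does not preserve is lost in the docx as well, so the result is worth checking on a real document.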

Pandoc version? Pandoc 2.16.2, installed with Homebrew on macOS 11.6 Big Sur (2019 MacBook Pro, Intel i7).

$ pandoc --version
pandoc 2.16.2
Compiled with pandoc-types 1.22.1, texmath 0.12.3.2, skylighting 0.12.1,
citeproc 0.6, ipynb 0.1.0.2
User data directory: /Users/username/.local/share/pandoc
Copyright (C) 2006-2021 John MacFarlane. Web:  https://pandoc.org
This is free software; see the source for copying conditions. There is no
warranty, not even for merchantability or fitness for a particular purpose.

mb21 commented 2 years ago

This is expected behaviour. Raw HTML in markdown input is passed on as is to HTML output and omitted on export to other formats.

deenine commented 2 years ago

@mb21 Is it desirable to omit on export to other formats?

It is very common to embed HTML tables in markdown, and Pandoc is clearly able to parse HTML tables as evidenced by the html-docx conversion demonstrated above.

tarleb commented 2 years ago

For a work-around, see https://github.com/jgm/pandoc/issues/6317#issuecomment-977150938.

reinaortega commented 1 month ago

Has that been fixed/added? I still have the same problem. I am using this pandoc version:

pandoc 3.1.1
Features: +server +lua
Scripting engine: Lua 5.4
User data directory: /root/.local/share/pandoc
Copyright (C) 2006-2023 John MacFarlane. Web: https://pandoc.org
This is free software; see the source for copying conditions. There is no
warranty, not even for merchantability or fitness for a particular purpose.

This would make it much easier to use complex tables in markdown documents that are converted to docx.

jgm commented 1 month ago

See comment above: https://github.com/jgm/pandoc/issues/7736#issuecomment-987875443