jgm / pandoc

Universal markup converter
https://pandoc.org

docx to md creates faulty tables if images are in them #10402

Closed Zujiry closed 3 days ago

Zujiry commented 3 days ago

Explain the problem.

I am trying to convert a docx file to Markdown, and then convert that Markdown to PDF. The plan is to add Mermaid diagrams to the Markdown in a later step, but that is not the problem yet.

script.sh

pandoc -t markdown sample.docx -o output.md --extract-media=.
pandoc output.md -o output.pdf --pdf-engine=pdflatex -H header.tex

header.tex

\usepackage{xcolor}
\usepackage{graphicx}
\DeclareUnicodeCharacter{2003}{\hspace{1em}}

The generated output.md contains tables that look like this:

+-------------------------+---------------+-------------------------+
| Symbol                  | Seite         | Inhalt                  |
+=========================+===============+=========================+
| ![](./me                | Analyse       | Top-Seite für           |
| dia/image13.png){width= |               | Analysesichten          |
| "0.47244094488188976in" |               |                         |
| height="                |               |                         |
| 0.42913385826771655in"} |               |                         |
+-------------------------+---------------+-------------------------+

Converting this to PDF leads to the following warning:

[WARNING] Could not fetch resource './me%20dia/image13.png': replacing image with description

My guess is that this goes wrong because the image path is split across lines inside the table cell.

Pandoc version? pandoc 2.9.2.1

kysko commented 3 days ago

You could force a reference link with --reference-links, which would shorten what's written in the cell.

You will get ![][1] in the cell, and

  [1]: ./media/image13.png {width="0.47244094488188976in"
  height="0.42913385826771655in"}

as reference link.

That would apply to every link in the Markdown document, of course, but the Markdown is only an intermediate file in your case.
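
Applied to the script from the report, the suggestion might look like the sketch below. The file names and the other options are taken from the original script.sh; the `run` helper and the `DRY_RUN` variable are hypothetical additions for illustration and are not part of pandoc.

```shell
#!/bin/sh
# Sketch: the two-step conversion from the report, with --reference-links
# added to the docx -> markdown step so the table cell only holds ![][1].
set -eu

# Hypothetical helper: with DRY_RUN=1 it prints the command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

convert() {
  # Step 1: docx -> markdown, writing image targets as reference links.
  run pandoc -t markdown --reference-links sample.docx -o output.md --extract-media=.
  # Step 2: markdown -> PDF, unchanged from the original script.
  run pandoc output.md -o output.pdf --pdf-engine=pdflatex -H header.tex
}

# Only run the conversion when the input file from the report is present.
if [ -f sample.docx ]; then
  convert
fi
```

Running it with `DRY_RUN=1` prints the two pandoc commands without executing them, which is a quick way to check the flags before converting.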

Zujiry commented 3 days ago

That solves it, thank you. The table in the PDF now looks the way it is supposed to.