jgm / pandoc

Universal markup converter
https://pandoc.org

Issues in rendering tables to docx #8917

Closed JohnMcHugh closed 4 months ago

JohnMcHugh commented 1 year ago

Explain the problem. I am using placeholder figures and tables in a LaTeX document that is being converted to docx format. The figures work; the tables do not seem to. The plan is to later insert image files into the figures and tables in the docx file. Each figure / table contains a line of text identifying the file to be used, gives the caption for the figure / table, and defines its label. I would expect to see the text line and the caption in the resulting document at the point where the figure or label declaration appeared, and to be able to reference the label elsewhere. This is what I get in the docx file for the figure, but not for the table. In the case of the table, it appears that the caption is ignored and the label is undefined. In the table, the caption appears first, but I get the same result if the caption appears below the text line.

Manual inspection of the docx xml confirms the problem. More details in the attached files:

description.txt contains a more detailed description and console outputs from test and version runs.
pandoc_bug.txt is a minimal LaTeX test case with 1 figure (works) and 1 table (does not). It was renamed from pandoc_bug.tex as your system would not upload a .tex file.
pandoc_bug.pdf is the pdflatex output.
pandoc_bug.docx is the pandoc output from my system.
trypandoc.docx is the pandoc output from the trypandoc site.

Pandoc version? pandoc 3.1.3. Detailed version information is in the description.txt file. The version at https://pandoc.org/try/ was also used. The environment is a Mac Mini 2023 (Apple M2 Pro, 16 GB RAM) running Ventura 13.4.

description.txt pandoc_bug.txt pandoc_bug.pdf pandoc_bug.docx trypandoc.docx

jgm commented 1 year ago

We don't represent caption position in our internal document representation, so that will be lost in conversion. (It's more of a presentation detail than a matter of content.)

The other issues are the ignored table caption and the undefined label.

Since both these things work now for figures, it should not be hard to get them working for tables too.

jgm commented 1 year ago

If you do pandoc -t native -f latex pandoc_bug.txt, you'll see that pandoc does not parse this as a table. The reason is that there is no tabular inside the table (no tabular data).

That is why the caption doesn't come through and the reference is not resolved.

So, I don't think this is really a serious issue. If you add a tabular environment, the caption will come through and the references will be resolved.
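As a rough illustration of the suggestion above, a placeholder table could wrap its text line in a one-cell tabular so that the table environment contains tabular data. This is only a sketch: the label `tab:placeholder`, the caption text, and the file name `table1.png` are made up for the example and are not taken from the attached test case.

```latex
\documentclass{article}
\begin{document}

See Table~\ref{tab:placeholder}.

\begin{table}
  \caption{Placeholder caption}
  \label{tab:placeholder}
  % A one-cell tabular holds the placeholder text line, so the
  % table environment now contains tabular data.
  % The file name below is hypothetical.
  \begin{tabular}{l}
    Insert file table1.png here
  \end{tabular}
\end{table}

\end{document}
```

Running pandoc -t native -f latex on a file like this should show whether the environment is now parsed as a table.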

tarleb commented 4 months ago

Closing. Please reopen if something needs to be fixed in pandoc.