quarto-dev / quarto-cli

Open-source scientific and technical publishing system built on Pandoc.
https://quarto.org

table cross references result in malformed xml #10587

Open rmflight opened 3 weeks ago

rmflight commented 3 weeks ago

Bug description

I've created a repo that documents the issues encountered in trying to create cross-referenced tables in a qmd document output to Word.

I realize that solving the cross-referenced XML issue is difficult if one doesn't know what bad XML is causing the issue, and what the good XML looks like, so I worked to figure it out.

These have been confirmed using:

R 4.3.0 (running on Pop!_OS)
quarto 1.6.5
Word 2019
Windows 10 Education

Problem

When a Word document is created with a cross-referenced table, the table XML is malformed: Word complains bitterly when opening the file, and the tables don't look right.

Resources

The repo has an R file with 3 functions:

qmd files to provide reproducible examples of things not working, as well as the corresponding docx files:

Issue

Examining the XML from the files above, we can discover some weird XML created when cross-references are auto generated (whether gt or kable are used to generate the table):

  1. A nested table is created: the first table starts before the caption, and the next table sits inside the first one.
  2. The bookmark that defines where to link to the caption is wrapped around the entire second table (start loc, end loc), instead of just the caption text.
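
The first symptom can be checked for mechanically. Below is a minimal Python sketch, not part of the linked repo, that counts `w:tbl` elements nested inside another `w:tbl`. It assumes the main document part lives at `word/document.xml` (the usual location in a docx archive); the function names are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
TBL = f"{{{W_NS}}}tbl"


def read_document_xml(docx_path):
    """Return the raw bytes of word/document.xml (assumed location)."""
    with zipfile.ZipFile(docx_path) as archive:
        return archive.read("word/document.xml")


def count_nested_tables(document_xml):
    """Count w:tbl elements that have another w:tbl as an ancestor."""
    root = ET.fromstring(document_xml)

    def walk(elem, inside_table):
        count = 0
        for child in elem:
            is_table = child.tag == TBL
            if is_table and inside_table:
                count += 1
            count += walk(child, inside_table or is_table)
        return count

    return walk(root, root.tag == TBL)
```

Calling `count_nested_tables(read_document_xml("some.docx"))` on a rendered file should return 0 when no table is nested; any other value points at the pattern described in item 1.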

Solution

  1. Make sure only a single table is defined, after the table caption (see modified example).
  2. Create the bookmark only around the table caption (see modified example).
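
To verify point 2 on a given `document.xml`, one could list bookmarks whose start/end pair encloses the start of a table. This is a rough Python sketch under the assumption that `w:bookmarkStart` and `w:bookmarkEnd` are paired by their `w:id` attribute; the function name is hypothetical and the check is based on document order only.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
TBL = f"{{{W_NS}}}tbl"
BOOKMARK_START = f"{{{W_NS}}}bookmarkStart"
BOOKMARK_END = f"{{{W_NS}}}bookmarkEnd"
W_ID = f"{{{W_NS}}}id"
W_NAME = f"{{{W_NS}}}name"


def bookmarks_enclosing_tables(document_xml):
    """Return names of bookmarks that are still open when a w:tbl starts."""
    root = ET.fromstring(document_xml)
    open_bookmarks = {}  # w:id -> w:name of bookmarks not yet closed
    offenders = []
    # iter() walks the tree in document order.
    for elem in root.iter():
        if elem.tag == BOOKMARK_START:
            open_bookmarks[elem.get(W_ID)] = elem.get(W_NAME, "")
        elif elem.tag == BOOKMARK_END:
            open_bookmarks.pop(elem.get(W_ID), None)
        elif elem.tag == TBL:
            for name in open_bookmarks.values():
                if name not in offenders:
                    offenders.append(name)
    return offenders
```

An empty list means every bookmark closes before the next table begins, which is the layout the hand-modified files aim for.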

I've hand crafted the XML in the _modified_to_work directories, and then created corresponding docx files, and verified that the links do work between text and caption, Word no longer complains, and the tables look better.

Related to #7151 , #9650

Steps to reproduce

No response

Expected behavior

No response

Actual behavior

No response

Your environment

Quarto check output

Quarto 1.6.5
[✓] Checking versions of quarto binary dependencies...
      Pandoc version 3.2.0: OK
      Dart Sass version 1.70.0: OK
      Deno version 1.41.0: OK
      Typst version 0.11.0: OK
[✓] Checking versions of quarto dependencies......OK
[✓] Checking Quarto installation......OK
      Version: 1.6.5
      Path: /opt/quarto/bin

[✓] Checking tools....................OK
      TinyTeX: (external install)
      Chromium: (not installed)

[✓] Checking LaTeX....................OK
      Using: TinyTex
      Path: /home/rmflight/.TinyTeX/bin/x86_64-linux
      Version: 2022

[✓] Checking basic markdown render....OK

[✓] Checking Python 3 installation....OK
      Version: 3.10.12
      Path: /usr/bin/python3
      Jupyter: (None)

      Jupyter is not available in this Python installation.
      Install with python3 -m pip install jupyter

(/) Checking R installation...........→ R version 4.3.0 (2023-04-21)
→ Running under Pop!_OS 22.04 LTS
(|) Checking R installation...........→ System time is 2024-08-22 09:25:08.269918
[✓] Checking R installation...........OK
      Version: 4.3.0
      Path: /rmflight_stuff/software/R-4.3.0
      LibPaths:
        - /rmflight_stuff/software/R-4.3.0/library
      knitr: 1.42
      rmarkdown: 2.21

[✓] Checking Knitr engine render......OK
rmflight commented 3 weeks ago

Just to be doubly sure what is causing the error message and the malformed tables in Word (and it is in Word only; LibreOffice doesn't care, which makes this a pain to check because I need to be running a Windows VM), I also hand-crafted examples:

The nested tables (which, according to other issues, seem to be intended behavior with automatic captions) seem to be the actual problem. Word complains when the nested tables are present.

See the difference between opening the document with the bookmark removed but the nested tables kept (source) and the one with the nested tables removed (source).

cscheid commented 3 weeks ago

Does this happen anywhere else besides gt tables?

rmflight commented 3 weeks ago

Yes, kable tables seem to suffer from the same thing. I think any table with a caption ends up the same way, given that both gt and kable tables with captions end up as nested tables. Although Word doesn't complain about the kable case, the table still looks wrong, and it gets fixed by removing the nesting.

rmflight commented 3 weeks ago

I could easily add flextable and any other table packages we want to evaluate to triple check that it is the nested tables due to captions.

Are there other table frameworks that should be included?

cscheid commented 3 weeks ago

No, thank you. The reason I asked about gt was to rule out Quarto's HTML-to-Pandoc processing code, of which only gt takes advantage, as far as I know.

rmflight commented 3 weeks ago

Yeah, I wanted to check if it was a gt issue only, so I made sure to check it all with another table generator, and then also see what happened when I hand tweaked the XML for it as well.

rmflight commented 3 weeks ago

Just for funsies, I did check with a flextable as well.

Word still complains, but in contrast to gt and kable, the table still looks OK in the recovered document.


And then again, after removing the nested table structure by hand-massaging the XML and creating a new document from it, Word stops complaining about the document.
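
For anyone wanting to repeat this kind of experiment, here is a minimal Python sketch of the repacking step: it copies a docx archive and swaps in an edited `word/document.xml`. It is an illustration only, not the procedure used in the repo; the function name and the default part path are assumptions.

```python
import zipfile


def replace_document_xml(src_docx, dst_docx, new_document_xml,
                         part="word/document.xml"):
    """Copy src_docx to dst_docx, replacing one part with new bytes.

    All other archive members are copied unchanged and in the same order.
    """
    with zipfile.ZipFile(src_docx) as src, \
            zipfile.ZipFile(dst_docx, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename == part:
                data = new_document_xml
            else:
                data = src.read(info.filename)
            # Reuse the original entry metadata (name, compression type).
            dst.writestr(info, data)
```

Usage would look like `replace_document_xml("original.docx", "fixed.docx", edited_bytes)`, where `edited_bytes` holds the hand-edited XML; the result can then be opened in Word to see whether it still complains.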

katossky commented 2 weeks ago

Just for information, in case it helps with debugging: with tinytable it also works, without complaints from Word.

rmflight commented 2 weeks ago

@katossky In my testing, you are right. However, none of the styling applied seems to show up in the Word document either. See this commit: https://github.com/rmflight/gt_quarto_table_weirdness/commit/da43fd471babc7b76fdea72ea11731286c8d8142