quarto-dev / quarto-cli

Open-source scientific and technical publishing system built on Pandoc.
https://quarto.org

Citations in tables do not render correctly outside of raw markdown and kable #3340

Closed andrewheiss closed 1 year ago

andrewheiss commented 1 year ago

Bug description

There is inconsistent behavior when including citation keys inside tables, but it's a really tricky issue and I'm not sure if it's a Quarto issue or an issue with other table-making packages.

When making a table with regular Markdown or with knitr::kable(), citation keys that are included in table rows are rendered correctly by pandoc.

However, when making a table with kableExtra::kbl() or gt::gt(), pandoc doesn't see the citation keys.

Here's a reproducible example:

---
title: "Citations in tables"
format: html

references:
- type: article-journal
  id: Lovelace1842
  author:
  - family: Lovelace
    given: Augusta Ada
  issued:
    date-parts:
    - - 1842
  title: >-
    Sketch of the analytical engine invented by Charles Babbage, by LF Menabrea, 
    officer of the military engineers, with notes upon the memoir by the translator
  title-short: Molecular structure of nucleic acids
  container-title: Taylor’s Scientific Memoirs
  volume: 3
  page: 666-731
  language: en-GB
---

## Raw Markdown

This works.

| Thing | Citation      |
|-------|---------------|
| 1234  | @Lovelace1842 |

## `knitr::kable()`

This works.

```{r}
library(knitr)
tibble::tribble(
  ~Thing, ~Citation,
  1234, "@Lovelace1842"
) |>
  kable()
```

## `kableExtra::kbl()`

This doesn't work.

```{r}
library(kableExtra)
tibble::tribble(
  ~Thing, ~Citation,
  1234, "@Lovelace1842"
) |>
  kbl() |>
  kable_styling()
```

## `gt::gt()`

This doesn't work.

```{r}
library(gt)
tibble::tribble(
  ~Thing, ~Citation,
  1234, "@Lovelace1842"
) |>
  gt()
```
Raw Markdown and `knitr::kable()` work correctly:

<img width="883" alt="image" src="https://user-images.githubusercontent.com/73663/202052027-f7683db1-cd2b-47ca-86c4-6b1910aed957.png">

`kableExtra::kbl()` and `gt::gt()` do not render the citation:

<img width="753" alt="image" src="https://user-images.githubusercontent.com/73663/202052106-610933bc-b59e-4bc7-af44-dc8bb7bed568.png">

---

This is a known issue in both **kableExtra** (https://github.com/haozhu233/kableExtra/issues/214#issuecomment-421706528) and **gt** (https://github.com/rstudio/gt/issues/112#issuecomment-632936052), and the existing solution for both packages is to use **bookdown**'s text reference feature to render the citation outside of the table and then automatically include the rendered citation inside the table when knitting, like this:

(ref:lovelace) @Lovelace1842

library(kableExtra)
tibble::tribble(
  ~Thing, ~Citation,
  1234, "(ref:lovelace)"
) |>
  kbl() |>
  kable_styling()


This doesn't work in Quarto, though. #1959 and #1785 are about adding bookdown-esque text references to Quarto, which would potentially solve this issue.

---

This is with RStudio 2022.07.2+576 on macOS Monterey 12.6, but it happens across platforms

### Checklist

- [X] Please include a minimal, fully reproducible example in a single .qmd file? Please provide the whole file rather than the snippet you believe is causing the issue.
- [X] Please [format your issue](https://quarto.org/bug-reports.html#formatting-make-githubs-markdown-work-for-us) so it is easier for us to read the bug report.
- [X] Please document the RStudio IDE version you're running (if applicable), by providing the value displayed in the "About RStudio" main menu dialog?
- [X] Please document the operating system you're running. If on Linux, please provide the specific distribution.
andrewheiss commented 1 year ago

Ooh, @debruine's issue/solution in https://github.com/quarto-dev/quarto-cli/issues/1710 also works here! This can be fixed by removing the class with `unclass()`, printing the result with `cat()`, and including `results="asis"` in the chunk options.

## `kableExtra::kbl()` with `unclass()`

This works!

```{r results="asis"}
library(kableExtra)
tibble::tribble(
  ~Thing, ~Citation,
  1234, "@Lovelace1842"
) |>
  kbl() |>
  kable_styling() |>
  unclass() |> cat()
```

<img width="851" alt="image" src="https://user-images.githubusercontent.com/73663/202063540-01c2d92c-65ad-40d3-b3cb-d73f38b1dbf4.png">
MohdAzmiSuliman commented 1 year ago

I was looking for this too. However, if I use `results = "asis"`, the rendered HTML file no longer recognises the output as a table for cross-referencing.

---
title: "kable extra citation"
format: html
---

refer @tbl-table

```{r}
#| label: tbl-table
#| tab-cap: "table with citation"
#| results: asis

library(kableExtra)
tibble::tribble(
  ~Thing, ~Citation,
  1234, "@Lovelace1842"
) |>
  kbl() |>
  kable_styling() |>
  unclass() |> cat()
```


![image](https://user-images.githubusercontent.com/1527738/202330165-54678f36-b4b0-45b0-a487-432733aac8a0.png)

Is there any way to caption the table and cross-reference it?
cscheid commented 1 year ago

I think you need `tbl-cap` instead of `tab-cap`?

andrewheiss commented 1 year ago

Using tbl-cap adds the caption to the table itself, but it is still not cross-referencable:

See @tbl-table

```{r}
#| echo: fenced
#| label: tbl-table
#| tbl-cap: Table with citation
#| results: asis

library(kableExtra)
tibble::tribble(
  ~Thing, ~Citation,
  1234, "@Lovelace1842"
) |>
  kbl() |>
  kable_styling() |>
  unclass() |> cat()
```


<img width="879" alt="image" src="https://user-images.githubusercontent.com/73663/202537630-1f3b03c3-29c0-42e6-b284-b87ff1217656.png">
cscheid commented 1 year ago

This is now fixed, although kableExtra will need to use a slightly different method (it's currently emitting plain HTML without enclosing it in a RawBlock, which is not HTML syntax that we officially support).

For a preview of how this is going to work in the future, consider this snippet using an in-development branch of gt:

```{r}
library(gt)
tibble::tribble(
  ~Thing, ~Citation,
  1234, "@Lovelace1842"
) |>
  gt() |> fmt_markdown()
```


If you tell gt that you're providing it with markdown input (through `fmt_markdown`, which can also be scoped to cells, columns, rows, etc), then you get this:

<img width="668" alt="image" src="https://user-images.githubusercontent.com/285675/217653057-75c4de69-a7fe-4ef2-afed-15421e59c59b.png">

If you don't call `fmt_markdown()`, you get this:

<img width="651" alt="image" src="https://user-images.githubusercontent.com/285675/217653148-19b16ac4-fed8-422c-8c62-273686fe1e96.png">
cderv commented 1 year ago

> This is now fixed, although kableExtra will need to use a slightly different method (it's currently emitting plain HTML without enclosing it in a RawBlock, which is not HTML syntax that we officially support).

@cscheid about this: aren't we enclosing the result in a RawBlock ourselves? I believe we do it here.

What more would be needed? I think we need to make sure that what is planned for gt will also work for other frameworks.

Also, a comment on why the raw block is not used in the R Markdown ecosystem: that way, the output can benefit from Pandoc's `markdown_in_html_blocks` extension. With Quarto we lose this feature.

This means that with rmarkdown,

Rmd file

````markdown
---
title: "Citations in tables"
output:
  bookdown::html_document2:
    keep_md: true
references:
- type: article-journal
  id: Lovelace1842
  author:
  - family: Lovelace
    given: Augusta Ada
  issued:
    date-parts:
    - - 1842
  title: >-
    Sketch of the analytical engine invented by Charles Babbage, by LF Menabrea,
    officer of the military engineers, with notes upon the memoir by the translator
  title-short: Molecular structure of nucleic acids
  container-title: Taylor’s Scientific Memoirs
  volume: 3
  page: 666-731
  language: en-GB
---

## Raw Markdown

This works.

| Thing | Citation      |
|-------|---------------|
| 1234  | @Lovelace1842 |

## `knitr::kable()`

This works.

```{r}
library(knitr)
tibble::tribble(
  ~Thing, ~Citation,
  1234, "@Lovelace1842"
) |>
  kable()
```

## `kableExtra::kbl()`

This doesn't work.

```{r}
library(kableExtra)
tibble::tribble(
  ~Thing, ~Citation,
  1234, "@Lovelace1842"
) |>
  kbl() |>
  kable_styling()
```

## `gt::gt()`

This doesn't work.

```{r}
library(gt)
tibble::tribble(
  ~Thing, ~Citation,
  1234, "@Lovelace1842"
) |>
  gt()
```
````

Putting that here in case R Markdown users are wondering about the difference.

andrewheiss commented 1 year ago

@cscheid Do you know which branch of gt has support for the preview version here (https://github.com/quarto-dev/quarto-cli/issues/3340#issuecomment-1423252268)? Using the `gt() |> fmt_markdown()` approach with today's GitHub version still leads to non-working output:

image

cscheid commented 1 year ago

@rich-iannone ?

jameshowison commented 1 year ago

Anyone know if this should be working now? I don't think it is for me.

---
title: "trial"
format: html
---

## Quarto {#sec-testing}

Quarto enables you to weave together content and executable code into a finished document. To learn more about Quarto see <https://quarto.org>.

Inside a table cross-refs don't render:

```{r}
library(gt)
tibble::tribble(
  ~Thing, ~Link,
  1234, "@sec-testing"
) |>
  gt() |> fmt_markdown()
```

But outside a table it works fine @sec-testing



Produces this output:

![Screen Shot 2023-08-29 at 1 43 11 PM](https://github.com/quarto-dev/quarto-cli/assets/91986/c24b05bd-d8b9-442b-a309-84536ba0a5c4)

I'm expecting a linked "Section 1" to appear inside the table. This doesn't work with `kable` for me either.

Current versions:
gt (0.9.0)
quarto (cmdline) 1.3.242
cscheid commented 1 year ago

Actually, now that I say that, I wonder. @rich-iannone, shouldn't that `fmt_markdown` be wrapping the markdown content in a `<span data-qmd="..."></span>` element? It doesn't seem to be right now. This is the HTML being emitted:

<table class="gt_table" data-quarto-disable-processing="false" data-quarto-bootstrap="false">
  <thead>

    <tr class="gt_col_headings">
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="Thing">Thing</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_left" rowspan="1" colspan="1" scope="col" id="Link">Link</th>
    </tr>
  </thead>
  <tbody class="gt_table_body">
    <tr><td headers="Thing" class="gt_row gt_right"><div class='gt_from_md'><p>1234</p>
</div></td>
<td headers="Link" class="gt_row gt_left"><div class='gt_from_md'><p>@sec-testing</p>
</div></td></tr>
  </tbody>

</table>
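
As a rough illustration of the span form mentioned above, here is a minimal R sketch that builds such an element from a Markdown string. The helper name `qmd_span()` is made up for this example, and this is not necessarily how gt constructs its output; it only shows the shape of the markup Quarto is expected to pick up.

```r
# Hypothetical helper (not part of gt or Quarto): wrap a Markdown snippet in
# the <span data-qmd="..."></span> form discussed above.
# It escapes the characters that would break a double-quoted HTML attribute.
qmd_span <- function(md) {
  escaped <- gsub("&", "&amp;", md, fixed = TRUE)
  escaped <- gsub('"', "&quot;", escaped, fixed = TRUE)
  escaped <- gsub("<", "&lt;", escaped, fixed = TRUE)
  sprintf('<span data-qmd="%s"></span>', escaped)
}

# The cell content for the cross-reference in the example above.
cat(qmd_span("@sec-testing"), "\n")
```

With this form, the `<td>` for the `Link` column would hold the span instead of the `<div class='gt_from_md'>` wrapper.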
cscheid commented 1 year ago

We do have a second issue, though, which is that gt emits `data-quarto-disable-processing="false"`, and quarto checks for

              if table.attributes[constants.kDisableProcessing] ~= nil then

This is a quarto bug and I'll fix it. In my testing, if `<div class='gt_from_md'><p>...</p></div>` is replaced with `<span data-qmd="..."></span>`, then quarto (after this bugfix) actually renders this crossref correctly.

@jameshowison : you'll need to install the latest 1.4 prerelease (and wait for the confirmation and fix of the gt issue) but this should all be working very soon. Thanks for the patience!
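
To illustrate why the `~= nil` check misfires, here is a rough R analogue. It is not Quarto's actual Lua code: the `attrs` list and both helper functions are made up for illustration. Because gt writes the attribute with the string value `"false"`, a presence-only test treats the table as opted out of processing, while a test on the value does not. How the real fix is implemented is not shown in this thread.

```r
# Attributes as gt emits them on the <table> element (values are strings).
attrs <- list(
  "data-quarto-disable-processing" = "false",
  "data-quarto-bootstrap" = "false"
)

# Analogue of `table.attributes[...] ~= nil`: only checks presence.
disabled_by_presence <- function(a) {
  !is.null(a[["data-quarto-disable-processing"]])
}

# Hypothetical value-aware check: only the string "true" disables processing.
disabled_by_value <- function(a) {
  identical(a[["data-quarto-disable-processing"]], "true")
}

disabled_by_presence(attrs)  # TRUE: processing is skipped by mistake
disabled_by_value(attrs)     # FALSE: the table is processed
```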

rich-iannone commented 1 year ago

Okay, there was a change required in gt to ensure that Quarto is detected during the render. I've made that change just now as https://github.com/rstudio/gt/commit/e1feb6ba5a474c58d22b2a95a3c12cfed1f3117b and the interface mentioned above by Carlos now works. You'll need to get the latest dev version of gt with devtools::install_github("rstudio/gt").

I tested with this .qmd:

---
format: html
references:
- type: article-journal
  id: Lovelace1842
  author:
  - family: Lovelace
    given: Augusta Ada
  issued:
    date-parts:
    - - 1842
  title: >-
    Sketch of the analytical engine invented by Charles Babbage, by LF Menabrea, 
    officer of the military engineers, with notes upon the memoir by the translator
  title-short: Molecular structure of nucleic acids
  container-title: Taylor’s Scientific Memoirs
  volume: 3
  page: 666-731
  language: en-GB
---

```{r}
library(gt)

tibble::tribble(
  ~Thing, ~Citation,
  1234, "@Lovelace1842"
) |>
  gt() |> fmt_markdown()
```


and got this document, where hovering on the formatted citation yields the full reference.

<img width="1020" alt="quarto-gt-citation-in-table" src="https://github.com/quarto-dev/quarto-cli/assets/5612024/f8a6b8d2-f992-4f1b-a45a-e2fc8382a253">

Really sorry it took so long to discover the underlying `gt` issue. Try it out and let me know if it doesn't work on your system.
cscheid commented 1 year ago

I can confirm this works on main (and the version of gt on GitHub) as well. kableExtra will need to emit its output slightly differently, so I'm going to go ahead and close this one.

cderv commented 1 year ago

> kableExtra will need to emit its output slightly differently,

I believe this could be done in knitr directly. We could try to sync an update with the 1.4 release for this. I opened an issue there.

andrewheiss commented 10 months ago

This issue persists when rendering to PDF (using gt 0.10.0 and Quarto 1.4.424):

---
title: "Citations in tables"
format: pdf

references:
- type: article-journal
  id: Lovelace1842
  author:
  - family: Lovelace
    given: Augusta Ada
  issued:
    date-parts:
    - - 1842
  title: >-
    Sketch of the analytical engine invented by Charles Babbage, by LF Menabrea, 
    officer of the military engineers, with notes upon the memoir by the translator
  title-short: Sketch of the analytical engine
  container-title: Taylor’s Scientific Memoirs
  volume: 3
  page: 666-731
  language: en-GB
---

```{r}
library(gt)

tibble::tribble(
  ~Thing, ~Citation,
  1234, "@Lovelace1842"
) |>
  gt() |> fmt_markdown()
```


<img width="929" alt="image" src="https://github.com/quarto-dev/quarto-cli/assets/73663/1430fc23-47f5-4fd8-bd89-dbbd17f4d56c">
andrewheiss commented 10 months ago

knitr::kable() works with PDF:

```{r}
tibble::tribble(
  ~Thing, ~Citation,
  1234, "@Lovelace1842"
) |>
  knitr::kable()
```

<img width="795" alt="image" src="https://github.com/quarto-dev/quarto-cli/assets/73663/548a4632-eef8-47b9-a543-3db93ee7fa04">

&nbsp;

`kableExtra::kbl()` doesn't work with PDF:

```{r}
library(kableExtra)

tibble::tribble(
  ~Thing, ~Citation,
  1234, "@Lovelace1842"
) |>
  kbl() |> kable_styling()
```


<img width="255" alt="image" src="https://github.com/quarto-dev/quarto-cli/assets/73663/cd12ed25-a3c9-4fba-b9e8-4c3c92fd2b13">

&nbsp;

That *might* be related to https://github.com/yihui/knitr/issues/2289, though that issue is specifically about HTML output
cscheid commented 10 months ago
```{r}
library(gt)

tibble::tribble(
  ~Thing, ~Citation,
  1234, "@Lovelace1842"
) |>
  gt() |> fmt_markdown()
```
Somehow, gt is not formatting this with markdown, but is emitting LaTeX, cf `keep-md: true`

::: {.cell}

library(gt)

tibble::tribble(
  ~Thing, ~Citation,
  1234, "@Lovelace1842"
) |>
  gt() |> fmt_markdown()

::: {.cell-output-display}
\begin{longtable}{rl}
\toprule
Thing & Citation \\
\midrule\addlinespace[2.5pt]
1234 & @Lovelace1842 \\
\bottomrule
\end{longtable}

:::
:::



@rich-iannone, do you know what's going on here?
cderv commented 10 months ago

The document uses `format: pdf` here, so PDF output is expected. gt will then emit its `as_latex()` output because it is being rendered for `format: pdf`.

I don't think we yet have an equivalent of the `data-qmd` trick on Span that works with LaTeX tables (probably because we don't parse them).

If you want HTML tables even in a `format: pdf` document, because you know Quarto will parse the HTML table anyway, you can ask for it:

```{r}
library(gt)

tibble::tribble(
  ~Thing, ~Citation,
  1234, "@Lovelace1842"
) |>
  gt() |> fmt_markdown() |> as_raw_html()
```


This will correctly process the citation in the table, even for PDF outputs. 

So in the end, this is a **gt** choice: when rendering inside a Quarto document, if the output format is LaTeX, should it generate an HTML table anyway? At least when `fmt_markdown()` is used, maybe?

Otherwise, it will be up to the user to explicitly ask for the HTML version of the table.
cscheid commented 10 months ago

> I don't think we have yet an equivalent of data-qmd trick on Span that works with LaTeX tables. (probably because we don't parse tables).

But if you provide an HTML quarto-style table in LaTeX, we will do the right thing. That was the whole point of doing a uniform parser for tables in quarto. See:

````
---
format: pdf
---

```{r}
#| label: fig-1
#| fig-cap: A caption
plot(1:100)
```

```{r}
#| label: fig-2
#| fig-cap: Another caption
plot(20:30)
```

```{=html}
WhatWhere
```
````
image
image
cscheid commented 10 months ago

And, more specifically, I thought that gt had been changed to have a way to send HTML tables to quarto, even in non-HTML formats.

andrewheiss commented 10 months ago

Ok cool, yeah, I also thought that {gt} had changed for non-HTML formats and emitted HTML. Adding as_raw_html() works for PDF as expected, but leaving it on messes with regular HTML output, so some conditional work has to be done to not use it with HTML

Normal HTML output:

image

 

With as_raw_html() so that LaTeX is happy, but now the HTML output is messed up because something changed with the gt CSS or something:

image

But that's definitely a {gt} issue, not a Quarto issue
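
The "conditional work" mentioned above could look roughly like the following sketch. The helper name `html_if_latex()` is made up for this example; it relies only on `knitr::is_latex_output()` and `gt::as_raw_html()`, and it assumes the chunk is run by knitr. It leaves HTML output untouched and converts to raw HTML only when rendering to LaTeX/PDF.

```r
# Hypothetical helper (not part of gt or Quarto): switch to raw HTML only
# when the document is being knit to LaTeX/PDF, and return the gt table
# unchanged for every other output format.
html_if_latex <- function(tbl, is_latex = knitr::is_latex_output()) {
  if (is_latex) gt::as_raw_html(tbl) else tbl
}

# Usage inside a chunk (requires gt and knitr):
# tibble::tribble(
#   ~Thing, ~Citation,
#   1234, "@Lovelace1842"
# ) |>
#   gt::gt() |>
#   gt::fmt_markdown() |>
#   html_if_latex()
```

This keeps a single chunk working for both formats, at the cost of losing gt's own LaTeX styling in the PDF.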

cderv commented 10 months ago

> But if you provide an HTML quarto-style table in LaTeX, we will do the right thing. That was the whole point of doing a uniform parser for tables in quarto.

Yeah, I know that. We all agree here! The code I shared was specifically about outputting HTML in the PDF format for it to work.

> And, more specifically, I thought that gt had been changed to have a way to send HTML tables to quarto, even in non-HTML formats.

I don't think it has. @rich-iannone, is there a plan to output an HTML table for LaTeX documents when rendered in Quarto?

The only drawback of this approach is possibly losing all the custom LaTeX styling for the gt table, if any, which can't be set when Quarto/Pandoc writes the raw LaTeX instead of gt doing so directly.

The example shared by @andrewheiss is a hint about this, I believe. This may need an improvement in Quarto rather than gt, as it is Quarto/Pandoc that generates the inserted raw LaTeX, and no longer gt.

@rich-iannone I'll let you open a ticket on your repo to track potential improvements and more testing.

andrewheiss commented 10 months ago

Yeah, using as_raw_html() strips away all custom LaTeX stuff like column widths when outputting to HTML

With HTML:

image

With LaTeX:

image
rich-iannone commented 10 months ago

@andrewheiss me and Carlos discussed this and we have a good plan in place that will preserve the LaTeX table code that gt generates while also allowing Quarto to properly handle citations as well as performing the Markdown conversion. More to come.

cscheid commented 10 months ago

@rich-iannone the PR on the quarto side is here https://github.com/quarto-dev/quarto-cli/pull/7451

andrewheiss commented 5 months ago

Following up on this (and related to https://github.com/quarto-dev/quarto-cli/issues/9342), @rich-iannone is there a way to use \QuartoMarkdownBase64{} from https://github.com/quarto-dev/quarto-cli/pull/7451 in {gt}?

For example, this qmd has an equation and a citation:

---
title: "Citations in tables"
format: 
  html: default
  pdf: 
    keep-tex: true

references:
- type: article-journal
  id: Lovelace1842
  author:
  - family: Lovelace
    given: Augusta Ada
  issued:
    date-parts:
    - - 1842
  title: >-
    Sketch of the analytical engine invented by Charles Babbage, by LF Menabrea, 
    officer of the military engineers, with notes upon the memoir by the translator
  title-short: Molecular structure of nucleic acids
  container-title: Taylor’s Scientific Memoirs
  volume: 3
  page: 666-731
  language: en-GB
---

$$
a^2 + b^2 = c^2
$${#eq-math}

```{r}
library(gt)
tibble::tribble(
  ~Thing, ~Citation,
  1234, "@Lovelace1842",
  5678, "@eq-math"
) |>
  gt() |> 
  fmt_markdown(Citation)
```

When rendering to PDF, those `@` things don't get processed, but doing it manually with LaTeX and `\QuartoMarkdownBase64{}` works (kind of; in https://github.com/quarto-dev/quarto-cli/issues/9342 this doesn't work with citation keys yet)

````qmd
```{=latex}
% QExvdmVsYWNlMTg0Mg== is "@Lovelace1842" in base-64 encoding
% QGVxLW1hdGg= is "@eq-math" in base-64 encoding
\begin{tabular}{cc}
Thing & Citation \\
1234 & \QuartoMarkdownBase64{QExvdmVsYWNlMTg0Mg==} \\
5678 & \QuartoMarkdownBase64{QGVxLW1hdGg=} \\
\end{tabular}
```
````

I tried adding the base64 content myself in `gt()`:

tibble::tribble(
  ~Thing, ~Citation,
  1234, "\\QuartoMarkdownBase64{QGVxLW1hdGg=}",
  5678, "\\QuartoMarkdownBase64{QExvdmVsYWNlMTg0Mg==}"
) |>
  gt()


…but {gt} [automatically converts](https://github.com/rstudio/gt/blob/411328ce56ec6dfac355f1df2b13749c17b27354/R/helpers.R#L3701) `\\` to `\textbackslash{}`, so we end up with this LaTeX:

```latex
\begin{longtable*}{rl}
\toprule
Thing & Citation \\ 
\midrule\addlinespace[2.5pt]
1234 & \textbackslash{}QuartoMarkdownBase64\{QExvdmVsYWNlMTg0Mg==\} \\ 
5678 & \textbackslash{}QuartoMarkdownBase64\{QGVxLW1hdGg=\} \\ 
\bottomrule
\end{longtable*}
```

and this PDF:

image

Is there a way to get {gt} to emit \QuartoMarkdownBase64{} without automatically escaping the LaTeX?

cderv commented 5 months ago

@andrewheiss I believe this is somewhat related to this issue I opened in gt for the HTML case.

But the same workaround applies:

library(gt)
tibble::tribble(
  ~Thing, ~Citation,
  1234, "\\QuartoMarkdownBase64{QGVxLW1hdGg=}",
  5678, "\\QuartoMarkdownBase64{QExvdmVsYWNlMTg0Mg==}"
) |>
  gt() |> 
  fmt_passthrough(
    columns = Citation,
    escape = FALSE
  )

Using fmt_passthrough with escape = FALSE should help here.

It should write this table

\begin{longtable}{rl}
\toprule
Thing & Citation \\ 
\midrule\addlinespace[2.5pt]
1234 & \QuartoMarkdownBase64{QGVxLW1hdGg=} \\ 
5678 & \QuartoMarkdownBase64{QExvdmVsYWNlMTg0Mg==} \\ 
\bottomrule
\end{longtable}

Hope it helps
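
To avoid encoding the strings by hand, a small helper can generate the macro calls. This is only a sketch: `b64_encode()` and `quarto_md_b64()` are names invented for this example, the encoder is a plain base-R implementation so that nothing extra needs to be installed (packages such as jsonlite offer the same via `jsonlite::base64_enc()`), and it handles one string at a time.

```r
# Minimal base64 encoder in base R for a single string (a sketch, not tuned
# for speed or for vectors of strings).
b64_encode <- function(x) {
  chars <- c(LETTERS, letters, 0:9, "+", "/")
  bytes <- as.integer(charToRaw(x))
  pad <- (3 - length(bytes) %% 3) %% 3      # bytes missing from the last group
  bytes <- c(bytes, rep(0L, pad))
  m <- matrix(bytes, nrow = 3)              # one 3-byte group per column
  n <- m[1, ] * 65536L + m[2, ] * 256L + m[3, ]
  idx <- rbind(n %/% 262144L, (n %/% 4096L) %% 64L, (n %/% 64L) %% 64L, n %% 64L)
  out <- chars[idx + 1L]
  if (pad > 0) out[(length(out) - pad + 1):length(out)] <- "="
  paste(out, collapse = "")
}

# Hypothetical helper: build the LaTeX macro call for one Markdown snippet.
quarto_md_b64 <- function(md) {
  sprintf("\\QuartoMarkdownBase64{%s}", b64_encode(md))
}

# cat() shows the string as LaTeX will see it (single backslash).
cat(quarto_md_b64("@eq-math"), "\n")
cat(quarto_md_b64("@Lovelace1842"), "\n")
```

The resulting strings can be placed in the `Citation` column and passed through `fmt_passthrough(columns = Citation, escape = FALSE)` as in the workaround above. As noted earlier in the thread, citation keys may not be resolved this way yet.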