quarto-dev / quarto-cli

Open-source scientific and technical publishing system built on Pandoc.
https://quarto.org

QuartoMarkdownBase64 works with cross references but not citation keys in LaTeX/PDF #9342

Open andrewheiss opened 5 months ago

andrewheiss commented 5 months ago

Bug description

This is related to https://github.com/quarto-dev/quarto-cli/issues/3340 and to including citation keys in tables (manual ones and ones made with things like {gt}).

{gt} recently added fmt_markdown(), which injects <span data-qmd="blah"></span> stuff around citations and cross references in HTML output, so that Quarto correctly processes the citations and cross references.

In https://github.com/quarto-dev/quarto-cli/pull/7451, Quarto added a similar feature for LaTeX output, adding a \QuartoMarkdownBase64{} command for processing/protecting citations and cross references.

\QuartoMarkdownBase64{} works for cross references, but not for citations.
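For anyone who wants to build these commands by hand, here is a minimal Python sketch of the encoding step. It assumes the payload is just the UTF-8 Markdown text in standard base-64, which matches the two strings used in the reproduction below. The helper name `quarto_markdown_base64` is made up for illustration and is not part of Quarto or {gt}.

```python
import base64


def quarto_markdown_base64(markdown: str) -> str:
    """Wrap a Markdown snippet in a \\QuartoMarkdownBase64{...} command.

    Hypothetical helper: assumes the payload is the UTF-8 text encoded
    with standard base-64.
    """
    payload = base64.b64encode(markdown.encode("utf-8")).decode("ascii")
    return "\\QuartoMarkdownBase64{" + payload + "}"


def decode_payload(payload: str) -> str:
    """Inverse direction: recover the Markdown text from a base-64 payload."""
    return base64.b64decode(payload).decode("utf-8")


citation_cell = quarto_markdown_base64("@Lovelace1842")
crossref_cell = quarto_markdown_base64("@eq-math")
print(citation_cell)
print(crossref_cell)
```

`decode_payload("QGVxLW1hdGg=")` gives back `@eq-math`, which is a quick way to check a hand-written table cell.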

Steps to reproduce

With this document:

---
title: "Citations in tables"
format: 
  html: default
  pdf: 
    keep-tex: true

references:
- type: article-journal
  id: Lovelace1842
  author:
  - family: Lovelace
    given: Augusta Ada
  issued:
    date-parts:
    - - 1842
  title: >-
    Sketch of the analytical engine invented by Charles Babbage, by LF Menabrea, 
    officer of the military engineers, with notes upon the memoir by the translator
  title-short: Molecular structure of nucleic acids
  container-title: Taylor’s Scientific Memoirs
  volume: 3
  page: 666-731
  language: en-GB
---

$$
a^2 + b^2 = c^2
$${#eq-math}

```{r}
library(gt)
tibble::tribble(
  ~Thing, ~Citation,
  1234, "@Lovelace1842",
  5678, "@eq-math"
) |>
  gt() |> 
  fmt_markdown(Citation)
```

…when rendered to HTML, everything works great thanks to @rich-iannone's updates to `fmt_markdown()`:

<img width="347" alt="image" src="https://github.com/quarto-dev/quarto-cli/assets/73663/4526f41e-4885-4aa5-a89f-bab4b36f28c0">


When rendered to PDF, though, neither the citation key nor the cross reference is processed (likely because {gt} doesn't do anything with `\QuartoMarkdownBase64{}` yet?):

<img width="211" alt="image" src="https://github.com/quarto-dev/quarto-cli/assets/73663/17502d37-5960-4b07-92ac-8a108bab99de">


So I tried making the table manually with LaTeX:

```latex
% QExvdmVsYWNlMTg0Mg== is "@Lovelace1842" in base-64 encoding
% QGVxLW1hdGg= is "@eq-math" in base-64 encoding
\begin{tabular}{cc}
Thing & Citation \\
1234 & \QuartoMarkdownBase64{QExvdmVsYWNlMTg0Mg==} \\
5678 & \QuartoMarkdownBase64{QGVxLW1hdGg=} \\
\end{tabular}
```

Quarto emits this LaTeX:

```latex
% QExvdmVsYWNlMTg0Mg== is "@Lovelace1842" in base-64 encoding
% QGVxLW1hdGg= is "@eq-math" in base-64 encoding
\begin{tabular}{cc}
Thing & Citation \\
1234 & @Lovelace1842 \\
5678 & Equation~\ref{eq-math} \\
\end{tabular}
```

…and this PDF:

[image: screenshot of the rendered PDF]

The @Lovelace1842 citation key is unprocessed, while the @eq-math cross reference is converted to the correct LaTeX.

Expected behavior

I was hoping that the @Lovelace1842 citation key in the table would get converted to a bibliographic reference and be processed by citeproc, but it looks like citation processing is happening at a different stage in the Quarto pipeline?

Actual behavior

The @Lovelace1842 citation key is unprocessed, while the @eq-math cross reference is converted to the correct LaTeX.

Your environment

Quarto check output

Quarto 1.5.27
[✓] Checking versions of quarto binary dependencies...
      Pandoc version 3.1.11: OK
      Dart Sass version 1.70.0: OK
      Deno version 1.41.0: OK
[✓] Checking versions of quarto dependencies......OK
[✓] Checking Quarto installation......OK
      Version: 1.5.27
      Path: /Applications/quarto/bin

[✓] Checking tools....................OK
      TinyTeX: (external install)
      Chromium: (not installed)

[✓] Checking LaTeX....................OK
      Using: TinyTex
      Path: /Users/andrew/Library/TinyTeX/bin/universal-darwin
      Version: 2023

[✓] Checking basic markdown render....OK

[✓] Checking Python 3 installation....OK
      Version: 3.11.5
      Path: /opt/homebrew/opt/python@3.11/bin/python3.11
      Jupyter: 5.3.0
      Kernels: python3

[✓] Checking Jupyter engine render....OK

[✓] Checking R installation...........OK
      Version: 4.3.2
      Path: /Library/Frameworks/R.framework/Resources
      LibPaths:
        - /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
      knitr: 1.45
      rmarkdown: 2.25

[✓] Checking Knitr engine render......OK

cscheid commented 5 months ago

Thanks for the report! Before I dig in: can you make sure your screenshots match your markdown? I'm seeing some repeated 1234s in the markdown but not in the image.

andrewheiss commented 5 months ago

Yep, just fixed it—copied/pasted the wrong thing 😬

cscheid commented 5 months ago

lmao this is "fun".

So here's how this feature works. We detect the presence of \QuartoMarkdownBase64{...} inside a LaTeX RawBlock, and then create placeholder markdown blocks outside of the RawBlock. We do this because that markdown content needs to be visible to our filters to do things like crossref processing (which works!). Then, right before we send the final document to Pandoc for writing, we call pandoc.write(block, "latex") on the individual blocks, and inject them back into the LaTeX RawBlock in the correct places.

Unfortunately, citations are not processed by Quarto in Lua filters; they're processed by Pandoc's citeproc functionality. That means that when we call pandoc.write(block, "latex") on the markdown block, that happens before citeproc has run, so the citation key is written out unprocessed.
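
The round trip described above can be mimicked in a few lines of Python. This is a toy model only, not Quarto's implementation (which lives in Lua filters). `expand_raw_latex` and `toy_crossref_filter` are hypothetical names. The toy "filter" resolves cross references only, standing in for the fact that citeproc has not run yet when the blocks are written back into the raw LaTeX.

```python
import base64
import re

# Matches \QuartoMarkdownBase64{<payload>} and captures the base-64 payload.
PATTERN = re.compile(r"\\QuartoMarkdownBase64\{([A-Za-z0-9+/=]+)\}")


def toy_crossref_filter(markdown, labels):
    """Stand-in for the crossref step: resolve @keys that are known labels."""
    key = markdown.lstrip("@")
    if markdown.startswith("@") and key in labels:
        return "Equation~\\ref{" + key + "}"
    # Everything else, including citation keys, passes through untouched,
    # because nothing at this stage knows how to resolve citations.
    return markdown


def expand_raw_latex(raw, labels):
    """Decode each payload, run the toy filter, and splice the result back."""
    def replace(match):
        markdown = base64.b64decode(match.group(1)).decode("utf-8")
        return toy_crossref_filter(markdown, labels)
    return PATTERN.sub(replace, raw)


rows = [
    r"1234 & \QuartoMarkdownBase64{QExvdmVsYWNlMTg0Mg==} \\",
    r"5678 & \QuartoMarkdownBase64{QGVxLW1hdGg=} \\",
]
expanded = [expand_raw_latex(row, labels={"eq-math"}) for row in rows]
for row in expanded:
    print(row)
```

The two printed rows match the table body in the emitted LaTeX from the report: the cross reference becomes `Equation~\ref{eq-math}`, while `@Lovelace1842` comes out as plain text.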

At the same time, we can't call citeproc more than once, because citation numbering needs to be consistent (and potentially consistently ordered in the document 😬).
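
To illustrate the numbering concern, here is a deliberately simplified Python sketch. It assumes a numeric citation style in which each key gets the next number on first appearance. `number_citations` is a hypothetical stand-in and does far less than Pandoc's citeproc, and `@Babbage1837` is an invented key used only to give the document body a citation of its own.

```python
import re

# Toy citation syntax: "@" followed by a key.
CITE = re.compile(r"@([A-Za-z][A-Za-z0-9_]*)")


def number_citations(text, numbering=None):
    """Replace each @key with [n], numbering keys in order of first appearance."""
    numbering = {} if numbering is None else numbering

    def replace(match):
        key = match.group(1)
        if key not in numbering:
            numbering[key] = len(numbering) + 1
        return f"[{numbering[key]}]"

    return CITE.sub(replace, text)


body = "See @Babbage1837 for background."  # hypothetical body text
table_cell = "@Lovelace1842"               # Markdown recovered from the table

# Two independent passes: each one starts counting from 1 again.
separate = (number_citations(body), number_citations(table_cell))

# One pass with shared state: numbers stay consistent across the document.
shared = {}
single = (number_citations(body, shared), number_citations(table_cell, shared))

print(separate)
print(single)
```

With separate passes, both citations end up as `[1]`. With shared state, the table citation becomes `[2]`. That is the kind of inconsistency a second citeproc run on the extracted blocks would risk.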

This is going to be a bit of a nightmare to fix, I'm sorry to say. I'm going to have to let this simmer in my head for a bit before coming up with a plan.