vincentarelbundock / tinytable

Simple and Customizable Tables in `R`
https://vincentarelbundock.github.io/tinytable
GNU General Public License v3.0

Citations and cross-references in Quarto #215

Closed andrewheiss closed 7 months ago

andrewheiss commented 7 months ago

When using Markdown inside tables with Quarto, Quarto ignores the content and will not parse it. That's ordinarily okay—using format_tt(..., markdown = TRUE) will format most things just fine.

It gets tricky with syntax that Quarto should parse, like cross references and citations. For instance, take this:

---
title: "Reference stuff"
references:
- type: article-journal
  id: Lovelace1842
  author:
  - family: Lovelace
    given: Augusta Ada
  issued:
    date-parts:
    - - 1842
  title: >-
    Sketch of the analytical engine invented by Charles Babbage, by LF Menabrea, 
    officer of the military engineers, with notes upon the memoir by the translator
  title-short: Molecular structure of nucleic acids
  container-title: Taylor’s Scientific Memoirs
  volume: 3
  page: 666-731
  language: en-GB
---

```{r}
library(tinytable)

x <- data.frame(Thing = 1234, Citation = "@Lovelace1842")
tt(x)
```

It emits this HTML:

```html
<table>
  <thead>
    <tr>
      <th scope="col" class="tinytable_css_9qmquh7a5tfly3l7oiyn">Thing</th>
      <th scope="col" class="tinytable_css_9qmquh7a5tfly3l7oiyn">Citation</th>
    </tr>
  </thead>

  <tbody>
    <tr>
      <td>1234</td>
      <td>@Lovelace1842</td>
    </tr>
  </tbody>
</table>
```

The @Lovelace1842 citation key isn't parsed and appears verbatim in the rendered table.

Quarto has the ability to treat specific text as Markdown, though, if you wrap it in an element with a data-qmd attribute set. A td element containing this should render as an actual citation:

<td> <span data-qmd="@Lovelace1842"></span> </td>
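To make the wrapping step concrete, here is a minimal sketch of a helper that builds that span for each cell. The name `wrap_qmd_span` is hypothetical (it is not part of tinytable or Quarto), and the attribute escaping is an assumption about what is needed to keep the HTML valid when a cell contains quotes or angle brackets.

```r
# Hypothetical helper, not part of tinytable or Quarto: wrap cell text in the
# <span data-qmd="..."> element so that Quarto parses it as Markdown.
wrap_qmd_span <- function(x) {
  # Escape characters that would break a double-quoted HTML attribute.
  # "&" must be handled first so later entities are not double-escaped.
  x <- gsub("&", "&amp;", x, fixed = TRUE)
  x <- gsub("\"", "&quot;", x, fixed = TRUE)
  x <- gsub("<", "&lt;", x, fixed = TRUE)
  x <- gsub(">", "&gt;", x, fixed = TRUE)
  sprintf("<span data-qmd=\"%s\"></span>", x)
}

# Vectorized over cells, so it can be applied to a whole column.
wrap_qmd_span(c("@Lovelace1842", "See @fig-scatter"))
```

Because the function takes and returns a character vector, it could in principle be applied to a data frame column before the table is built.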

This is an issue with all table-making packages (see here for a discussion about it at Quarto https://github.com/quarto-dev/quarto-cli/issues/3340). {gt} has it fixed and there's an open issue at {knitr} for it, with more details too: https://github.com/yihui/knitr/issues/2289


I don't know the best way to handle this with {tinytable} though. format_tt(..., markdown = TRUE) uses the {markdown} package to convert to HTML rather than Quarto, and that's great.


One additional complication is that this also doesn't work in LaTeX output. {gt} doesn't handle it there either, but knitr::kable() somehow does (see https://github.com/quarto-dev/quarto-cli/issues/3340#issuecomment-1787369780).

vincentarelbundock commented 7 months ago

Love it when power users start playing with packages. That's when all the real good stuff comes out :)

This one will require some thought, as I've never really encountered or thought about this issue. I'll read up on it when I find some time and will circle back.

But now I need to drop my oldest off at the airport for her first school trip abroad (signed: scared/proud father).

vincentarelbundock commented 7 months ago

What would be really useful to have is a function that makes it work using the very very general fn argument available in format_tt().

In this example, I tried to make a conditional modification of the cell:

  1. Insert the special Quarto <span> you pointed to in HTML docs.
  2. Use regex to convert to natbib syntax in LaTeX.

It didn't quite work, but maybe it's a step in the right direction...

library(tinytable)

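fn <- function(z) {
    # Default: leave the cell unchanged for any other output format,
    # or when no Pandoc target is set (e.g. outside of knitting).
    out <- z
    if (isTRUE(knitr::pandoc_to() == "html")) {
        # HTML: wrap the cell in the span that Quarto parses as Markdown
        out <- sprintf("<span data-qmd='%s'></span>", z)
    } else if (isTRUE(knitr::pandoc_to() == "latex")) {
        # LaTeX: rewrite @key as \cite{key}
        out <- sub("@(\\w+)\\b", "\\\\cite\\{\\1\\}", z)
    }
    return(out)
}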

x <- data.frame(Thing = 1234, Citation = "@Lovelace1842")
tt(x) |> format_tt(j = 2, fn = fn)

vincentarelbundock commented 7 months ago

OK, I figured this out.

Proof of concept with bad user interface:

https://vincentarelbundock.github.io/tinytable/vignettes/tinytable.html#quarto-data-processing

Background on two issues:

  1. Quarto normally does a ton of pre-processing on all tables. By default, tinytable disables that pre-processing, because it breaks a bunch of features.
  2. Even when the pre-processing gets done, Quarto still requires users to specifically mark a cell with the special span: <span data-qmd="@Lovelace1842"></span>

On GitHub, I added a new global option to re-enable Quarto pre-processing. I also added an example to the vignette with a reference.

The user experience is terrible, but it works.

I'm not sure how to make the experience better. We can't enable pre-processing all the time, because there are tons of conflicts with nice features and styles.

What does <span data-qmd> do, exactly? Will this always interpret markdown?

Should we insert that span automatically when the global option is set and a user calls format_tt(markdown = TRUE)? Is this span a complete substitute to the markdown package?

andrewheiss commented 7 months ago

Oh cool, yeah, this is roughly what {gt} does too. You have to disable quarto processing in an option—see quarto.disable_processing here:

library(gt)
k <- data.frame(Thing = "x^2^", Citation = "@Lovelace1842")

k |>
  gt() |>
  tab_options(
    quarto.disable_processing = TRUE
  )

I'm fairly certain that when rendering to HTML, Quarto assumes that the output of chunks that create tables (with {gt}, {kableExtra}, {tinytable} and friends) is finished HTML. If that output contains Markdown that has to be parsed together with the rest of the document (like citation keys and cross references), it won't be, because Quarto assumes that all the formatting has already been done (with the markdown package for {tinytable}, or whatever {gt} uses to create its HTML) and the table is ready to go. The special span tells Quarto to do some further processing on those cells (e.g. parse the citation key).

There's also a similar feature for LaTeX—there's a \QuartoMarkdownBase64{} command. See https://github.com/quarto-dev/quarto-cli/issues/9342 for how it works, and for a bug where it works for cross references but not for citations (because of the complexity of Quarto's Lua filter ordering)

vincentarelbundock commented 7 months ago

Thanks for the info. I've added a quarto argument to format_tt(). This means you can now do things like:

Mark a single cell for Quarto processing:

k <- data.frame(Thing = "qwerty", Citation = "@Lovelace1842")

tt(k) |> format_tt(i = 1, j = 2, quarto = TRUE)

Apply Quarto data processing to all tables using a theme and a global option:

theme_quarto <- function(x) format_tt(x, quarto = TRUE)
options(tinytable_tt_theme = theme_quarto)

tt(k)

vincentarelbundock commented 7 months ago

Closing this now, but feel free to open separate issues if you run into problems that you believe can be fixed on tinytable's end.

giabaio commented 7 months ago

Sorry to jump back into this, but is the fix provided by the extra quarto option meant to work on HTML output only, or is it supposed to work on PDF output too? If I run the example above with the supposed tinytable fix and render to PDF, I get a bunch of (Lua-related) errors...

---
title: "Reference stuff"
references:
- type: article-journal
  id: Lovelace1842
  author:
  - family: Lovelace
    given: Augusta Ada
  issued:
    date-parts:
    - - 1842
  title: >-
    Sketch of the analytical engine invented by Charles Babbage, by LF Menabrea, 
    officer of the military engineers, with notes upon the memoir by the translator
  title-short: Molecular structure of nucleic acids
  container-title: Taylor’s Scientific Memoirs
  volume: 3
  page: 666-731
  language: en-GB
---

```{r}
library(tinytable)
theme_quarto <- function(x) format_tt(x, quarto = TRUE)
options(tinytable_tt_theme = theme_quarto)

k <- data.frame(Thing = "qwerty", Citation = "@Lovelace1842")

tt(k) |> format_tt(quarto = TRUE)

options(tinytable_tt_theme = NULL)
```

Everything works OK when output to HTML...

vincentarelbundock commented 7 months ago

@giabaio are you using the development version from GitHub? If so, what specific errors are you getting?

giabaio commented 7 months ago

I am on the GitHub version. Here's the error message:

Error running filter /home/gianluca/Dropbox/Rstuff/Packages/qmd/quarto-cli/src/resources/filters/main.lua:
...arto-cli/src/resources//pandoc/datadir/lpegshortcode.lua:289: invalid UTF-8 code
stack traceback:
    ...arto-cli/src/resources//pandoc/datadir/lpegshortcode.lua:289: in upvalue 'escape_unicode'
    ...arto-cli/src/resources//pandoc/datadir/lpegshortcode.lua:307: in function 'lpegshortcode.wrap_lpeg_match'
    (...tail calls...)
    ...qmd/quarto-cli/src/resources//pandoc/datadir/readqmd.lua:129: in function 'readqmd.readqmd'
    ...qmd/quarto-cli/src/resources/filters/./common/pandoc.lua:216: in function 'string_to_quarto_ast_blocks'
    ...i/src/resources/filters/./normalize/extractquartodom.lua:83: in function <...i/src/resources/filters/./normalize/extractquartodom.lua:70>
    [C]: in ?
    [C]: in method 'walk'
    ...d/quarto-cli/src/resources/filters/./ast/customnodes.lua:76: in function <...d/quarto-cli/src/resources/filters/./ast/customnodes.lua:65>
    (...tail calls...)
    .../quarto-cli/src/resources/filters/./ast/runemulation.lua:82: in local 'callback'
    .../quarto-cli/src/resources/filters/./ast/runemulation.lua:100: in upvalue 'run_emulated_filter_chain'
    .../quarto-cli/src/resources/filters/./ast/runemulation.lua:136: in function <.../quarto-cli/src/resources/filters/./ast/runemulation.lua:133>
stack traceback:
    ...d/quarto-cli/src/resources/filters/./ast/customnodes.lua:76: in function <...d/quarto-cli/src/resources/filters/./ast/customnodes.lua:65>
    (...tail calls...)
    .../quarto-cli/src/resources/filters/./ast/runemulation.lua:82: in local 'callback'
    .../quarto-cli/src/resources/filters/./ast/runemulation.lua:100: in upvalue 'run_emulated_filter_chain'
    .../quarto-cli/src/resources/filters/./ast/runemulation.lua:136: in function <.../quarto-cli/src/resources/filters/./ast/runemulation.lua:133>
ERROR: Error
    at renderFiles (file:///home/gianluca/Dropbox/Rstuff/Packages/qmd/quarto-cli/src/command/render/render-files.ts:350:23)
    at eventLoopTick (ext:core/01_core.js:153:7)
    at async render (file:///home/gianluca/Dropbox/Rstuff/Packages/qmd/quarto-cli/src/command/render/render-shared.ts:102:18)
    at async Command.actionHandler (file:///home/gianluca/Dropbox/Rstuff/Packages/qmd/quarto-cli/src/command/render/cmd.ts:248:26)
    at async Command.execute (file:///home/gianluca/Dropbox/Rstuff/Packages/qmd/quarto-cli/src/vendor/deno.land/x/cliffy@v1.0.0-rc.3/command/command.ts:1948:7)
    at async Command.parseCommand (file:///home/gianluca/Dropbox/Rstuff/Packages/qmd/quarto-cli/src/vendor/deno.land/x/cliffy@v1.0.0-rc.3/command/command.ts:1780:14)
    at async quarto (file:///home/gianluca/Dropbox/Rstuff/Packages/qmd/quarto-cli/src/quarto.ts:156:3)
    at async file:///home/gianluca/Dropbox/Rstuff/Packages/qmd/quarto-cli/src/quarto.ts:170:5
    at async mainRunner (file:///home/gianluca/Dropbox/Rstuff/Packages/qmd/quarto-cli/src/core/main.ts:35:5)
    at async file:///home/gianluca/Dropbox/Rstuff/Packages/qmd/quarto-cli/src/quarto.ts:160:3

vincentarelbundock commented 7 months ago

Weird. I don't get the same error. Can you make sure you are also running the latest Quarto? Maybe even try the prerelease if 1.4 doesn't work.

giabaio commented 7 months ago

I was on a fairly recent commit on quarto-cli; just updated to the latest, but I still get the same error... I am investigating further too...

andrewheiss commented 7 months ago

With Quarto 1.5.29 on macOS I'm getting the same error:

---
title: "Reference stuff"
references:
- type: article-journal
  id: Lovelace1842
  author:
  - family: Lovelace
    given: Augusta Ada
  issued:
    date-parts:
    - - 1842
  title: >-
    Sketch of the analytical engine invented by Charles Babbage, by LF Menabrea, 
    officer of the military engineers, with notes upon the memoir by the translator
  title-short: Molecular structure of nucleic acids
  container-title: Taylor’s Scientific Memoirs
  volume: 3
  page: 666-731
  language: en-GB
---

```{r}
library(tinytable)

x <- data.frame(Thing = 1234, Citation = "@Lovelace1842")
tt(x) |> format_tt(quarto = TRUE)
```

Here's the error:

quarto render testing.qmd --to pdf
Error running filter /Applications/quarto/share/filters/main.lua:
/Applications/quarto/share/pandoc/datadir/lpegshortcode.lua:289: invalid UTF-8 code
stack traceback:
    /Applications/quarto/share/pandoc/datadir/lpegshortcode.lua:289: in upvalue 'escape_unicode'
    /Applications/quarto/share/pandoc/datadir/lpegshortcode.lua:307: in function 'lpegshortcode.wrap_lpeg_match'
    (...tail calls...)
    /Applications/quarto/share/pandoc/datadir/readqmd.lua:129: in function 'readqmd.readqmd'
    /Applications/quarto/share/filters/main.lua:3089: in function 'string_to_quarto_ast_blocks'
    /Applications/quarto/share/filters/main.lua:8502: in function </Applications/quarto/share/filters/main.lua:8489>
    [C]: in ?
    [C]: in method 'walk'
    /Applications/quarto/share/filters/main.lua:535: in function </Applications/quarto/share/filters/main.lua:524>
    (...tail calls...)
    /Applications/quarto/share/filters/main.lua:1312: in local 'callback'
    /Applications/quarto/share/filters/main.lua:1330: in upvalue 'run_emulated_filter_chain'
    /Applications/quarto/share/filters/main.lua:1366: in function </Applications/quarto/share/filters/main.lua:1363>
stack traceback:
    /Applications/quarto/share/filters/main.lua:535: in function </Applications/quarto/share/filters/main.lua:524>
    (...tail calls...)
    /Applications/quarto/share/filters/main.lua:1312: in local 'callback'
    /Applications/quarto/share/filters/main.lua:1330: in upvalue 'run_emulated_filter_chain'
    /Applications/quarto/share/filters/main.lua:1366: in function </Applications/quarto/share/filters/main.lua:1363>

Though also, if there wasn't an error, the citation still wouldn't be processed and the @Lovelace1842 citation key would appear in the table, because right now the \QuartoMarkdownBase64{...} wrapper only works with cross reference keys (e.g., @fig-whatever) and not with citations

andrewheiss commented 7 months ago

Wait, the issue might be here:

https://github.com/vincentarelbundock/tinytable/blob/0a999ddddc34d3482a757616ebb1e832388c7e9a/R/format_tt.R#L410

I might be reading the code wrong here and it might already be doing it elsewhere in that file, but the content inside \QuartoMarkdownBase64{...} (or %s in the code now) needs to be base64-encoded

base64enc::base64encode(charToRaw("@Lovelace1842"))
#> QExvdmVsYWNlMTg0Mg==
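
Putting the encoding and the wrapper together might look roughly like the sketch below. The helper name `wrap_qmd_latex` is hypothetical, and this is not necessarily how tinytable implements it; it only assumes that the {base64enc} package is installed and that the encoded text goes directly inside the braces of \QuartoMarkdownBase64{}.

```r
# Hypothetical helper, not tinytable's actual implementation: base64-encode
# each cell and wrap it in the \QuartoMarkdownBase64{} command.
wrap_qmd_latex <- function(x) {
  if (!requireNamespace("base64enc", quietly = TRUE)) {
    stop("Package 'base64enc' is required for this sketch.")
  }
  # base64encode() works on one raw vector at a time, so loop over the cells.
  encoded <- vapply(
    x,
    function(s) base64enc::base64encode(charToRaw(s)),
    character(1),
    USE.NAMES = FALSE
  )
  sprintf("\\QuartoMarkdownBase64{%s}", encoded)
}
```

Calling `wrap_qmd_latex("@Lovelace1842")` would then produce the command with the encoded string shown above between the braces.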

If you put the citation key in the data.frame as the base64-encoded version, it will render to PDF just fine:

```{r}
library(tinytable)

x <- data.frame(Thing = 1234, Citation = "QExvdmVsYWNlMTg0Mg==")
tt(x) |> format_tt(quarto = TRUE)
```

Here's the PDF—the `@Lovelace1842` is still there [because of the Quarto issue](https://github.com/quarto-dev/quarto-cli/issues/9342#issuecomment-2050228314), but the file renders at least:

<img width="328" alt="image" src="https://github.com/vincentarelbundock/tinytable/assets/73663/4f3a340a-cae9-4648-ad64-f1491977a602">

giabaio commented 7 months ago

I can replicate this!

vincentarelbundock commented 7 months ago

Aaah thanks so much for the deep dive!

I was convinced this worked on my computer but it doesn't. (Was on the move without computer; sorry!)

I just pushed a new commit on Github which should at least give us compilation, as shown in Andrew's last post.

Thanks both!

giabaio commented 7 months ago

Thank you both! One step closer!... :-)

andrewheiss commented 7 months ago

Oh awesome! I was just looking for a base R, no-external-packages method for base64-encoding to eliminate dependencies, but there isn't one. Everyone seems to use one of these:

I was this close to looking up the algorithm and trying to figure it out for a custom function here, but you just added base64enc to Suggests, so that fixes that :)
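
For the curious, the algorithm itself is short enough to sketch in base R. The function below is a hypothetical illustration (the name `base64_encode_chr` is made up, and it has only been checked against a few short strings), so {base64enc} remains the safer choice in practice. It assumes a single string as input and encodes its UTF-8 bytes.

```r
# Hypothetical base-R base64 encoder: an illustrative sketch only.
base64_encode_chr <- function(s) {
  alphabet <- c(LETTERS, letters, 0:9, "+", "/")
  bytes <- as.integer(charToRaw(enc2utf8(s)))
  n <- length(bytes)
  if (n == 0) return("")
  # Pad with zero bytes so the input splits into complete 3-byte groups.
  pad <- (3 - n %% 3) %% 3
  m <- matrix(c(bytes, rep(0L, pad)), nrow = 3)
  # Each 3-byte group is a 24-bit number, read as four 6-bit indices.
  value <- m[1, ] * 65536 + m[2, ] * 256 + m[3, ]
  idx <- rbind(
    value %/% 262144,
    (value %/% 4096) %% 64,
    (value %/% 64) %% 64,
    value %% 64
  )
  # Column-major order keeps the four characters of each group together.
  chars <- alphabet[as.vector(idx) + 1]
  # Characters that came only from padding bytes are replaced with "=".
  if (pad > 0) chars[(length(chars) - pad + 1):length(chars)] <- "="
  paste(chars, collapse = "")
}

base64_encode_chr("@Lovelace1842")
```

For the citation key used in this thread, the sketch should give the same string as the base64enc call earlier.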

vincentarelbundock commented 7 months ago

Oh yeah, adding an optional package by Simon Urbanek feels like we're maintaining the spirit of the project 😂