quarto-dev / quarto-cli

Open-source scientific and technical publishing system built on Pandoc.
https://quarto.org

Rendering to GFM does not use linked literature citations #9601

Open martinscripts opened 6 months ago

martinscripts commented 6 months ago

Bug description

When rendering a Quarto document to GitHub Flavored Markdown (GFM), the output does not use this linked GFM citation syntax:

ants are great, see [[1]](#1)

## Table of references

<a id="1">[1]</a>  A. Ant, "A Book About Ants", 2021

instead it uses

ants are great, see A. Ant (2021).

## Table of references

<div id="refs" class="references csl-bib-body" entry-spacing="0">

<div id="ref-ants" class="csl-entry">

<span class="csl-left-margin">\[1\]
</span><span class="csl-right-inline">A. Ant, “A Book About Ants” 2021.</span>

</div>

I see the following issues with it:

Am I missing a setting, or is this the expected behavior? Or is this kind of citation linking specific to GitHub Pages rather than a GFM feature?

Steps to reproduce

---
title: "Ants"
author: 
    name: "Anty McAnt"
bibliography: ./literature.bib
format:
    gfm:
        html-math-method: plain
        number-sections: false
        toc: false
---

ants are great, see @ant2021.

Then render to GFM.

Expected behavior

ants are great, see [[1]](#1)

## Table of references

<a id="1">[1]</a>  A. Ant, "A Book About Ants", 2021

Actual behavior

No response

Your environment

IDE: VSCode 1.88.1 OS: Win10

Quarto check output

Quarto 1.5.31
[>] Checking versions of quarto binary dependencies...
      Pandoc version 3.1.13: OK
      Dart Sass version 1.70.0: OK
      Deno version 1.41.0: OK
      Typst version 0.11.0: OK
[>] Checking versions of quarto dependencies......OK
[>] Checking Quarto installation......OK
      Version: 1.5.31
      Path: C:\Users\mfranke\AppData\Local\Programs\Quarto\bin
      CodePage: 1252

[>] Checking tools....................OK
      TinyTeX: v2024.01
      Chromium: (not installed)

[>] Checking LaTeX....................OK
      Using: TinyTex
      Path: C:\Users\mfranke\AppData\Roaming\TinyTeX\bin\windows\
      Version: 2023

[>] Checking basic markdown render....OK

[>] Checking Python 3 installation....OK
      Version: 3.10.11
      Path: d:/User/mfranke/documents/projects/colib/.venv/Scripts/python.exe
      Jupyter: 5.7.1
      Kernels: python3, .venv, py_venv, venv

() Checking Jupyter engine render....Traceback (most recent call last):
  File "C:\Users\mfranke\AppData\Local\Programs\Quarto\share\jupyter\jupyter.py", line 21, in <module>
    from notebook import notebook_execute, RestartKernel
  File "C:\Users\mfranke\AppData\Local\Programs\Quarto\share\jupyter\notebook.py", line 15, in <module>
    from yaml import safe_load as parse_string
ModuleNotFoundError: No module named 'yaml'

There is an unactivated Python environment in .venv. Did you forget to activate it?

[>] Checking Jupyter engine render....OK

cscheid commented 6 months ago

Yeah, I agree that the output here isn't great.

Before we go any further, though, can you also share your literature.bib file? I'm using a different file and getting a slightly different output in the bibliography. Here's my full reproducible example:

ants.qmd

---
title: "Ants"
author: 
    name: "Anty McAnt"
bibliography: ./references.bib
format:
    gfm:
        html-math-method: plain
        number-sections: false
        toc: false
---

knitr is great, see @xie2015.

references.bib

@Book{xie2015,
  title = {Dynamic Documents with {R} and knitr},
  author = {Yihui Xie},
  publisher = {Chapman and Hall/CRC},
  address = {Boca Raton, Florida},
  year = {2015},
  edition = {2nd},
  note = {ISBN 978-1498716963},
  url = {https://yihui.name/knitr/},
}
...

output

# Ants
Anty McAnt

knitr is great, see Xie (2015).

<div id="refs" class="references csl-bib-body hanging-indent"
entry-spacing="0">

<div id="ref-xie2015" class="csl-entry">

Xie, Yihui. 2015. *Dynamic Documents with R and Knitr*. 2nd ed. Boca
Raton, Florida: Chapman; Hall/CRC. <https://yihui.name/knitr/>.

</div>

</div>

Note the differences in the way the classes are generated. I'm running the latest version on main, which is close to what you're running as well, so I'm a bit confused. It seems that your bibliography is rendered slightly differently.

Regarding the broader problems:

I'm not sure how much of this is something we can fix in Quarto versus something that needs improving in Pandoc. Specifically, Quarto does little citation processing of its own for GFM and lets Pandoc's citeproc take over both citations and bibliography generation.

The egregious problem here is that there's no link from A. Ant (2021). to the entry with id ref-ants. But the creation of that link is not something that Quarto currently does. Running the document directly through Pandoc yields the same results:

$ pandoc -f markdown -t commonmark ants.qmd --citeproc
knitr is great, see Xie (2015).

<div id="refs" class="references csl-bib-body hanging-indent"
entry-spacing="0">

<div id="ref-xie2015" class="csl-entry">

Xie, Yihui. 2015. *Dynamic Documents with R and Knitr*. 2nd ed. Boca
Raton, Florida: Chapman; Hall/CRC. <https://yihui.name/knitr/>.

</div>

</div>

This would appear to be something that would ideally be fixed upstream of us.

cscheid commented 6 months ago

I'll note, in passing, that we have crossref: ref-hyperlink as a crossref option, with default true, and it has no effect here.

cscheid commented 6 months ago

Another thought that won't work: we could try to add a Cite entry to our render_gfm_fixups filter, where we walk the citations and attempt to wrap them in links. The problem here is that Pandoc citations can be complex. For example, consider:

This could work

see @xie2015.

This would not work

Blah Blah [@wickham2015; @knuth1984].

So I really think we need a PR on https://github.com/jgm/pandoc for this.

Counterpoint to ☝️

We could add a render_gfm_fixups filter that would edit the rich text entry in a Cite itself, rather than wrapping the entire Cite element. This way, we could detect individual citation entries inside the content. That might be a bit brittle, but could work.
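To make the "individual citation entries" idea concrete, here is a minimal Python sketch of the string-level transformation such a fixup would need to perform. It is not Quarto or Pandoc code: the function name `link_citation_group`, the `labels` mapping, and the rendered labels themselves are assumptions for illustration, and the `#ref-<key>` targets simply mirror the `ref-xie2015` style ids seen in the citeproc output above.

```python
import re


def link_citation_group(rendered: str, labels: dict[str, str]) -> str:
    """Wrap each rendered citation label in a Markdown link to its entry.

    `labels` maps a rendered label (hypothetical, e.g. "Knuth 1984") to its
    citation key. All labels are replaced in a single regex pass so that
    text inside an already inserted link is never matched again.
    """
    if not labels:
        return rendered
    # Try longer labels first so a short label cannot shadow a longer one.
    alternatives = sorted(labels, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(label) for label in alternatives))
    return pattern.sub(
        lambda m: f"[{m.group(0)}](#ref-{labels[m.group(0)]})", rendered
    )


if __name__ == "__main__":
    text = "Blah Blah (Wickham 2015; Knuth 1984)."
    keys = {"Wickham 2015": "wickham2015", "Knuth 1984": "knuth1984"}
    print(link_citation_group(text, keys))
```

The brittleness mentioned above shows up in the `labels` mapping: the rendered form of each citation depends on the CSL style, so a real filter would have to recover those labels from the processed citation content rather than assume them.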

cscheid commented 6 months ago

The render_gfm_fixups path isn't going to look good. I tried this:

    Cite = function(citeEl)
      if not refHyperlink() then
        return
      end

      -- fixup lack of hyperlinks in Pandoc's citeproc for GitHub markdown
      -- https://github.com/quarto-dev/quarto-cli/issues/9601

      -- we first attempted to change this by walking the citeEl.content
      -- but it seems that Pandoc just doesn't use this when producing
      -- the final markdown. So we'll now try to change the citeEl.citations
      -- prefixes and suffixes

      -- first, find all labels that need fixing up
      local labels_to_fixup = {}
      for i, cite in ipairs (citeEl.citations) do
        local label = cite.id
        label = pandoc.text.lower(label:sub(1, 1)) .. label:sub(2)
        local entry = crossref.index.entries[label]
        if entry ~= nil then
          -- if entry was found in our own crossref index, it's not going
          -- to be processed by Pandoc's citeproc. TODO what should we do?
        else
          if cite.prefix == nil then
            cite.prefix = pandoc.Inlines({
              pandoc.RawInline("markdown", " <a href=\"#ref-" .. label .. "\">")
            })
          else
            cite.prefix:insert(pandoc.RawInline("markdown", " <a href=\"#ref-" .. label .. "\">"))
          end
          if cite.suffix == nil then
            cite.suffix = pandoc.Inlines({
              pandoc.RawInline("markdown", "</a>")
            })
          else
            cite.suffix:insert(1, pandoc.RawInline("markdown", "</a>"))
          end
        end
      end
      return citeEl
    end

The problem is that the presence of these artificial RawInline elements in the citation prefix and suffix causes Pandoc to render the citation differently (using parentheses, etc.). In addition, I only now realize that Pandoc is emitting divs without anchors, so the hyperlinks would not work either.

This really does need to be fixed at https://github.com/jgm/pandoc.

martinscripts commented 6 months ago

@cscheid thanks for your investigations.

I forgot a line in my MWE: I was using this ieee.csl.

The correct MWE is:

---
title: "Ants"
author: 
    name: "Anty McAnt"
bibliography: ./literature.bib
csl: ./ieee.csl
format:
    gfm:
        html-math-method: plain
        number-sections: false
        toc: false
---

ants are great, see @ant2021.

When using the CSL, it produces the span elements.

cscheid commented 6 months ago

Thanks, this confirms my suspicion. Because the behavior you're looking to change is dictated by the contents of the CSL file, there's nothing that we can do in Quarto about it (CSL rendering is controlled by Pandoc).

martinscripts commented 6 months ago

Despite the output with span elements, the other issue remains relevant, right? I mean that citations in the text could be rendered as [displayed_literature_id](#literature_id).

displayed_literature_id could be "[1]", as in my case with the CSL, or "McAunt 2021" or similar without the CSL.
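As a small illustration of the requested format, the Python sketch below builds both halves of the link: the in-text citation and the anchored reference entry. The function names `citation_link` and `reference_anchor` are hypothetical, and the output shape is taken from the expected behavior in the bug description, not from anything Quarto or Pandoc currently emits.

```python
def citation_link(displayed_literature_id: str, literature_id: str) -> str:
    """Build the in-text citation, e.g. [[1]](#1)."""
    return f"[{displayed_literature_id}](#{literature_id})"


def reference_anchor(
    displayed_literature_id: str, literature_id: str, entry_text: str
) -> str:
    """Build the reference-list line that the in-text citation points at."""
    return f'<a id="{literature_id}">{displayed_literature_id}</a>  {entry_text}'


if __name__ == "__main__":
    print("ants are great, see " + citation_link("[1]", "1"))
    print()
    print("## Table of references")
    print()
    print(reference_anchor("[1]", "1", 'A. Ant, "A Book About Ants", 2021'))
```

The same two helpers cover the author-year case by passing a label such as "McAunt 2021" together with whatever id the reference entry carries.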