rstudio / rmarkdown

Dynamic Documents for R
https://rmarkdown.rstudio.com
GNU General Public License v3.0

`citation_package` does not work when `keep_tex` is false when rendering plain Markdown files #2113

Closed andrewheiss closed 3 years ago

andrewheiss commented 3 years ago

I'm not sure if this is expected behavior, but I can't find any documentation about it. When using rmarkdown::render() on a vanilla .md file, if a bibliography is specified in the YAML, the citation_package option does not get used and the generated pandoc command includes --citeproc instead of --biblatex or --natbib. However, if keep_tex is set to true, citation_package does get used and the specified citation package shows up in the pandoc command. R Markdown files are not affected: there, citation_package is used regardless of keep_tex.

Thus the only way to use citation_package when working with vanilla Markdown is to also use keep_tex: true:

| File type | keep_tex | Option in pandoc command |
|-----------|----------|--------------------------|
| .md       | false    | --citeproc               |
| .md       | true     | --biblatex               |
| .Rmd      | false    | --biblatex               |
| .Rmd      | true     | --biblatex               |

Here's a reproducible example:

references.bib (a placeholder bibtex file)

@book{smith2021,
    author = {Jill Smith},
    publisher = {ABC Publishing},
    title = {Some neat title},
    year = {2021}}

testing.md

---
title: "Example"
output: 
  pdf_document: 
    citation_package: biblatex
    keep_tex: false
bibliography: references.bib
---

Testing [@smith2021].

Run rmarkdown::render("testing.md") (or press ⌘⇧K/the knit button) and the generated pandoc command will contain --citeproc and not contain --biblatex:

rmarkdown::render('testing.md')
#> /Applications/RStudio.app/Contents/MacOS/pandoc/pandoc +RTS -K512m -RTS testing.md --to latex 
#> --from markdown+autolink_bare_uris+tex_math_single_backslash --output testing.pdf --lua-filter 
#> /Library/Frameworks/R.framework/Versions/4.0/Resources/library/rmarkdown/rmarkdown/lua/pagebreak.lua 
#> --lua-filter /Library/Frameworks/R.framework/Versions/4.0/Resources/library/rmarkdown/rmarkdown/lua/latex-div.lua
#> --self-contained --highlight-style tango --pdf-engine pdflatex 
#> --variable graphics --variable 'geometry:margin=1in' --citeproc 

But if I switch keep_tex to true and re-render, the generated pandoc will contain --biblatex as expected:

testing.md

---
title: "Example"
output: 
  pdf_document: 
    citation_package: biblatex
    keep_tex: true
bibliography: references.bib
---

Testing [@smith2021].

rmarkdown::render("testing.md")
#> /Applications/RStudio.app/Contents/MacOS/pandoc/pandoc +RTS -K512m -RTS testing.md --to latex 
#> --from markdown+autolink_bare_uris+tex_math_single_backslash --output testing.tex --lua-filter 
#> /Library/Frameworks/R.framework/Versions/4.0/Resources/library/rmarkdown/rmarkdown/lua/pagebreak.lua 
#> --lua-filter /Library/Frameworks/R.framework/Versions/4.0/Resources/library/rmarkdown/rmarkdown/lua/latex-div.lua 
#> --self-contained --highlight-style tango --pdf-engine pdflatex --biblatex 
#> --variable graphics --variable 'geometry:margin=1in' 

If I rename testing.md to testing.Rmd and run rmarkdown::render('testing.Rmd'), --biblatex appears in the pandoc command whether keep_tex is true or false, as expected. --biblatex only seems to disappear when rendering a .md file to PDF with keep_tex set to false.
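To compare the four cases quickly, the citation flag can be pulled out of each generated command. The following is a minimal sketch in base R; `citation_flags()` is a hypothetical helper (not part of rmarkdown), and the two command strings are abbreviated stand-ins for the full commands shown above.

```r
# Hypothetical helper (not part of rmarkdown): report which citation-related
# flags appear in a Pandoc command line.
citation_flags <- function(cmd) {
  flags <- c("--citeproc", "--biblatex", "--natbib")
  present <- vapply(flags, function(f) grepl(f, cmd, fixed = TRUE), logical(1))
  flags[present]
}

# Abbreviated versions of the commands from the two .md renders above.
cmd_keep_tex_false <- "pandoc testing.md --to latex --output testing.pdf --citeproc"
cmd_keep_tex_true  <- "pandoc testing.md --to latex --output testing.tex --biblatex"

print(citation_flags(cmd_keep_tex_false))
print(citation_flags(cmd_keep_tex_true))
```

Pasting the real command printed by rmarkdown::render() into `citation_flags()` should work the same way, since the helper only does fixed-string matching.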


Here's my session info:

xfun::session_info('rmarkdown')
#> R version 4.0.3 (2020-10-10)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur 10.16
#> 
#> Locale: en_US.UTF-8 / en_US.UTF-8 / en_US.UTF-8 / C / en_US.UTF-8 / en_US.UTF-8
#> 
#> Package version:
#>   base64enc_0.1.3   digest_0.6.27     evaluate_0.14     glue_1.4.2       
#>   graphics_4.0.3    grDevices_4.0.3   highr_0.8         htmltools_0.5.1.1
#>   jsonlite_1.7.2    knitr_1.31        magrittr_2.0.1    markdown_1.1     
#>   methods_4.0.3     mime_0.10         rlang_0.4.10      rmarkdown_2.7.10 
#>   stats_4.0.3       stringi_1.5.3     stringr_1.4.0     tinytex_0.31     
#>   tools_4.0.3       utils_4.0.3       xfun_0.22         yaml_2.2.1       
#> 
#> Pandoc version: 2.11.4


andrewheiss commented 3 years ago

For fun I also checked if this is the case when using a different PDF/LaTeX output like bookdown::pdf_document2, and the same unexpected behavior happens.

This as a .md file produces a pandoc command with --citeproc, since keep_tex is false:

---
title: "Example"
output: 
  bookdown::pdf_document2: 
    citation_package: biblatex
    keep_tex: false
bibliography: references.bib
---

Testing [@smith2021].

while this results in a pandoc command with --biblatex, since keep_tex is true:

---
title: "Example"
output: 
  bookdown::pdf_document2: 
    citation_package: biblatex
    keep_tex: true
bibliography: references.bib
---

Testing [@smith2021].

cderv commented 3 years ago

Thanks for the report.

It does indeed seem to be a bug, or at least a case that is not supported yet. I believe we did not really expect .md files to be used with these special R Markdown headers (the output field in YAML is an R Markdown thing) instead of the usual .Rmd files.

This unhandled case happens here: https://github.com/rstudio/rmarkdown/blob/eb55b2e57b4540d593b6799651c76f41f4cd1a6f/R/render.R#L967-L969

That is why when you set keep_tex, you get the expected result.

I'll look closer.

Can I ask why you are using a .md file and not a .Rmd file? Is it because there are no code chunks in your document?

Thanks

cderv commented 3 years ago

OK, I confirm this is not supported yet; it is currently an unexpected case. Here is why:

The test for LaTeX output is currently made by knitr::is_latex_output(), as shown above. This looks at knitr::opts_knit$get("rmarkdown.pandoc.to").

The issue is that this value is set only when knitr is used, which is not the case for a .md file: https://github.com/rstudio/rmarkdown/blob/eb55b2e57b4540d593b6799651c76f41f4cd1a6f/R/render.R#L347-L348 That condition is false, so nothing knitr-related is run, including the code below: https://github.com/rstudio/rmarkdown/blob/eb55b2e57b4540d593b6799651c76f41f4cd1a6f/R/render.R#L614-L618
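A quick way to see this from the console is to read the option outside of any knitting step. This is only a sketch and assumes knitr is installed; in a fresh session, where nothing has been knitted, the option is expected to be unset.

```r
# Inspect the knitr option that the LaTeX check relies on.
# Outside of a knit, nothing has set "rmarkdown.pandoc.to".
if (requireNamespace("knitr", quietly = TRUE)) {
  pandoc_to <- knitr::opts_knit$get("rmarkdown.pandoc.to")
  print(pandoc_to)

  # With the option unset, knitr has no way to know the target is LaTeX.
  print(knitr::is_latex_output())
}
```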

Hence the issue for now.

So we would need another mechanism to test the output format at this stage. This is easy as we could retrieve the output format from output_formats$pandoc$to.
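As a rough sketch of that idea, the check could be made on the Pandoc target itself rather than on knitr's state. `is_latex_target()` and the `fmt` list below are hypothetical stand-ins for illustration, not rmarkdown's actual code, and the sketch assumes that "latex" and "beamer" are the LaTeX-based targets of interest.

```r
# Hypothetical helper: decide whether a Pandoc target format is LaTeX-based
# without consulting any knitr option.
is_latex_target <- function(to) {
  is.character(to) && length(to) == 1L && to %in% c("latex", "beamer")
}

# Stand-in for the output format object; only the fields used here are filled.
fmt <- list(pandoc = list(to = "latex", keep_tex = FALSE))

# The decision no longer depends on keep_tex or on knitr having run.
print(is_latex_target(fmt$pandoc$to))
```

Because the target format is known as soon as the output format is resolved, a check like this gives the same answer for .md and .Rmd inputs.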

I am still curious about the .md vs .Rmd choice when using rmarkdown. Thanks.

andrewheiss commented 3 years ago

The .md vs .Rmd choice was partially accidental. I've been using a Makefile to generate the pandoc command for regular Markdown files for years (see https://github.com/andrewheiss/portable-pandoc-magic/blob/master/Makefile#L212 for instance), but the overhead of installing make + a full LaTeX installation + the other supporting files has made it harder for regular .md files to be portable. I have coauthors and students who don't normally use make and are hesitant to try it for Markdown.

So I've instead had them use .Rmd even in instances where they don't use any R chunks at all, since people only need to install R/RStudio and install appropriate packages. Everything is easily compilable with the knit button and there's no need for Make or any fancier overhead. They basically use RStudio as a glorified Markdown editor.

Out of curiosity, I wanted to see what would happen if I knitted with a .md file instead of .Rmd if a file contains no R code. One student I was working with thought it was odd to use .Rmd without R code, and another student uses a macOS Markdown editor that doesn't natively open .Rmd files, so they rename the file to .md, edit it there, and then rename it back to .Rmd to knit in RStudio. It's a super roundabout process.

When I tried, I was surprised that pretty much everything worked! The options in output: pdf_document and output: html_document etc YAML sections get parsed and used.

So, for instance, this works great with a plain R-free .md file and all the options are passed to the pandoc command:

---
title: Some title
output: 
  bookdown::pdf_document2: 
    template: custom-template.tex
    citation_package: biblatex
    latex_engine: xelatex
    toc: false
    keep_tex: true  # Must be true bc of strange rmarkdown behavior
    pandoc_args: ["--top-level-division=section",
                  "--shift-heading-level-by=0",
                  "-V", "bibstyle-chicago-authordate",
                  "-V", "chapterstyle=hikma-article"]
    md_extensions: "+raw_tex+smart-autolink_bare_uris"
bibliography: bibliography.bib
mainfont: Lora
sansfont: IBM Plex Sans
fontsize: 12pt
---

This is neat because on its own, pandoc does parse YAML metadata, but only for passing arguments to the template, not for setting other options like template or latex_engine or passing additional command line flags like --top-level-division or --shift-heading-level-by. Those have to be built into the full pandoc command (which is what both rmarkdown::render() and my Makefile do).

So essentially, because rmarkdown::render() (unexpectedly!) works with plain .md files, users can include build-specific YAML options in the metadata and create and build .md documents that have no R chunks in them. It's an unexpected workaround to pandoc's lack of native support for YAML-based command line arguments.
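To make the idea concrete, here is a deliberately simplified sketch of turning a list of output options into Pandoc command line flags. `build_pandoc_args()` is a hypothetical function written for illustration; it is not how rmarkdown implements this, and it covers only a handful of the options from the YAML above.

```r
# Hypothetical, simplified translation of output options into Pandoc flags.
# This is NOT rmarkdown's actual implementation.
build_pandoc_args <- function(opts) {
  args <- character()
  if (!is.null(opts$template)) {
    args <- c(args, paste0("--template=", opts$template))
  }
  if (!is.null(opts$latex_engine)) {
    args <- c(args, paste0("--pdf-engine=", opts$latex_engine))
  }
  if (isTRUE(opts$toc)) {
    args <- c(args, "--toc")
  }
  if (!is.null(opts$citation_package)) {
    # natbib and biblatex correspond to Pandoc flags of the same name.
    pkg <- match.arg(opts$citation_package, c("biblatex", "natbib"))
    args <- c(args, paste0("--", pkg))
  }
  # Anything listed under pandoc_args is passed through unchanged.
  c(args, opts$pandoc_args)
}

opts <- list(
  template = "custom-template.tex",
  citation_package = "biblatex",
  latex_engine = "xelatex",
  toc = FALSE,
  pandoc_args = c("--top-level-division=section", "--shift-heading-level-by=0")
)

args <- build_pandoc_args(opts)
print(args)
```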

cderv commented 3 years ago

Thanks a lot for the detailed answer. This is really interesting and it makes total sense. With RStudio 1.4 and its Markdown visual editor, there is everything needed to make this work for Markdown files.

I have fixed this part of the code, so it should now work with .md files as expected. There may be other holes somewhere, but it should mostly work. knitr is used to go from .Rmd to .md, but all the rest is handled by rmarkdown, so yes, it should work with .md files.

I completely understand that rmarkdown is used as a tool to organize and run Pandoc in a workflow.

> One student I was working with thought it was odd to use .Rmd without R code

About this: there is indeed no explicit R code, but implicitly, using

output: 
  bookdown::pdf_document2

is R code, because bookdown::pdf_document2 is interpreted as an R function. But I completely understand the feeling. There is no code chunk and this is just metadata.
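The point can be illustrated in base R: a string of the form pkg::fun can be resolved to an actual function object. This is only a sketch; `resolve_format()` is a hypothetical helper, rmarkdown's own lookup may work differently, and tools::file_ext stands in for bookdown::pdf_document2 so that the example runs without bookdown installed.

```r
# Hypothetical helper: resolve a "pkg::fun" string to the function it names.
resolve_format <- function(name) {
  parts <- strsplit(name, "::", fixed = TRUE)[[1]]
  stopifnot(length(parts) == 2L)
  # getExportedValue() looks up an exported object in a package namespace.
  getExportedValue(parts[1], parts[2])
}

# tools::file_ext is used as a stand-in for an output format function.
fn <- resolve_format("tools::file_ext")
print(is.function(fn))
print(fn("testing.Rmd"))
```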

So yes I believe R Markdown could and should be used this way too.

Please share any odd behavior you and your students may encounter in the future.

andrewheiss commented 3 years ago

Ah good catch with bookdown::pdf_document2 - I accidentally copied that from a document I'd been using to test the unexpected keep_tex weirdness :)

Thanks for the fix! This will make a bunch of my students happy
