rstudio / rmarkdown

Dynamic Documents for R
https://rmarkdown.rstudio.com
GNU General Public License v3.0

Converting Rmarkdown to Valid Pandoc Markdown with Fenced Code Attributes #2052

Open xxmissingnoxx opened 3 years ago

xxmissingnoxx commented 3 years ago


I have posted this issue on Stack Overflow here and have included the text below. Any help that you can provide would be appreciated.

Example Problem

Given an input file of the form:

# Testing

Here is the first block:

```{r}
library(data.table)
library(ggplot2)
library(rstanarm) 
library(data.table)
library(ggplot2)
library(rstanarm)
library(arm)
library(faraway)
```

The command:

```{r}
library(rmarkdown)

# You can apparently create custom markdown variants too
# "markdown" is pandoc markdown according to the documentation
render("test.rmd",
       md_document(variant="markdown",
                   md_extensions = c("+fenced_code_attributes"),
                   preserve_yaml = TRUE))
```

yields:

# Testing

Here is the first block:

``` {.r}
library(data.table)
library(ggplot2)
library(rstanarm) 
library(data.table)
library(ggplot2)
library(rstanarm)
library(arm)
library(faraway)
```

I'm wondering if it's possible to preserve the attributes such that you'd get:

Testing

Here is the first block:

library(data.table)
library(ggplot2)
library(rstanarm) 
library(data.table)
library(ggplot2)
library(rstanarm)
library(arm)
library(faraway) 


which I believe is valid pandoc markdown and might even be created by rmarkdown itself internally somewhere.

These resources on fenced_code_attributes seem useful:
* https://pandoc.org/MANUAL.html discusses fenced_code_attributes
* https://rmarkdown.rstudio.com/authoring_pandoc_markdown.html#Verbatim_(code)_blocks discusses fenced_code_attributes

# Why?

I'd like to retain these attributes and easily manipulate them myself using pandoc filters, but can't find a way to keep them. It doesn't seem to be explicitly covered in the rmarkdown documentation either: "These options are mostly useful to HTML output. There are cases in which the attributes may be useful to other output formats, but these cases are relatively rare. The attributes need to be supported by either Pandoc (such as the .numberLines attribute, which works for both HTML and LaTeX output), or a third-party package (usually via a Lua filter, as introduced in Section 4.20)."

# Why not just use pandoc? 

I don't think that rmarkdown is fully compliant with any Pandoc-supported flavor of Markdown. For example, the block labeled "test" does not appear to comply with any notion of `fenced_code_attributes` I've seen, because the word test is not written as `.test`.

yihui commented 3 years ago

> I'd like to retain these attributes and easily manipulate them myself using pandoc filters

You can manipulate the .md output using Pandoc filters, but you can't manipulate the .Rmd source document. R code chunks in .Rmd have their own special syntax that Pandoc doesn't necessarily understand, and these chunks are compiled by knitr to produce the Markdown output that is passed to Pandoc (https://bookdown.org/yihui/rmarkdown-cookbook/rmarkdown-process.html).

I don't quite understand why you would want to manipulate the .Rmd source documents. We may be able to help you better if you tell us what exactly you want to do.

xxmissingnoxx commented 3 years ago

Thank you for your response!

I'd like to be able to convert to and from Emacs' Org mode format more seamlessly. More specifically, I'd like to access the additional attributes used in Rmd files and convert them to their equivalents in Org mode code blocks (e.g., whether to evaluate the block and the block's name, among others). If those attributes are lost when moving from Rmd to md as seen above, then I can't manipulate them via Pandoc filters when moving from md to Org mode.
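To make the mapping described above concrete, here is a minimal sketch in R. It turns one chunk's label and `eval` option into an Org mode source block. `chunk_to_org()` is a hypothetical helper, not part of knitr or rmarkdown. Only two options are mapped: the label becomes a `#+name:` line, and `eval = FALSE` becomes Org's `:eval no` header argument. Other chunk options would each need their own Org equivalent.

```r
# Hypothetical helper (not part of knitr, rmarkdown or Pandoc): write one
# R chunk as an Org mode source block. Only two options are mapped here:
# the chunk label becomes a "#+name:" line, and eval = FALSE becomes the
# ":eval no" header argument.
chunk_to_org = function(code, label = NULL, eval = TRUE) {
  header = "#+begin_src R"
  if (isFALSE(eval)) header = paste(header, ":eval no")
  c(
    if (!is.null(label)) paste0("#+name: ", label),  # omitted when unlabeled
    header,
    code,
    "#+end_src"
  )
}

writeLines(chunk_to_org("library(data.table)", label = "test", eval = FALSE))
```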

yihui commented 3 years ago

Okay, I see. As I said, Pandoc doesn't really understand Rmd; only knitr does, so you have to use knitr's Rmd parser. This function may get you started:

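```r
# Parse an .Rmd file with knitr's own (internal) parser
parse_rmd = function(file) {
  x = xfun::read_utf8(file)
  # split_file() stores chunk code in knit_code; reset it when done
  on.exit(knitr::knit_code$restore(), add = TRUE)
  res = knitr:::split_file(x, patterns = knitr::all_patterns$md)
  lapply(res, function(el) {
    # Code chunks carry their options (including the label) in el$params;
    # attach each chunk's source code to its element
    if (!is.null(label <- el$params$label)) el$src = knitr::knit_code$get(label)
    el
  })
}
```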

You decide what to do with the `params` elements in the list. After the manipulation, you write the list out to an .md file using Pandoc's fenced div syntax.
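As a rough illustration of that last step, here is a sketch that writes one chunk as a Pandoc fenced div. The label becomes the div's identifier, the remaining options become key-value attributes, and the code goes in an inner code block. `chunk_to_div()` is a hypothetical helper. It assumes `params` is a named list such as `list(label = "test", eval = FALSE)` and `src` is the chunk's code lines. The exact contents of knitr's `params` were not checked here, so any entry you don't want in the output (e.g. an engine field, if present) should be dropped first. Values are turned into text with `format()`, and embedded double quotes are not escaped.

```r
# Hypothetical helper (not part of knitr): wrap one parsed chunk in a
# Pandoc fenced div whose attributes carry the chunk options.
# Assumes `params` is a named list like list(label = "test", eval = FALSE).
chunk_to_div = function(params, src, engine = "r") {
  fence = strrep("`", 3)
  label = params$label
  params$label = NULL
  # key="value" pairs; format() turns FALSE into "FALSE"
  kv = vapply(names(params), function(k) {
    sprintf('%s="%s"', k, paste(format(params[[k]]), collapse = ","))
  }, character(1), USE.NAMES = FALSE)
  attrs = paste(c(if (!is.null(label)) paste0("#", label), kv), collapse = " ")
  c(paste0("::: {", attrs, "}"),        # opening fenced div with attributes
    paste0(fence, " {.", engine, "}"),  # inner code block tagged with the engine
    src,
    fence,
    ":::")
}

writeLines(chunk_to_div(list(label = "test", eval = FALSE), "library(data.table)"))
```

With the `parse_rmd()` output above, this would be applied to each element that has `params`, e.g. `chunk_to_div(el$params, el$src)`. Text elements would be written out unchanged.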

cderv commented 3 years ago

Regarding parsing Rmd files, there is also the parsermd package: https://rundel.github.io/parsermd/. It is another way to parse an Rmd file as a syntax tree and manipulate its elements, including chunk options. Maybe it can help on the R side to turn the Rmd document into an md document suitable for your conversion.

I don't know much about Org mode (https://orgmode.org/), but it is a known "to / from" format for Pandoc. To convert Rmd to Org mode directly, Pandoc would indeed need to support R Markdown syntax (knitr chunks + Pandoc's Markdown), which it does not, and I am not sure it ever will. However, since Pandoc knows how to convert from md to Org mode, it might be possible to have knitr write a Markdown syntax that Pandoc can then convert, as you suggested. This would require playing with knitr's hooks to modify the way outputs are written. It would also take some thinking and research to find the equivalent / correct syntax, if one is supported at all (converting an Org mode file to Markdown using Pandoc would give some hints).
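A minimal sketch of the hook idea, under loudly stated assumptions. `attr_source_hook` is a hypothetical name, and the `keep` set of options is arbitrary. It has not been checked against the hooks that rmarkdown itself sets for `md_document()`. The hook writes each echoed chunk as a fenced code block with three parts: the label as identifier, the engine as class, and the selected options as key-value attributes. Whether a later Pandoc step keeps those attributes is a separate question.

```r
# Hypothetical knitr source hook (a sketch, not part of knitr or rmarkdown):
# write each echoed chunk as a Pandoc fenced code block whose attributes
# carry the chunk label, the engine, and a chosen subset of chunk options.
attr_source_hook = function(x, options, keep = c("eval", "echo")) {
  fence = strrep("`", 3)
  opts = options[intersect(keep, names(options))]
  # key="value" pairs, e.g. eval="FALSE"
  kv = vapply(names(opts), function(k) {
    sprintf('%s="%s"', k, paste(format(opts[[k]]), collapse = ","))
  }, character(1), USE.NAMES = FALSE)
  attrs = paste(c(paste0("#", options$label),
                  paste0(".", tolower(options$engine)), kv),
                collapse = " ")
  paste0("\n\n", fence, " {", attrs, "}\n",
         paste(x, collapse = "\n"), "\n", fence, "\n\n")
}

# To try it (requires knitr), in a setup chunk of the .Rmd:
# knitr::knit_hooks$set(source = attr_source_hook)
```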

I say "if supported" because I don't really know how your example would look in Org mode. Passing it to Pandoc gives a result that loses all attributes except the first:

````
❯ pandoc -f markdown -t org
# Testing

Here is the first block:

``` {.r}
library(data.table)
library(ggplot2)
library(rstanarm)
library(data.table)
library(ggplot2)
library(rstanarm)
library(arm)
library(faraway)
```

^Z

#+begin_src R
library(data.table)
library(ggplot2)
library(rstanarm)
#+end_src

#+begin_src R
library(data.table)
#+end_src

#+begin_src R
library(ggplot2)
library(rstanarm)
library(arm)
library(faraway)
#+end_src
````



As you can see above, neither `.test` nor `eval = FALSE` is kept by Pandoc: the output is the same as if these attributes were not present in the `.md` file. Considering this, I am not sure that having a way to keep attributes in the Markdown output would help for further processing.

Anyway, these are just some shared thoughts, and an Rmd -> Org file conversion would take some work. I hope the Rmd parsing hints will help you with this task.