Producing native Abstract Syntax Tree from and within an Rmd YAML

jooyoungseo commented 4 years ago

Currently, there does not seem to be a convenient way to produce Pandoc's native Abstract Syntax Tree (AST) from within an Rmd document.

For a markdown file, we use the following command in the terminal:

pandoc -f markdown -s -t native -o output.native file.md [--lua-filter=lua_file.lua] ...

Of course, we could use the same command for any Rmd file; however, it would be much more convenient if we had a corresponding YAML option like keep_tex and keep_md.

What if we added a keep_native boolean option to the Rmd YAML header that generates [file_name].native when set to true?

atusy commented 4 years ago

I came up with two workflows, and both call pandoc twice. I guess the latter is more efficient, and I think it is more consistent with keep_native. What do you think?

Rmd --> md --> html
           \-> native (optional)

Rmd --> md --> native (optional) --> html
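A minimal sketch of both workflows with existing rmarkdown functions, assuming a small hypothetical test.Rmd written on the fly and made-up output names (test-1.*, test-2.*). Note that the md --> html step here is a plain Pandoc conversion, so it skips the template and options that html_document would normally apply.

```r
library(rmarkdown)

# Hypothetical input document, written here so the sketch is self-contained
writeLines(c("---", "title: Test", "---", "", "Some *text*."), "test.Rmd")

# Shared first step: knit only and skip Pandoc.
# With run_pandoc = FALSE, render() returns the path of the knitted .md file.
md <- render("test.Rmd", run_pandoc = FALSE, quiet = TRUE)

# Workflow 1: Rmd --> md --> html, with md --> native as a side branch
pandoc_convert(md, to = "html", output = "test-1.html")
pandoc_convert(md, to = "native", output = "test-1.native")

# Workflow 2: Rmd --> md --> native --> html
# --standalone keeps the document metadata (e.g. the title) in the native file
pandoc_convert(md, to = "native", output = "test-2.native",
               options = "--standalone")
pandoc_convert("test-2.native", from = "native", to = "html",
               output = "test-2.html")
```

In workflow 2 the native file is not a side product: it is the actual input of the final conversion, which is why it fits keep_native more naturally.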

BTW, I first thought md_document(variant = 'native') would be enough. However, I changed my mind because:

- some functions output results conditionally based on the output format, and
- it is redundant to specify the same options twice, once for the target output format and once for md_document.

atusy commented 4 years ago

The former sounds easier to implement. The loss in efficiency is probably negligible because generating native output is a rare use case.

jooyoungseo commented 4 years ago

Thank you very much for your comments, @atusy!

I agree with you. Since the native option would be used only rarely, mostly by developers who need to analyze the AST, it is fine for it to come at some cost in efficiency.

atusy commented 4 years ago

You're welcome. I drafted PR #1727. Currently, keep_native works only with html_document and pdf_document.

cderv commented 2 years ago

This is quite an old thread, but I am going through the whole backlog now that we are getting back to work on rmarkdown.

I don't quite understand the need to convert to native in addition to another format. A keep_native option seems quite advanced.

What about using rmarkdown::pandoc_convert() when you need to produce it?

rmarkdown::pandoc_convert("test.Rmd", to = "native")

Wouldn't that be enough? (Use the output argument if a file is needed.)

From my understanding, Lua filters are applied after Pandoc has built the native representation, so there is no need to pass a filter as a command-line flag, and most of the other flags are often not useful either. Passing arguments to the command line is still possible, though. I may be mistaken, or misled by my own usage (I use the Pandoc command line this way).
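If one does want to see the AST after a Lua filter has run, flags can be forwarded through the options argument of pandoc_convert(). A minimal sketch, assuming a hypothetical filter upper.lua (it upper-cases every Str element) and a throwaway test.md, both written on the fly. Pandoc parses the input into its AST first and then runs the filter on it, so the native dump reflects the filtered tree.

```r
library(rmarkdown)

# Hypothetical input file, written here so the sketch is self-contained
writeLines(c("# Title", "", "Some *text*."), "test.md")

# Hypothetical example filter: replace every Str element by its upper-cased text
writeLines(c(
  "function Str(el)",
  "  return pandoc.Str(el.text:upper())",
  "end"
), "upper.lua")

# Extra command-line flags go through the `options` argument;
# from = "markdown" is set explicitly instead of relying on the file extension
pandoc_convert("test.md", from = "markdown", to = "native",
               options = c("--lua-filter", "upper.lua"),
               output = "test-filtered.native")
```

Dropping the options argument gives the unfiltered AST for comparison.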

We could definitely improve the pandoc_convert() function if needed, or provide new helpers around Pandoc (which we'll soon have).

I would look more into the use cases for producing the AST before deciding what exactly to do in rmarkdown itself.

Rekyt commented 1 year ago

As an R Markdown user, my interest in a keep_native option (or similar) would be to better understand what happens under the hood of Pandoc, especially when developing Lua filters for R Markdown documents. It would also be nice if this native output already had all the specified Lua filters applied.

cderv commented 1 year ago

I agree, this would be really great to have!

We can look into it further for R Markdown, but FWIW such work was done in the newer Quarto project (https://quarto.org/), which relies more heavily on Lua filters and, I think, offers a native format that works as you described.

The R Markdown team is also working on Quarto, so we are building such new features in Quarto, where they are also easier to implement. I don't know yet if or when we could add this to rmarkdown directly, but I'm happy to review a PR if anyone wants to contribute.

Anyhow, I'll keep in mind the idea of making this easier in rmarkdown. The pandoc R 📦 was also developed to make working with Pandoc directly easier (https://cderv.github.io/pandoc/).

Maybe we could try a native_document() format to use as an output format for rmarkdown::render(), but a keep_native option may be easier. The catch is that native is not an intermediate output: it is only Pandoc's internal representation, which differs from keep_tex and keep_md, where intermediate files are actually written to disk.
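A rough sketch of what a native_document() format might look like, built from rmarkdown's output_format(), knitr_options() and pandoc_options(). native_document() and its lua_filters argument are hypothetical, not part of rmarkdown, and the sketch assumes a recent rmarkdown whose pandoc_options() accepts lua_filters; test.Rmd is a throwaway file written on the fly.

```r
library(rmarkdown)

# Hypothetical output format (not part of rmarkdown): knit as usual, then have
# Pandoc write its native AST to <name>.native instead of HTML or PDF.
native_document <- function(lua_filters = NULL) {
  output_format(
    knitr = knitr_options(),
    pandoc = pandoc_options(
      to = "native",
      ext = ".native",
      # Optional Lua filters, so the dump reflects the AST after filtering
      lua_filters = lua_filters
    )
  )
}

# Hypothetical input document, written here so the sketch is self-contained
writeLines(c("---", "title: Test", "---", "", "Some *text*."), "test.Rmd")

# render() returns the path of the output file (here test.native)
out <- render("test.Rmd", output_format = native_document(), quiet = TRUE)
```

Filters would be passed as, e.g., native_document(lua_filters = "my-filter.lua"), where the path is a placeholder. The trade-off compared with keep_native is that this format replaces the final HTML/PDF output instead of adding a .native file next to it.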

Small note: this works:

library(rmarkdown)
pandoc_convert(render("test.Rmd", run_pandoc = FALSE), to = "native")

which has the benefit of really working on the knitted file, and not only on the input file as in my previous post.

rstudio/rmarkdown issue #1726