ropensci / tinkr

Convert (R)Markdown files to XML, edit them, write them back as (R)Markdown
https://docs.ropensci.org/tinkr
GNU General Public License v3.0

Ideas around pandoc #11

Open · maelle opened this issue 6 years ago

maelle commented 6 years ago

Ideas from @baptiste on Twitter

"just wondering though – have you considered embedding this step as part of a pandoc filter toolchain, as an alternative to the pandocfilters package? Would allow processing the AST in R with the full power of xml2 etc., but the xml_md return step would not be necessary.

And in fact, I believe that with such a toolchain, an alternative to knitr could process chunks once they've been parsed into XML and update the AST with the results. (I suggested this a while back, but as Yihui said, this wasn't possible with pandoc back when knitr started.)

... One advantage would be more robust handling of inline code, which knitr currently extracts with regexes. Having the full structured AST before running code chunks also allows greater flexibility for pre- and post-processing with custom markup, etc.

This knitr alternative may also help with the elusive 'un-knit' function for merging changes made to the output: since chunks and inline code are tagged as such in the input AST, they can be filtered out when diffing the output AST against the commented version of it that contains the tracked changes."

maelle commented 6 years ago

R package pandocfilters: https://cran.r-project.org/web/packages/pandocfilters/index.html 👀

maelle commented 6 years ago

@baptiste I'm not sure I understand how one would go from XML back to Markdown. Via pandoc?

Are you interested in helping write a minimal working example?

noamross commented 6 years ago

Pandoc represents its AST in internal structures, which can be manipulated via Haskell or Lua. It makes the tree available to other programs as JSON, so to do this you'd want to either convert the JSON to an R list (as the pandocfilters R package does), convert it to XML, or work with it via jq or some other JSON processor.

Looks like there's a Haskell example here: https://github.com/cdupont/R-pandoc
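To make the JSON-to-XML route concrete, here is a minimal sketch, written in Python purely for illustration. It assumes the JSON layout of recent pandoc versions, where each element is an object with a "t" (type) key and, for most types, a "c" (contents) key. The sample AST is hand-written and abbreviated rather than real pandoc output, and the tag scheme (element types as tags, positional contents wrapped in hypothetical <item> tags) is just one possible mapping. Python's ElementTree only supports a limited subset of XPath, but it is enough to show the idea.

```python
import json
import xml.etree.ElementTree as ET

# Hand-written, abbreviated pandoc JSON AST (not real pandoc output).
# Assumed layout: each element is {"t": type, "c": contents};
# elements such as Space carry no "c" key.
PANDOC_JSON = """
{"pandoc-api-version": [1, 23, 1], "meta": {},
 "blocks": [
   {"t": "Para", "c": [
     {"t": "Str", "c": "Mean:"}, {"t": "Space"},
     {"t": "Code", "c": [["", [], []], "r mean(x)"]}]},
   {"t": "CodeBlock", "c": [["", ["r"], []], "x <- rnorm(10)"]}
 ]}
"""


def to_xml(node, parent):
    """Append the decoded JSON value `node` under the XML element `parent`."""
    if isinstance(node, dict) and "t" in node:
        # A pandoc element: its type becomes the tag name.
        elem = ET.SubElement(parent, node["t"])
        if "c" in node:
            to_xml(node["c"], elem)
    elif isinstance(node, list):
        for child in node:
            if isinstance(child, dict) and "t" in child:
                # Elements attach directly to the parent.
                to_xml(child, parent)
            else:
                # Positional contents (attributes, text) get an <item> wrapper.
                to_xml(child, ET.SubElement(parent, "item"))
    elif isinstance(node, dict):
        # Plain objects such as the top-level document: one child per key.
        for key, value in node.items():
            to_xml(value, ET.SubElement(parent, key))
    else:
        # Strings and numbers become element text.
        parent.text = str(node)


root = ET.Element("pandoc")
to_xml(json.loads(PANDOC_JSON), root)

# In this mapping, the second <item> of each Code node holds its text.
inline_code = [item.text for item in root.findall(".//Code/item[2]")]
print(inline_code)  # ['r mean(x)']
```

In a real pipeline the JSON would come from `pandoc -t json` rather than a string literal, and an R version would do the same walk with xml2 on the R side.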

maelle commented 6 years ago

So you wouldn't convert the (R)md to XML first?

noamross commented 6 years ago

It comes down to a couple of things: first, whether you want pandoc extensions in your Markdown, and second, whether Rmd markup, which includes some syntax that isn't strictly Markdown, survives the conversion. It seems that Rmd chunk headers and inline code survive when using pandoc, with the chunk header and the initial r simply prepended to the code; I'm not sure about cmark. After that, it's a matter of which format is most amenable to modification: JSON, an R list, or XML. XML via XPath is really powerful, but you might prefer the others.

baptiste commented 6 years ago

@maelle I'm keen, but I broke my right arm last weekend, so typing is a bit of a struggle.

baptiste commented 6 years ago

I think a first step would be to make a minimally interesting dummy Rmd example and run it through

to get concrete ASTs to inspect as an R list, JSON, and XML, so we can fully compare their features.

The next step would be to mimic the knitting step by isolating from the input AST the code that needs to be run (there are lots of details to consider here, but knitr has them well figured out).
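A rough sketch of the isolation step, in Python over a hand-written pandoc-style AST. It assumes R chunks arrive as CodeBlock nodes carrying an r class and inline R code as Code nodes whose text starts with "r "; as noted earlier in the thread, how Rmd chunk headers actually come through depends on the reader, so both tests are assumptions, and chunk options (labels, echo, etc.) are ignored entirely.

```python
def walk(node):
    """Yield every pandoc element (a dict with a "t" key) in document order."""
    if isinstance(node, dict):
        if "t" in node:
            yield node
        for value in node.values():
            yield from walk(value)
    elif isinstance(node, list):
        for child in node:
            yield from walk(child)


def find_r_code(doc):
    """Split out R chunks and inline R code, using assumed markers."""
    chunks, inline = [], []
    for el in walk(doc):
        if el["t"] == "CodeBlock":
            (_ident, classes, _attrs), text = el["c"]
            if "r" in classes:  # assumption: chunks carry an "r" class
                chunks.append(text)
        elif el["t"] == "Code":
            _attr, text = el["c"]
            if text.startswith("r "):  # assumption: inline code keeps its "r "
                inline.append(text[2:])
    return chunks, inline


# Hand-written sample document (not real pandoc output).
DOC = {
    "pandoc-api-version": [1, 23, 1],
    "meta": {},
    "blocks": [
        {"t": "CodeBlock", "c": [["setup", ["r"], []], "x <- c(1, 2, 3)"]},
        {"t": "Para", "c": [
            {"t": "Str", "c": "The"}, {"t": "Space"},
            {"t": "Str", "c": "mean"}, {"t": "Space"},
            {"t": "Str", "c": "is"}, {"t": "Space"},
            {"t": "Code", "c": [["", [], []], "r mean(x)"]},
        ]},
        {"t": "CodeBlock", "c": [["", ["python"], []], "print(1)"]},
    ],
}

chunks, inline = find_r_code(DOC)
print(chunks, inline)  # ['x <- c(1, 2, 3)'] ['mean(x)']
```

Because the AST already separates code from prose, no regex over the raw text is needed to find inline code; the trade-off is that everything hinges on how faithfully the parser preserves Rmd syntax.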

The last step is merging the produced output back into the AST. From there, I think pandoc is the most natural tool, as it supports many output formats.
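A minimal sketch of the merge step, in Python, assuming the inline results have already been computed elsewhere: the `results` mapping from R source to printed value is a hypothetical stand-in for whatever actually runs the code (knitr or otherwise). The function rebuilds the AST and swaps each matching inline Code node for a Str node, leaving the input untouched.

```python
def splice_results(node, results):
    """Return a copy of `node` with inline R code replaced by its result.

    `results` maps R source (without the leading "r ") to the printed value;
    it is a hypothetical stand-in for whatever actually evaluates the code.
    """
    if isinstance(node, list):
        return [splice_results(child, results) for child in node]
    if isinstance(node, dict):
        if node.get("t") == "Code":
            _attr, text = node["c"]
            if text.startswith("r ") and text[2:] in results:
                # Swap the code for a plain text node holding its result.
                return {"t": "Str", "c": results[text[2:]]}
        return {key: splice_results(value, results) for key, value in node.items()}
    # Strings, numbers, etc. are returned unchanged.
    return node


# Hand-written paragraph (not real pandoc output).
para = {"t": "Para", "c": [
    {"t": "Str", "c": "Mean:"}, {"t": "Space"},
    {"t": "Code", "c": [["", [], []], "r mean(x)"]},
]}
merged = splice_results(para, {"mean(x)": "2"})
print(merged["c"][2])  # {'t': 'Str', 'c': '2'}
```

Once wrapped in a full document (with its pandoc-api-version and meta keys), the merged AST could in principle be handed back to pandoc as JSON with `pandoc -f json -t <format>`, which is where pandoc's many output formats come in.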

Merging tracked changes made to an output manuscript would be a variation on this: when merging changes into the AST, one would also look at a diff of the text nodes.
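A sketch of the text-node diff, using Python's difflib on the sequence of Str nodes from two versions of a hand-written AST. Following the idea in the first comment, Code and CodeBlock nodes are skipped so that only prose is compared; treating each Str node as one token and ignoring Space and formatting nodes is a simplifying assumption, and the `sentence` helper is hypothetical sample data.

```python
import difflib


def text_tokens(node, skip=("Code", "CodeBlock")):
    """Collect Str contents in document order, skipping code nodes."""
    if isinstance(node, dict):
        kind = node.get("t")
        if kind in skip:
            return []
        if kind == "Str":
            return [node["c"]]
        return [tok for value in node.values() for tok in text_tokens(value, skip)]
    if isinstance(node, list):
        return [tok for child in node for tok in text_tokens(child, skip)]
    return []


def text_changes(original, edited):
    """List the non-equal opcodes between the two documents' text tokens."""
    a, b = text_tokens(original), text_tokens(edited)
    matcher = difflib.SequenceMatcher(a=a, b=b)
    return [(op, a[i1:i2], b[j1:j2])
            for op, i1, i2, j1, j2 in matcher.get_opcodes() if op != "equal"]


def sentence(word):
    """Hypothetical sample Para: 'The <word> is <inline R code>'."""
    return {"t": "Para", "c": [
        {"t": "Str", "c": "The"}, {"t": "Space"},
        {"t": "Str", "c": word}, {"t": "Space"},
        {"t": "Str", "c": "is"}, {"t": "Space"},
        {"t": "Code", "c": [["", [], []], "r mean(x)"]},
    ]}


changes = text_changes(sentence("mean"), sentence("average"))
print(changes)  # [('replace', ['mean'], ['average'])]
```

Each opcode says which run of text nodes changed, which is the information needed to carry an edit back into the source AST while leaving the tagged code alone.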

maelle commented 6 years ago

@baptiste I am very sorry that you broke your right arm 😱

I haven't had a chance to look at this yet but hope to do it soon.