vatlab / sos

SoS workflow system for daily data analysis
http://vatlab.github.io/sos-docs
BSD 3-Clause "New" or "Revised" License

RMarkdown knitr-like features. #115

Closed BoPeng closed 8 years ago

BoPeng commented 8 years ago

Can we add special functions and steps with embedded markdown code that will be processed during the execution of the script? I am not sure how exactly we can do this but I suppose there is enough code out there that handles markdown ... So, eventually our script would be more or less like RMarkdown, which generates html/pdf files with scripts and results. As a quick note, perhaps we can

[10]
do something

[11]
report/markdown:
    markdown code

The idea is that if we output the steps, we can output steps in markdown (we already have this), but embed the markdown, and process them with the results we have.

BoPeng commented 8 years ago

It should be rather simple: we can define an action markdown to collect output from a step, something like

[10]
R:
    run some R script

Markdown:
    In this analysis, we generate a figure
      embed a ${_output}

The markdown (or report, or something else) action would append the parsed markdown code to a default file, unless an option filename="another.md" is used to append to another file. In the latter case, the same pipeline can write to multiple report files, which can be useful from time to time.
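
The append behavior described here can be sketched in a few lines of Python. This is only an illustration, not the SoS implementation: the function name `report`, the `workdir` parameter, and the default file name `report.md` are assumptions made for the example.

```python
import tempfile
from pathlib import Path

DEFAULT_REPORT = "report.md"  # hypothetical default name, not taken from SoS


def report(text, filename=None, workdir="."):
    """Append `text` to the default report, or to `filename` when given."""
    target = Path(workdir) / (filename or DEFAULT_REPORT)
    with target.open("a", encoding="utf-8") as handle:
        # Normalize the trailing newline so that fragments never run together.
        handle.write(text.rstrip("\n") + "\n")
    return target


# Demo in a throwaway directory: two calls share the default report,
# a third one writes to a separate file through the filename option.
with tempfile.TemporaryDirectory() as tmp:
    report("In this analysis, we generate a figure", workdir=tmp)
    report("embed a result.png", workdir=tmp)
    report("Notes kept separately", filename="another.md", workdir=tmp)
    default_text = (Path(tmp) / DEFAULT_REPORT).read_text(encoding="utf-8")
    other_text = (Path(tmp) / "another.md").read_text(encoding="utf-8")

print(default_text, end="")
```

Opening the file in append mode on every call keeps the action stateless, so any number of steps can contribute to the same report without coordinating with each other.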

One potential problem with this approach is that we have to make sure that the reports are in the right order even if the steps are not executed in order. I think the problem can be addressed by writing to step_1.md etc. and assembling a final output once the pipeline is completed. Here we are taking advantage of the logical execution order of SoS steps, so using markdown in auxiliary steps should perhaps be disallowed.
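
A minimal sketch of the per-step fragment idea, assuming each numbered step writes to its own `step_<index>.md` file and a final pass concatenates the fragments by numeric step index. The helper names `write_step_report` and `assemble_report` are hypothetical and only serve the illustration.

```python
import re
import tempfile
from pathlib import Path


def write_step_report(workdir, step_index, text):
    """Append text to the fragment file of one step (step_<index>.md)."""
    path = Path(workdir) / f"step_{step_index}.md"
    with path.open("a", encoding="utf-8") as handle:
        handle.write(text.rstrip("\n") + "\n")


def assemble_report(workdir):
    """Concatenate all fragments, sorted by numeric step index."""
    fragments = []
    for path in Path(workdir).glob("step_*.md"):
        match = re.fullmatch(r"step_(\d+)\.md", path.name)
        if match:
            fragments.append((int(match.group(1)), path))
    fragments.sort(key=lambda item: item[0])
    return "".join(path.read_text(encoding="utf-8") for _, path in fragments)


# Steps finish out of order, but the assembled report follows step order.
with tempfile.TemporaryDirectory() as tmp:
    write_step_report(tmp, 20, "Step 20 results")
    write_step_report(tmp, 100, "Step 100 results")
    write_step_report(tmp, 10, "Step 10 results")
    final_report = assemble_report(tmp)

print(final_report, end="")
```

Sorting on the parsed integer rather than on the file name matters here: a plain lexical sort would place step_100.md before step_20.md.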

BoPeng commented 8 years ago

Demonstration of ideas presented in the markdown branch with https://github.com/BoPeng/SOS/commit/3d6a932fa47d4312088330b539f5f786fbead8e2

The example is now given at

https://github.com/BoPeng/SOS/blob/markdown/examples/markdown.sos

Note that

  1. Users do not have to use markdown, because the report action simply writes whatever script it is given to a file. So, if so desired, they can write complete html files as reports.
  2. The pandoc action is used to produce report files. However, we intentionally do not require a filename for the report (although one can be specified), so that at the end of the SoS run we can pipe all the reports to stdout, and users can then do something like

sos run my.sos  | pydoc from_stdin --to 'html' > report.html

BoPeng commented 8 years ago

The

sos run my.sos  | pydoc from_stdin --to 'html' > report.html

idea would not work because all those commands would write a large amount of garbage to stdout. However, having a default output is usable because all steps would conceptually write to the same report file.

Another idea would be to use special syntax to represent outputs. For example, we can

[10]
# this is regular comment

! this is report text that will be written to the report when the step is executed.
! this is suitable for small amount of text...
run:
   run command
report:
  this is long piece of report.

! has to be before run though, because otherwise those lines are considered part of the script (unless we handle ! before script).

Prefix wise ! looks ok but we can also do !, >, % stuff... we will basically translate

! report text

to

report('report text')

It is also possible to use multi-line comments such as

[10]
/* This will go to report
verbatim */

but it is more difficult to format because lines do not align well (and markdown requires alignment sometimes).

BoPeng commented 8 years ago

This is implemented in https://github.com/BoPeng/SOS/commit/38ae938e538be7f2259642dd5c226adeab857d5f . Here I enforce a rule that there has to be a space after ! to make the script look better. Hopefully this is not too troublesome; lines with a single ! are acceptable though.
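
For illustration, the translation of `! report text` into `report('report text')`, together with the space rule, could look roughly like the sketch below. This is not the code from the commit: the function name `translate_report_lines` is hypothetical, and rejecting `!text` with a `ValueError` is an assumption about how the rule might be enforced.

```python
def translate_report_lines(lines):
    """Turn '! text' lines into report('text') calls; pass other lines through."""
    translated = []
    for line in lines:
        stripped = line.rstrip("\n")
        if stripped == "!":
            # A lone "!" is accepted and stands for an empty report line.
            translated.append("report('')")
        elif stripped.startswith("! "):
            # repr() produces a correctly quoted Python string literal.
            translated.append(f"report({stripped[2:]!r})")
        elif stripped.startswith("!"):
            # Assumed enforcement of the "space after !" rule.
            raise ValueError(f"a space is required after '!': {stripped!r}")
        else:
            translated.append(stripped)
    return translated


translated = translate_report_lines(
    ["# this is regular comment", "! report text", "!", "run:"]
)
for item in translated:
    print(item)
```

Using `repr()` for the quoting means report text that itself contains quotes still becomes a valid string literal.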

There are some details still to be worked out, such as how to make sure files are concatenated in the correct order, how to avoid conflicts, and how to handle reports that are written by external processes, but the syntax is there. I hope this could open up some good usage of SoS, namely using SoS to generate reports similar to Rmarkdown. Although users can choose different markdown flavors and use different tools to process markdown, I chose pandoc because this is the one Rmarkdown uses and it probably has the biggest user base.

gaow commented 8 years ago

Great! I guess we can close #77 because this branch does more than I proposed. I have a feeling that knitr + rstudio is a hallmark for being a "modern" R user. Similarly SoS is for modern bioinformatics, and generating dynamic reports is definitely a "modern" feature.

Is pandoc an SoS action now?

BoPeng commented 8 years ago

Yes, but pandoc needs to be installed before SoS can be installed. This has been a problem on my mac machines although installation is not that difficult.

BoPeng commented 8 years ago

As demonstrated in https://github.com/BoPeng/SOS/blob/markdown/examples/markdown.sos , the report can be written inline with ! lines, or with the report action, which might be easier to use for large text. It is also possible for the report action to read from files (that are generated by the step process), although this has not been implemented.

BoPeng commented 8 years ago

Tried to document this feature at https://github.com/BoPeng/SOS/wiki/Documentation#report

gaow commented 8 years ago

We use report mainly for extracting reports from the SoS environment. But with SoS being a Script of Scripts, I can see it would be a big plus if we could add an action for RMarkdown. This is a suggestion from a colleague who currently uses SoS to do all the analysis, but has to write a separate .rmd file in Rstudio to process the final output. It would be nice if we could support RMarkdown. It could be rather straightforward, because it is basically wrapping the render function:

https://github.com/rstudio/rmarkdown

So we can have something like

[write_report]
rmd: html_document, toc = True
``{r, results='asis'}
knitr::kable(mtcars)
``

The challenge is again allowing for action-specific parameters.

gaow commented 8 years ago

I do not have additional comments and the ! syntax is good. I have suggested an RMarkdown action and I'm now thinking we can just use report for this purpose if we can provide some switch to tell a certain report is written in RMarkdown and then SoS will render that report accordingly. Then we do not have to worry about reading output from files as we'll let R take care of it.

Also a question:

report:
  In this step we obtained ${_output} ...

What if people write it this way, not just ${output}? Does that mean the text will be repeated as many times as there are outputs?

BoPeng commented 8 years ago

I am not sure about the Rmarkdown part because I have never run the command Rmarkdown with a file, and do not even know if it takes standard input, but that is certainly a possibility. I tend to think it is enough to do sos run script -r result.Rmd and Rmarkdown result.Rmd though.

For the second question, yes the report will be written several times, once for each _input, and that should be the desired behavior (e.g. a section of report for each output). I do not know how to report all outputs from within an input loop though. Our current design runs everything after input: many times so users would have to do

[10: alias='blah']
input: for_each
run:

[20]
report:
  blah.output

gaow commented 8 years ago

I tend to think it is enough to do sos run script -r result.Rmd and Rmarkdown result.Rmd though.

Exactly. But if we add an argument for report indicating that the text will be processed by markdown, then we can call markdown from within SoS. There is no Rmarkdown command, but rather, it is something like:

Rscript -e 'rmarkdown::render("input.Rmd", rmarkdown::html_document(toc = TRUE))'

So if there is a special action for Rmarkdown then the output will not be Rmd, but rather a product of Rmarkdown. It would be nice to run it from within SoS because a lot of users do not know how to render outside rstudio!

For the 2nd question, yes I'd do the same as you suggested ...
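
The behavior discussed for the second question can be mimicked with a small Python sketch: the report text is rendered once per input group, while a summary of all outputs has to be produced afterwards, from the collected outputs. The function `run_step` and the `.out` naming of outputs are hypothetical and exist only for this illustration; `string.Template` stands in for SoS's own `${...}` interpolation.

```python
from string import Template


def run_step(input_groups, report_template):
    """Process each input group in turn, rendering the report once per group."""
    outputs, report_sections = [], []
    for _input in input_groups:
        _output = f"{_input}.out"  # hypothetical output naming, for illustration
        outputs.append(_output)
        report_sections.append(
            Template(report_template).substitute(_output=_output)
        )
    return outputs, report_sections


outputs, sections = run_step(
    ["a.txt", "b.txt"], "In this step we obtained ${_output} ..."
)
# A later step can report on everything at once using the collected outputs.
summary = "All outputs: " + ", ".join(outputs)

for section in sections:
    print(section)
print(summary)
```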

BoPeng commented 8 years ago

I see, we just help users call a Rscript to process the output, which sounds easy. I will add a ticket.

gaow commented 8 years ago

Right, and a version without any parameter is just:

Rscript -e "rmarkdown::render('file.Rmd')" 

which will knit (if you use knitr inside your Rmd script, where all the dynamic magic happens) and render the output document (HTML by default, unless the Rmd header specifies another format such as PDF). We can have this first and worry about other parameters to feed to render later.
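
As a rough sketch of how SoS could wrap this call from Python, the helper below builds the `Rscript` command line and runs it with `subprocess`. The function names `build_render_command` and `render_rmd` are hypothetical; only `Rscript -e` and `rmarkdown::render` (with its `output_format` argument) are taken from the discussion above and from rmarkdown itself.

```python
import shutil
import subprocess


def r_string(value):
    """Quote a Python string as a single-quoted R string literal."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_render_command(rmd_file, output_format=None):
    """Return the argv list that renders rmd_file with rmarkdown::render."""
    args = r_string(rmd_file)
    if output_format is not None:
        # output_format is passed through as an R expression, e.g.
        # "rmarkdown::html_document(toc = TRUE)".
        args += f", output_format = {output_format}"
    return ["Rscript", "-e", f"rmarkdown::render({args})"]


def render_rmd(rmd_file, output_format=None):
    """Run the render command; requires R with the rmarkdown package."""
    if shutil.which("Rscript") is None:
        raise RuntimeError("Rscript was not found on PATH")
    subprocess.run(build_render_command(rmd_file, output_format), check=True)


print(build_render_command("file.Rmd"))
```

Passing the command as a list avoids shell quoting problems, since the R expression reaches `Rscript` as a single argument.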