Closed BoPeng closed 8 years ago
It should be rather simple: we can define a markdown action to collect output from a step, something like
[10]
R:
run some R script
Markdown:
In this analysis, we generate a figure
embed a ${_output}
The markdown (or report, or something else) action would append the parsed markdown code to a default file, unless an option filename="another.md" is used to append to another file. In the latter case, the same pipeline can write to multiple report files, which can be useful from time to time.
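A minimal sketch of the append behavior described above, assuming a default file name of report.md. The names report, DEFAULT_REPORT, and workdir are hypothetical and only illustrate the idea; this is not the actual SoS implementation.

```python
import os

# Hypothetical default report name; the thread does not specify one.
DEFAULT_REPORT = "report.md"


def report(text, filename=None, workdir="."):
    """Append text to the default report, or to `filename` if one is given."""
    path = os.path.join(workdir, filename or DEFAULT_REPORT)
    # Mode "a" appends, so several steps can write to the same file.
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(text.rstrip("\n") + "\n")
    return path
```

Calling report("some text") twice grows the default file, while report("other text", filename="another.md") starts or extends a second report.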
One potential problem with this approach is that we have to make sure that the reports are in the right order even if the steps are not executed in order. I think the problem can be addressed by writing to step_1.md etc. and assembling the final output once the pipeline is completed. Here we are taking advantage of the logical execution order of SoS steps, so using markdown in auxiliary steps should perhaps be disallowed.
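The assembly step could look roughly like the sketch below. It assumes the per-step files follow a step_N.md naming scheme and live in one directory; assemble_report and STEP_FILE are hypothetical names, and the real branch may do this differently.

```python
import re
from pathlib import Path

# Assumed naming scheme for per-step report fragments: step_1.md, step_2.md, ...
STEP_FILE = re.compile(r"^step_(\d+)\.md$")


def assemble_report(report_dir, final_name="report.md"):
    """Concatenate step_N.md files in numeric step order into one report."""
    report_dir = Path(report_dir)
    parts = []
    for path in report_dir.iterdir():
        match = STEP_FILE.match(path.name)
        if match:
            parts.append((int(match.group(1)), path))
    # Sort by step index, not by file name, so step_10 comes after step_2.
    parts.sort(key=lambda item: item[0])
    final_path = report_dir / final_name
    with final_path.open("w", encoding="utf-8") as out:
        for _, path in parts:
            out.write(path.read_text(encoding="utf-8").rstrip("\n") + "\n")
    return final_path
```

Sorting on the numeric index rather than the file name is the point here: the order in which steps finished no longer matters.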
Demonstration of ideas presented in the markdown branch with https://github.com/BoPeng/SOS/commit/3d6a932fa47d4312088330b539f5f786fbead8e2
The example is now given at
https://github.com/BoPeng/SOS/blob/markdown/examples/markdown.sos
Note that the report action simply writes whatever script it is given to a file, so if desired, users can write complete HTML files as reports. The pandoc action is used to produce report files. However, we intentionally do not require a filename for the report (although one can be specified), so at the end of the SoS run we can pipe all the reports to stdout, and users can then do something like
sos run my.sos | pydoc from_stdin --to 'html' > report.html
The sos run my.sos | pydoc from_stdin --to 'html' > report.html idea would not work because all those commands would write a large amount of garbage to stdout. However, having a default output is still usable because all steps would conceptually write to the same report file.
Another idea would be to use special syntax to represent outputs. For example, we can write
[10]
# this is regular comment
! this is report text that will be written to the report when the step is executed.
! this is suitable for small amount of text...
run:
run command
report:
this is long piece of report.
The ! lines have to come before run: though, because anything after it is considered part of the script (unless we handle ! before the script).
Prefix-wise, ! looks OK, but we could also use !, >, %, and so on. We will basically translate
! report text
to
report('report text')
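The translation can be sketched as a small line rewriter. The function name translate_report_line is hypothetical, and the sketch only covers the single-line case shown above; it is not taken from the SoS parser.

```python
def translate_report_line(line):
    """Rewrite a '! text' line into a report('text') call; leave other lines alone."""
    stripped = line.strip()
    if not stripped.startswith("!"):
        return line
    text = stripped[1:].strip()
    # repr() produces a valid Python string literal, including any needed escapes.
    return "report({!r})".format(text)
```

Using repr() for the argument means quotes inside the report text do not break the generated call.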
It is also possible to use multi-line comments such as
[10]
/* This will go to report
verbatim */
but it is more difficult to format because lines do not align well (and markdown requires alignment sometimes).
This is implemented in https://github.com/BoPeng/SOS/commit/38ae938e538be7f2259642dd5c226adeab857d5f . Here I enforce a rule that there has to be a space after ! to make the script look better. Hopefully this is not too troublesome. A line with a single ! is acceptable though.
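One way to express the space-after-! rule is a regular expression, sketched below. REPORT_LINE and is_valid_report_line are hypothetical names, and the sketch assumes report lines start in the first column; the linked commit may check this differently.

```python
import re

# "!" alone, or "!" followed by a space and any text. "!text" does not match.
REPORT_LINE = re.compile(r"^!( .*)?$")


def is_valid_report_line(line):
    """Return True if the line is a report line that obeys the space rule."""
    return REPORT_LINE.match(line.rstrip("\n")) is not None
```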
There are some details to work out, such as how to make sure files are concatenated in the correct order, how to avoid conflicts, and how to handle reports that are written by external processes, but the syntax is there. I hope this could open up some good usage of SoS, namely using SoS to generate reports similar to Rmarkdown. Although users can choose different markdown flavors and use different tools to process markdown, I chose pandoc because this is the one Rmarkdown uses and it probably has the biggest user base.
Great! I guess we can close #77 because this branch does more than I proposed. I have a feeling that knitr + RStudio is a hallmark of being a "modern" R user. Similarly, SoS is for modern bioinformatics, and generating dynamic reports is definitely a "modern" feature.
Is pandoc an SoS action now?
Yes, but pandoc needs to be installed before SoS can be installed. This has been a problem on my mac machines although installation is not that difficult.
As demonstrated in https://github.com/BoPeng/SOS/blob/markdown/examples/markdown.sos , the report can be written inline with ! lines, or with a report action, which might be easier to use for large text. It is also possible for the report action to read from files (that are generated by the step process), although this has not been implemented.
Tried to document this feature at https://github.com/BoPeng/SOS/wiki/Documentation#report
We use report mainly for extracting reports from the SoS environment. But with SoS being a Script of Scripts, I can see it being a big plus if we add an action for RMarkdown. This is a suggestion from a colleague who currently uses SoS to do all the analysis but still has to write a separate .rmd file in RStudio to process the final output. It would be nice if we could support RMarkdown. It could be rather straightforward, because it is basically wrapping the render function:
https://github.com/rstudio/rmarkdown
So we can have something like
[write_report]
rmd: html_document, toc = True
``{r, results='asis'}
knitr::kable(mtcars)
``
The challenge is again allowing for action-specific parameters.
I do not have additional comments, and the ! syntax is good. I have suggested an RMarkdown action, and I'm now thinking we can just use report for this purpose if we provide some switch to tell SoS that a certain report is written in RMarkdown, so that SoS will render that report accordingly. Then we do not have to worry about reading output from files, as we'll let R take care of it.
Also a question:
report:
In this step we obtained ${_output} ...
What if people write it this way, with ${_output} rather than just ${output}? Does that mean the text will be repeated as many times as there are outputs?
I am not sure about the Rmarkdown part because I have never run the command Rmarkdown with a file, and do not even know if it takes standard input, but that is certainly a possibility. I tend to think it is enough to do sos run script -r result.Rmd and Rmarkdown result.Rmd though.
For the second question, yes, the report will be written once for each _input, and that should be the desired behavior (e.g. a section of report for each output). I do not know how to report all outputs from within an input loop though. Our current design runs everything after input: many times, so users would have to do
[10: alias='blah']
input: for_each
run:
[20]
report:
blah.output
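The difference between the two behaviors can be illustrated with plain Python, outside of SoS. Both function names and the template text are hypothetical; the sketch only models "one report section per input group" versus "one section in a later step that sees all outputs".

```python
def report_per_group(template, output_groups):
    """One section per input group, as when report: follows a looping input."""
    return [template.format(output=", ".join(group)) for group in output_groups]


def report_once(template, output_groups):
    """A single section listing every output, as a later step could write."""
    all_outputs = [name for group in output_groups for name in group]
    return template.format(output=", ".join(all_outputs))
```

With two input groups, report_per_group returns two sections while report_once returns one, which mirrors moving the report into a separate step.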
> I tend to think it is enough to do sos run script -r result.Rmd and Rmarkdown result.Rmd though.
Exactly. But if we add an argument for report indicating that the text will be processed by markdown, then we can call markdown from within SoS. There is no Rmarkdown command; rather, it is something like:
Rscript -e 'rmarkdown::render("input.Rmd", rmarkdown::html_document(toc = TRUE))'
So if there is a special action for Rmarkdown, then the output will not be an Rmd file but rather a product rendered from it. It would be nice to run it from within SoS because a lot of users do not know how to render outside RStudio!
For the 2nd question, yes I'd do the same as you suggested ...
I see, we just help users call Rscript to process the output, which sounds easy. I will add a ticket.
Right, and a version without any parameter is just:
Rscript -e "rmarkdown::render('file.Rmd')"
which will knit (if you use knitr inside your Rmd script, where all the dynamic magic happens) and render the output file in the format declared in the Rmd header (a PDF, for example). We can have this first and worry about other parameters to feed to render later.
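Wrapping that call from Python could look like the sketch below. build_render_command and render_rmarkdown are hypothetical helper names, not existing SoS actions, and the sketch assumes Rscript and the rmarkdown package are installed.

```python
import shutil
import subprocess


def build_render_command(rmd_file):
    """Build the Rscript command line that renders an Rmd file with defaults."""
    # Escape backslashes and single quotes for the single-quoted R string.
    escaped = rmd_file.replace("\\", "\\\\").replace("'", "\\'")
    return ["Rscript", "-e", "rmarkdown::render('{}')".format(escaped)]


def render_rmarkdown(rmd_file):
    """Run Rscript on the file; raises if Rscript is missing or rendering fails."""
    if shutil.which("Rscript") is None:
        raise RuntimeError("Rscript was not found on PATH")
    subprocess.run(build_render_command(rmd_file), check=True)
```

Passing the command as a list avoids shell quoting issues, so only the R string literal needs escaping.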
Can we add special functions and steps with embedded markdown code that will be processed during the execution of the script? I am not sure how exactly we can do this, but I suppose there is enough code out there that handles markdown ... So, eventually our script would be more or less like RMarkdown, which generates html/pdf files with scripts and results. On a quick note, perhaps we can
The idea is that if we output the steps, we can output steps in markdown (we already have this), but embed the markdown, and process them with the results we have.