ropensci-archive / reproducibility-guide

ARCHIVED
http://ropensci.github.io/reproducibility-guide/

Workflow Construction #1

Closed: iamciera closed this issue 10 years ago

iamciera commented 10 years ago

It would be nice if the workflows were easy to generate, so that in the future people can add their own, ideally with all of them matching in format and color coding. I can generate them quickly in Illustrator, but that would not help in the long term.

Another task will be collecting the workflows.

jhollist commented 10 years ago

@iamciera Nice start on the workflows!

Couple of thoughts on workflows:

  1. Not sure if there is a standard format, but it might be nice to have markup/languages (e.g. .md, .tex, etc.) formatted differently than tools (knitr, pandoc, ...). Perhaps the rounded rectangles you currently have for the languages and ovals for the tools. Along the same line of thinking, I would separate knitr. Something like:

[Image: workflow wireframe (workflowwireframe_jwh)]

  2. Do we want to include other output formats from pandoc, such as .docx, .tex, etc.?
  3. As an alternative to Illustrator, I have used Inkscape. Not sure what your concerns about the long term were, but it is free!
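
One way to make the workflows "easily generated" rather than hand-drawn is to describe each one as plain text and let a renderer draw it, so every contribution gets the same shapes and colors. The sketch below assumes Graphviz as the renderer (the thread did not settle on a tool) and encodes the convention suggested here: rounded rectangles for markup/languages, ovals for tools. The function name, the two node categories, and the colors are all hypothetical.

```python
# Hypothetical style table: one shape/color per node category, so every
# generated diagram uses the same visual conventions.
STYLES = {
    "format": 'shape=box, style="rounded,filled", fillcolor="#cfe2f3"',  # .md, .tex, .Rmd ...
    "tool": 'shape=ellipse, style=filled, fillcolor="#fce5cd"',          # knitr, pandoc ...
}


def workflow_to_dot(name, nodes, edges):
    """Return Graphviz DOT source for one workflow.

    nodes: dict mapping a node label to its category ("format" or "tool")
    edges: list of (source_label, target_label) pairs
    """
    lines = [f'digraph "{name}" {{', "  rankdir=LR;"]  # draw left to right
    for label, category in nodes.items():
        lines.append(f'  "{label}" [{STYLES[category]}];')
    for src, dst in edges:
        # Catch typos early instead of letting Graphviz invent unstyled nodes.
        if src not in nodes or dst not in nodes:
            raise ValueError(f"edge {src!r} -> {dst!r} uses an undeclared node")
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    nodes = {".Rmd": "format", "knitr": "tool", ".md": "format",
             "pandoc": "tool", ".pdf": "format", ".docx": "format"}
    edges = [(".Rmd", "knitr"), ("knitr", ".md"), (".md", "pandoc"),
             ("pandoc", ".pdf"), ("pandoc", ".docx")]
    print(workflow_to_dot("Rmd workflow", nodes, edges))
```

Saving the printed text as `workflow.dot` and running `dot -Tsvg workflow.dot -o workflow.svg` would produce an SVG that can still be touched up in Inkscape if needed.
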
eduardszoecs commented 10 years ago

I uploaded an example workflow for R + LaTeX here, with a graphical outline, a README, and R and LaTeX files.

I made the outline quickly in Inkscape, so I won't win any design awards ;) It's the general workflow that I find useful.

For formatting and color coding, we could also use TikZ (example here).

benmarwick commented 10 years ago

Here's one of my favourite workflow diagrams:

[Image: Kieran Healy's plain-text workflow diagram]

context: http://kieranhealy.org/blog/archives/2014/01/23/plain-text/

eduardszoecs commented 10 years ago

@benmarwick Nice, thanks for this!

Looks very interesting and promising!

iamciera commented 10 years ago

Great! I will add everything and think about other ways to present them.

benmarwick commented 10 years ago

Excellent schematics, well done! One question: should the file type for the article in the R box of the Healy workflow be .Rmd rather than CSV/text? And what do the coloured dots signify?

jhollist commented 10 years ago

I was going to bring up the same thing as Ben re: csv/txt vs .Rmd.

Also, instead of calling out a single authoring tool (Mu/Mou), perhaps we could be more agnostic and outline a few of the tools (Mu/Mou, RStudio, text editors, Authorea, etc.) in the Intro to Tools section?

Cheers, Jeff

jhollist commented 10 years ago

Another possible workflow could simply be RStudio. I have heard (I haven't played with it myself yet) that the latest preview version of RStudio can export .Rmd to .pdf, .docx, etc. I believe they have rolled Pandoc in. It might not make a compelling figure, but it could well become a common workflow for many.

eduardszoecs commented 10 years ago

Maybe we could/should add some discussion of the advantages/disadvantages of each workflow? For example, having code and text in one place may be good for short texts or presentations, but for longer research papers I prefer to keep code and text separate (as I usually develop a large body of code first and then write the paper).

jhollist commented 10 years ago

+1 on @EDiLD's suggestion.

For each workflow we could have the illustration, short description, advantages/disadvantages, and examples to be downloaded and run.
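
If every workflow page should carry the same four parts, a small template keeps the pages consistent as people contribute new ones. This is a sketch only: it assumes the pages are written in Markdown, and the `Workflow` record and `render_page` function are hypothetical names, not part of any existing site tooling.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Workflow:
    """Hypothetical record holding the parts proposed for each workflow page."""
    title: str
    image: str                                              # path to the illustration
    description: str
    advantages: List[str] = field(default_factory=list)
    disadvantages: List[str] = field(default_factory=list)
    examples: List[str] = field(default_factory=list)       # links to runnable examples


def render_page(wf: Workflow) -> str:
    """Render one workflow as Markdown with a fixed section order."""
    def bullets(items):
        # An empty section still appears, so gaps are visible to contributors.
        return "\n".join(f"- {item}" for item in items) or "- (none listed yet)"

    return "\n\n".join([
        f"# {wf.title}",
        f"![{wf.title} workflow]({wf.image})",
        wf.description,
        "## Advantages", bullets(wf.advantages),
        "## Disadvantages", bullets(wf.disadvantages),
        "## Examples", bullets(wf.examples),
    ])


if __name__ == "__main__":
    wf = Workflow(
        title="R Markdown",
        image="images/rmarkdown.png",
        description="Code and text live together in one .Rmd file.",
        advantages=["Code and text in one place; good for short texts and presentations"],
        disadvantages=["Harder to manage for long papers where the code is developed first"],
    )
    print(render_page(wf))
```

The advantages/disadvantages in the usage example simply restate @EDiLD's point about code and text in one file.
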

iamciera commented 10 years ago

I agree. I was thinking the same thing. The workflows need a bit of description with them. The colored dots were a way to try to incorporate folder structure, which is keyed in EDiLD's example. Maybe each workflow should have its own page? There are a few more workflows people are giving me, but I think I will work on them later. I am going to deal with the page structure first so I can add child pages to the workflow section.

jhollist commented 10 years ago

+1 on the separate pages

benmarwick commented 10 years ago

@jhollist I can confirm the latest version of RStudio works like that: effortless .Rmd to PDF/HTML/DOCX conversion, mostly thanks to the new version of the rmarkdown package.

Also, here's another workflow/file structure diagram from Christopher Gandrud's book Reproducible Research with R and RStudio (very cool that the entire book can be reproduced!):

[Image: pages from rep-res-parent]

@EDiLD your suggestion about externalised code is very useful, and we should point to some of the ways R enables it (sourcing R scripts, knitr's read_chunk(), child documents, and so on).

jhollist commented 10 years ago

@benmarwick, I can also confirm. I grabbed the preview version this morning just to try it out. Pretty cool, and it might be enough to get a few more point-and-clickers on board...

And I'd add another item to the externalised code list: creating a separate R package for the code. It is similar to source(), of course, but it forces a bit more structure on the code. I am trying this now on two papers: all analysis (including figures) lives in an R package, and the manuscript simply loads the package in a code chunk and then calls the appropriate function in the flow of the text.

benmarwick commented 10 years ago

+1 for research-project-as-R-package, an excellent idea that encourages structured documentation of the code, and tests can help ensure the code actually works. Gentleman and Temple Lang make a great case for this here, calling it a compendium.

iamciera commented 10 years ago

Christopher Gandrud's workflow is badass!

For a detailed look at the folder structure behind any of the workflows we use, we can also provide links to GitHub repos that use that structure.
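
Alongside links to example repos, a starter layout could be generated so newcomers begin with the same folder structure. The layout below is hypothetical, loosely modeled on the R + LaTeX example (a README, R code, and LaTeX files); the directory names, file contents, and the `scaffold` function are illustrative assumptions only. Empty `.gitkeep` files follow the common convention for keeping otherwise empty directories under version control.

```python
from pathlib import Path

# Hypothetical project layout; names are illustrative, not a standard.
LAYOUT = {
    "README.md": "# Project title\n\nHow to reproduce the analysis and the paper.\n",
    "data/raw/.gitkeep": "",
    "R/analysis.R": "# Analysis code, sourced from the manuscript\n",
    "figures/.gitkeep": "",
    "manuscript/paper.tex": "% LaTeX manuscript\n",
}


def scaffold(root):
    """Create LAYOUT under `root`, never overwriting existing files.

    Returns the relative paths of the files that were actually created.
    """
    root = Path(root)
    created = []
    for rel_path, content in LAYOUT.items():
        path = root / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)  # create folders as needed
        if not path.exists():
            path.write_text(content)
            created.append(rel_path)
    return created


if __name__ == "__main__":
    for rel_path in scaffold("my-project"):
        print("created", rel_path)
```
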

I will make another, super simple workflow using RStudio, and in the description I can specify that it is the easiest. The only reason I can see for people not wanting to use it is if they want specific styling for the PDF/HTML output. Any other reasons you guys can think of?

benmarwick commented 10 years ago

Good plan. I agree about RStudio being the simplest, and probably the most suitable for those getting started with all this (is that our target audience?). Folks passionately committed to other environments (Emacs, etc.) would likely have strong views otherwise.

jhollist commented 10 years ago

@iamciera Do you have the preview version of RStudio (released yesterday!)? That has all the fun pandoc integration.

And, just thinking out loud here, I would bet that you can also control the styling of the .docx and .pdf output. Since RStudio is using pandoc, it likely has template .docx or .sty files to control the styling.
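
Pandoc itself does support this kind of styling control: for .docx output it can copy styles from a reference document, and for PDF it can use a custom LaTeX template. The sketch below only builds the command line (it does not run pandoc); the function name and file names are hypothetical. The flag spelling is that of pandoc 2.x and later; older 1.x releases, such as those around in 2014, spelled the first one `--reference-docx`. On the R side, rmarkdown's `word_document()` exposes the same idea through its `reference_docx` argument.

```python
def pandoc_command(source, output, reference_doc=None, latex_template=None):
    """Build (but do not run) a pandoc command line as an argument list.

    reference_doc:  a .docx whose styles pandoc copies into .docx output
    latex_template: a custom LaTeX template used when producing .pdf output
    """
    cmd = ["pandoc", source, "-o", output]
    # Each styling option only applies to its matching output format.
    if output.endswith(".docx") and reference_doc:
        cmd.append(f"--reference-doc={reference_doc}")
    if output.endswith(".pdf") and latex_template:
        cmd.append(f"--template={latex_template}")
    return cmd


if __name__ == "__main__":
    print(" ".join(pandoc_command("paper.md", "paper.docx", reference_doc="journal-styles.docx")))
```

Passing the list to `subprocess.run(..., check=True)` would execute it, provided pandoc is installed and on the PATH.
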

One downside of the RStudio approach is automation, but if that is a concern for a user, they would probably be using a different workflow to begin with!
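
The automation gap can be narrowed by calling the rmarkdown package's `render()` function, which drives RStudio's export, from a script instead of the GUI. A minimal sketch, assuming R and rmarkdown are installed and `Rscript` is on the PATH; the helper names and file names are hypothetical. `output_format = "all"` asks rmarkdown to build every format listed in the document's YAML header.

```python
import json
import subprocess


def render_command(rmd_file, output_format="all"):
    """Build an Rscript call to rmarkdown::render() as an argument list."""
    # json.dumps yields a double-quoted, backslash-escaped string that R also parses.
    expr = (f"rmarkdown::render({json.dumps(rmd_file)}, "
            f"output_format = {json.dumps(output_format)})")
    return ["Rscript", "-e", expr]


def render_all(rmd_files, output_format="all"):
    """Render each file in turn; check=True stops at the first failure."""
    for rmd in rmd_files:
        subprocess.run(render_command(rmd, output_format), check=True)


if __name__ == "__main__":
    render_all(["paper.Rmd"])
```

Passing the arguments as a list (rather than one shell string) avoids a layer of shell quoting around the R expression.
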

In any event, I agree that it is the easiest option and most certainly a good one to include.

iamciera commented 10 years ago

No, I did not! I will update my RStudio now and mess around with it a bit. There are RStudio people here who will def know whether this (the styling) is possible; I will ask them at lunch.

@jhollist, you are right though: we should strive to have the simplest workflows, because the people who are interested in heavy automation are most likely making their own workflows anyway. That is something we should keep in mind throughout all of this. The people who we really want to reach with this guide is people new to thinking in this way, so the simpler the better, for everything.

jhollist commented 10 years ago

@iamciera +100 on "The people who we really want to reach with this guide is people new to thinking in this way"

jhollist commented 10 years ago

Oh, and let me know what you find out about the styling. I dug around a bit, but didn't see anything obvious.

eduardszoecs commented 10 years ago

Yes, sourcing R scripts would make the .Rmd more readable, and the code would also be easy to maintain without touching the paper. Why didn't I think of this???...