ropensci / unconf18

http://unconf18.ropensci.org/
Help researchers track results in manuscript back to source code. #46

Open maurolepore opened 6 years ago

maurolepore commented 6 years ago

How do you link a result in your manuscript back to its source code? This is fundamental to reproducible research. It seems basic and straightforward, but in the wild world I live in, it is not. Research gets messy quickly: after a few weeks away from a project, wish me luck finding my own stuff, and forget about finding code in a project managed by someone else.

My inelegant solution is this:

# ab12: a short, unique ID labelling this chunk
ab12 <- "Code whose result proves that Earth is not flat."
result <- code  # `code` is a placeholder for the actual analysis
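If the same ID (like ab12) is written both next to the result in the manuscript and in the script, the lookup can at least be automated. The sketch below is a hypothetical helper, not part of any package: `find_result_id()` and its arguments are made up for illustration, and it assumes the code lives in `.R`/`.Rmd` files under one project folder.

```r
# Hypothetical helper (not from any package): list every line in a project's
# R and R Markdown files that mentions a given result ID, such as "ab12".
find_result_id <- function(id, path = ".") {
  files <- list.files(path, pattern = "\\.[Rr](md)?$",
                      recursive = TRUE, full.names = TRUE)
  hits <- lapply(files, function(f) {
    lines <- readLines(f, warn = FALSE)
    # The \b anchors match the ID as a whole word, so "ab12" skips "ab123"
    i <- grep(paste0("\\b", id, "\\b"), lines, perl = TRUE)
    if (length(i) == 0) return(NULL)
    data.frame(file = f, line = i, text = lines[i],
               stringsAsFactors = FALSE)
  })
  # rbind() drops the NULLs; the result is NULL when nothing matches
  do.call(rbind, hits)
}
```

Running `find_result_id("ab12")` from the project root should return one row per match with the file, line number, and line text.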

Is there a tool or a better approach? What general recommendations do you have for researchers who vary in their willingness to use version control and RStudio projects?

noamross commented 6 years ago

This is a great approach. It is of course related to #42, but potentially applies to very different workflows. For projects where I'm not compiling the final output, I like to have an outputs folder that holds not only images and tables but also an Rmd or text file with all the essential quantitative values that make their way into the manuscript. Usually things can be traced back from the filenames there.
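One way to make that outputs file trace back explicitly is to store each value together with the script that produced it. This is a minimal sketch under assumptions: `record_value()`, the `outputs/key_values.csv` location, and the column names are all hypothetical, chosen only to illustrate the idea.

```r
# Hypothetical helper: append one quantitative result to a single outputs
# file, together with the script that produced it. The file location and
# column names are illustrative assumptions.
record_value <- function(id, value, script,
                         file = file.path("outputs", "key_values.csv")) {
  dir.create(dirname(file), showWarnings = FALSE, recursive = TRUE)
  has_file <- file.exists(file)
  row <- data.frame(id = id, value = value, script = script,
                    stringsAsFactors = FALSE)
  # Write the header only when the file is first created
  write.table(row, file, sep = ",", row.names = FALSE,
              col.names = !has_file, append = has_file)
  invisible(row)
}
```

Each call adds one row, so the file lists every reported value next to the script it came from instead of relying on filenames alone.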

wlandau commented 6 years ago

drake + literate programming may help a bit. Drake's main example has a data analysis workflow with an R Markdown report at the very end. The report's active code chunk has calls to loadd(fit) and readd(hist), which serve to

  1. Fetch targets from the cache when the report compiles, and
  2. Tell drake to treat fit and hist as formal dependencies (so drake::make() rebuilds report.html whenever fit or hist changes). Even if you don't care about Make-like build management, you can still see where these data objects fit into the pipeline.

[Screenshot: dependency graph of the workflow, with functions excluded]

In that sense, using and annotating an artifact are one and the same.
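The "using is annotating" idea can be illustrated without drake. The base-R sketch below is not drake's implementation (drake works differently); `new_cache()` and its `store`/`read`/`deps` functions are hypothetical. Every time the report reads a result, the read is also recorded as a dependency.

```r
# Hypothetical sketch, not drake's API: a cache whose read() both returns
# a stored result and records that the caller depends on it.
new_cache <- function() {
  values <- new.env()
  deps <- character()
  list(
    store = function(name, value) assign(name, value, envir = values),
    read = function(name) {
      deps <<- union(deps, name)                    # annotate: record the dependency
      get(name, envir = values, inherits = FALSE)   # use: return the stored value
    },
    deps = function() deps
  )
}
```

After a report calls `cache$read("fit")` and `cache$read("hist")`, `cache$deps()` lists exactly the results the report used, and nothing it merely could have used.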

I am curious to know the views of @gmbecker and @duncantl on the original issue. As I understand it, provenance is a major focus of trackr, RCacheSuite, and CodeDepends.

wlandau commented 6 years ago

Edit: As for linking data objects back to the source code, the dependency graph also shows the functions that generated fit and hist. That's an important point I forgot to add: the previous graph excluded functions. See below for the full graph.
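Finding which code produced which object can be approximated by static analysis of a script, the general idea behind tools like CodeDepends. The sketch below uses only base R (`parse()`, `all.vars()`, `all.names()`) and is not the API of CodeDepends or drake. `trace_assignments()` is a hypothetical helper, and the example script, including `earth.csv`, is made up; the code is only parsed, never run.

```r
# Hypothetical helper: for each top-level `name <- value` (or `name = value`)
# in a script, report the object it defines, the variables it uses, and the
# functions/operators it calls. Other expressions are skipped.
trace_assignments <- function(code) {
  exprs <- parse(text = code, keep.source = FALSE)
  is_assignment <- function(e) {
    is.call(e) &&
      (identical(e[[1]], as.name("<-")) || identical(e[[1]], as.name("="))) &&
      is.name(e[[2]])
  }
  rows <- lapply(seq_along(exprs), function(i) {
    e <- exprs[[i]]
    if (!is_assignment(e)) return(NULL)
    rhs <- e[[3]]
    vars <- all.vars(rhs)
    data.frame(
      expr = i,
      output = as.character(e[[2]]),
      uses = paste(vars, collapse = ", "),
      calls = paste(setdiff(all.names(rhs, unique = TRUE), vars), collapse = ", "),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

script <- c("data <- read.csv('earth.csv')",
            "fit <- lm(radius ~ angle, data = data)",
            "hist <- summary(fit)",
            "print(hist)")
trace_assignments(script)
```

For the example script, the row for `fit` should show that it uses `radius`, `angle`, and `data` and calls `lm`, which is the kind of link a full dependency graph draws for you.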

[Screenshot: full dependency graph, including the functions that generate fit and hist]

maurolepore commented 6 years ago

Awesome! I'm learning so much and the unconference hasn't even started! Thank you!

wlandau commented 6 years ago

It's such a fantastic crowd! I wish I could be at unconf to soak up more knowledge in person.

maurolepore commented 6 years ago

Summary: