egouldo opened this issue 6 years ago
Great write-up @egouldo, and thanks for the links @njtierney! Can't wait to have a play with liftr this weekend. I wonder how extendable it is for an analysis that isn't conducted within an .Rmd.
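From a skim of the liftr docs, the basic workflow looks to be: add a `liftr` block to the Rmd's YAML header, then call `lift()` and `render_docker()`. A rough sketch (the file name is just a placeholder):

```r
library("liftr")

# lift() reads the liftr block in the Rmd's YAML header and
# writes a Dockerfile next to the document
lift("analysis.Rmd")

# render_docker() builds the image and renders the document
# inside the resulting container
render_docker("analysis.Rmd")
```

So it does look fairly Rmd-centric out of the box, hence my question above.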
This sounds very cool! You might also want to take a look at https://github.com/o2r-project/containerit, another R package with somewhat similar aspirations.
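For a flavour of it, containerit can generate a Dockerfile straight from your current session. A minimal sketch based on its README (the output file name is a placeholder):

```r
library("containerit")

# dockerfile() captures the current session (R version, attached
# packages) as a Dockerfile object
df <- dockerfile(from = sessionInfo())

# write() serialises the Dockerfile object to disk
write(df, file = "Dockerfile")
```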
Another related thing might be to create a Dockerfile + Binder badge/link, allowing users to run the code on Binder without ever having to install Docker or anything else locally (see the Binder badge in this example, https://github.com/cboettig/noise-phenomena, which launches the Dockerfile in the repo on Binder). Related efforts include emerging platforms like https://wholetale.org/ (which also uses Rocker images) and https://codeocean.com/
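For anyone who hasn't made one before, the badge itself is just a markdown image link pointing at mybinder.org; with a hypothetical `<user>/<repo>`, it looks like:

```markdown
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/<user>/<repo>/master)
```

As I understand it, if a Dockerfile sits in the repo root, Binder builds the environment from that rather than from its default image.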
I was just showing @egouldo and @stevekambouris noise-phenomena earlier today. Great example!
@nuest we are going to hack away at some of the open issues on containerit! Heads-up =)
Our fork is at ropenscilabs/containerit
Problem
During a computational replication, many sources of error can cause the replication to fail. One critical component is the computing environment: if any dependencies, such as R packages, are no longer available, anyone wishing to reproduce your R analyses will be unable to do so. Tools like Docker and the Rocker Project provide fully containerised environments, including all dependencies, for reproducing R analyses and projects.
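For illustration, a minimal Rocker-based Dockerfile might look something like the sketch below (the image tag, package, and file names are placeholders, not a prescription):

```dockerfile
# rocker/r-ver pins an exact R version; package installs resolve
# against a date-fixed snapshot matching that version
FROM rocker/r-ver:3.5.0

# install2.r (from littler, shipped with Rocker images) installs
# the analysis' package dependencies into the image
RUN install2.r --error tidyverse

# bundle the code and data so the container is self-contained
COPY . /home/analysis

# re-run the analysis when the container starts
CMD ["Rscript", "/home/analysis/run.R"]
```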
Unfortunately, this model of facilitating computational reproducibility across machines and analysts is difficult for the regular R user wishing to time-capsule their work: getting Docker up and running takes specialised knowledge and a good deal of time. Some folk might not even know that Docker exists!
Consequently, one of the most common models of open science involves authors submitting data and code to repositories like Dryad and then providing the link inside their journal article. Whilst this ticks the transparency box of open science, it certainly does not guarantee reproducibility, for the reasons outlined above.
Proposed solution
The fundamental objective is to create some sort of time capsule:
The goal of the package (and the Shiny app, if we get there) is to create a "Docker-like" system where the user can:
a) match the environment, such that you can at least get the code to run
b) run the code, in a make-like manner
c) access the computing environment, such that you can engage with raw, intermediate, and output objects in the data analysis pipeline of a scientific study, to check the validity of the coding implementation of its analyses.
It should make the process of going from code, data, packages, and some set of assembly instructions --> Docker EASY! The ultimate aim of making this process easy is to generate more reproducible scientific outputs, such that independent analysts can 1. obtain and 2. re-run scientific analyses, and, hopefully, reproduce them!
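To make that concrete, here is a purely hypothetical sketch of what the user-facing workflow could look like -- none of these functions exist yet, the names are invented for illustration only:

```r
# Hypothetical API sketch -- illustrative only; nothing here exists yet.

# a) snapshot the environment: record R version, packages, code, and
#    data, and write a Dockerfile plus assembly instructions
time_capsule("my-analysis/")

# b) re-run the full analysis pipeline, make-like, inside the capsule
capsule_run("my-analysis/")

# c) open the capsule's environment to inspect raw, intermediate,
#    and output objects from the pipeline
capsule_inspect("my-analysis/")
```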
Thanks
Thank you to @smwindecker and @stevekambouris for the initial ideas and impromptu workshopping today!