ropensci / unconf17

Website for 2017 rOpenSci Unconf
http://unconf17.ropensci.org

make using roxygen-like documentation for analysis directories #77

Open AliciaSchep opened 7 years ago

AliciaSchep commented 7 years ago

I've been thinking about whether it would be possible & useful to have roxygen-like tags for documenting the inputs and outputs of analysis scripts, which could be used to easily create a makefile when needed. This idea is closely related to the first part of thread #5, particularly the second comment (from @njtierney) about the struggle to go from exploratory analysis to something reproducible, and the subsequent discussion of make, but since that thread has moved on a bit into testing/CI/pkg issues I figured I'd start a new thread.

The idea would be that in a given R script (or R Markdown document) you might at some point read in inputs and at other points write outputs. You could tag inputs and outputs:

#' myfile.csv
#' A really cool data file!
#' @source coolwebsite.com
#' @input myfile.csv
mytable <- read_csv("myfile.csv")

myoutput <- do_stuff(mytable)

#' myoutput.rds
#' My awesome calculated result
#' @output myoutput.rds
saveRDS(myoutput, "myoutput.rds")

Then another script might have:

#' @input myoutput.rds
myinput <- readRDS("myoutput.rds")

Within the directory containing all these scripts, you could run a command that reads through all the scripts and their input and output files and creates a makefile. If there are any circular dependencies those would get flagged. The command would also create man pages for each input and output object, as well as an overall workflow documentation with a dependency graph linking to individual input/output documentation.
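For instance, assuming the two snippets above live in files named analysis1.R and analysis2.R (names invented purely for illustration), the generated makefile might look something like:

```make
# Hypothetical rules a generator might emit from the @input/@output tags;
# the script names analysis1.R and analysis2.R are invented for illustration.
myoutput.rds: analysis1.R myfile.csv
	Rscript analysis1.R

# analysis2.R tags myoutput.rds as an @input, so it depends on the rule above
analysis2.done: analysis2.R myoutput.rds
	Rscript analysis2.R && touch analysis2.done
```

The second script doesn't tag any output file, so a sentinel file (here `analysis2.done`) or a phony target would be needed to give make something to track.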

There is already an R package that automatically generates makefiles from R scripts -- easyMake. It tries to detect automatically when a file reads in an input or exports a file. I think roxygen-like tags would be a bit more flexible and transparent, as you could specify each input and output file explicitly rather than relying on every input and output function being recognized. A roxygen-like system would also enable better documentation of the workflow and its inputs/outputs than just the makefile or a dependency graph of filenames.

Perhaps rather than creating a new roxygen-like system, roxygen itself could also be adapted for this purpose?

stephlocke commented 7 years ago

I like the idea of enhanced metadata & documentation for my work.

bzkrouse commented 7 years ago

Nice idea! I'm also interested in giving more attention to the struggle of organizing and keeping track of exploratory analyses. The concept of collecting metadata on analyses was also discussed in #23, although with more emphasis on collecting information about results.

MilesMcBain commented 7 years ago

I only just noticed this issue in the midst of cleaning up mine. I think what you're describing here is a REALLY great idea. How about a name: makedown? 😉

hadley commented 7 years ago

I like this idea, but I think generating a makefile will be error-prone. It will be more robust (if more work) to manage the dependency graph in R itself.

AliciaSchep commented 7 years ago

Thanks @bzkrouse for linking this to thread #23; I hadn't read through that one yet, and some of the goals are certainly shared, although I think this idea is more limited in scope. Compared to some of the fairly comprehensive systems discussed in that thread, the idea here is for something fairly minimal and very easy to incorporate into existing script-based analyses.

@MilesMcBain makedown sounds like a great name! Even if ultimately make itself isn't actually used...

As for using make versus managing things in R itself, I think the main benefit of using make is less work :grin: Although perhaps generating the makefile in a reliable way may prove harder than I am anticipating...

hadley commented 7 years ago

Generating the makefile will allow you to get a quick proof of concept up and running, and that's a great goal for the unconf. However, code generation in general is hard, and having the dependency graph in another environment means you can't do cool visualisations in R, etc.

hadley commented 7 years ago

Another thing worth considering is whether you could automatically detect inputs/outputs in many common situations -- e.g. in your example above, you could parse the file, detect the read_csv() and saveRDS() calls, and automatically generate the input/output annotations. You'd still need manual annotations for non-standard functions, but you might be able to give people a fairly comprehensive solution for free.
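As a rough sketch of that detection step, base R's parser makes finding the calls fairly approachable (the reader/writer lists below are just examples, not a complete set):

```r
# Sketch only: find calls to a few known reader/writer functions in a
# script by scanning its parse data. Mapping each call back to the file
# path it reads/writes (the string literal argument) would be the next,
# harder step.
readers <- c("read.csv", "read_csv", "readRDS")
writers <- c("write.csv", "saveRDS", "ggsave")

detect_io <- function(path) {
  pd <- utils::getParseData(parse(path, keep.source = TRUE))
  calls <- pd$text[pd$token == "SYMBOL_FUNCTION_CALL"]
  list(inputs  = intersect(calls, readers),
       outputs = intersect(calls, writers))
}
```

Running `detect_io()` on the first example script above would flag `read_csv` as a reader and `saveRDS` as a writer; anything not in the lists would still need a manual annotation.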

hadley commented 7 years ago

It would also be handy to be able to have this work inline, although you'd need some way to represent that the output includes new R objects:

if_needed(
  input = c(object("types"), "my_csv.csv"),
  output = c(object("df"), "my_plot.pdf"),
  {
    df <- read_csv("my_csv.csv", col_types = types)
    ggplot(df, aes(x, y)) + geom_point()
    ggsave("my_plot.pdf")
  }
)

And in that case you could determine the inputs and outputs from the code, so you could just write:

if_needed({
  df <- read_csv("my_csv.csv", col_types = types)
  ggplot(df, aes(x, y)) + geom_point()
  ggsave("my_plot.pdf")
})

hadley commented 7 years ago

I hope you don't mind but I've taken your basic idea and run with it: https://docs.google.com/document/d/1avYAqjTS7zSZn7JAAOZhFPkhkPvYwaPVrSpo31Cu0Yc/edit#. I'd love your thoughts!

AliciaSchep commented 7 years ago

Definitely don't mind, it looks great! In terms of my original idea, there were two related goals: one was linking dependencies across R files (without having to create your own makefile), and the other was enabling documentation of inputs and outputs so as to be able to create a documented dependency graph. The lazyr proposal seems like a great solution for the first goal but doesn't necessarily help with the second, although perhaps those goals shouldn't have been linked anyway.

coatless commented 7 years ago

This feels like the merging of CodeDepends and YesWorkflow / Live Demo, which would be very useful.