Open gshotwell opened 8 years ago
@jennybc @benmarwick thoughts? or our discussion forum https://discuss.ropensci.org/
Yes, this looks very interesting. I'm not much of a make/makefile user, myself. I've seen @cboettig making good use of them though, he might have a more informed opinion here. @GShotwell do you have any examples of this package used 'in the wild', in a research compendium, etc.?
we may want to bring in @richfitz given https://github.com/richfitz/remake
When we get to automation in STAT 545 in a couple of weeks' time, I will invite the students to try this out. I could also test it on some of the demo projects we show them, i.e. see how close it comes to the existing Makefiles.
Thanks @jennybc , that would be very helpful. I've tried the dependency detection on some of my own work, but since that's the work I had in mind when I wrote the package it's unsurprising that it does okay on those projects.
@benmarwick I don't have any examples of it being used in the wild (The package is only like 4 days old), but if anyone can recommend some good testing projects, I'd be grateful.
I posted here because I'm trying to build the package around a model workflow for reproducible analysis, which I mostly cribbed from this repo. Right now easyMake assumes three big things about this workflow:
1) People will use explicit file names in their import and export statements. So they will write read.csv("data.csv") and not name <- "data.csv"; read.csv(name).
2) A given script will not have the same names for both its imports and exports. If a script loads "data.csv" and edits it, it should save it as "data2.csv", not "data.csv". If you don't do this then you might end up with loops in an auto-created Makefile.
3) Scripts are pure in the sense that they only communicate with the project through their imports and exports, so you don't run a script in order to store something in memory for a subsequent script to operate on. Basically you should be able to put rm(list = ls()) at the end of each script and not change your overall results.
All of the above means that running a script multiple times won't alter the results of the analysis. Do those seem like okay constraints?
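To make the first two assumptions concrete, here is a rough sketch of the general idea in Python. It is not easyMake's actual code: the regular expressions, the function names scan_script and makefile_rules, and the two example scripts are all hypothetical, and only a handful of base R read/write calls are recognised. It scans R source text for file names passed as string literals, refuses a script that reads and writes the same file, and emits one Makefile rule per script.

```python
import re

# Hypothetical patterns covering only a few base R readers and writers.
# They match a call only when the file name is a string literal.
READ_CALL = re.compile(r'\b(?:read\.csv|read\.table|readRDS)\(\s*["\']([^"\']+)["\']')
WRITE_CALL = re.compile(r'\b(?:write\.csv|write\.table|saveRDS)\([^"\'\n]*["\']([^"\']+)["\']')


def scan_script(source):
    """Return (imports, exports): literal file names found in R source text."""
    imports = sorted(set(READ_CALL.findall(source)))
    exports = sorted(set(WRITE_CALL.findall(source)))
    return imports, exports


def makefile_rules(scripts):
    """Build Makefile text from a dict mapping script name -> R source text."""
    rules = []
    for name, source in sorted(scripts.items()):
        imports, exports = scan_script(source)
        overlap = sorted(set(imports) & set(exports))
        if overlap:
            # Assumption 2: a file that is both input and output would make
            # the target depend on itself, i.e. a loop in the Makefile.
            raise ValueError(f"{name} reads and writes {overlap}")
        if exports:
            targets = " ".join(exports)
            prereqs = " ".join([name] + imports)
            rules.append(f"{targets}: {prereqs}\n\tRscript {name}")
    return "\n\n".join(rules)


# Two made-up scripts: the first follows assumption 1, the second does not.
scripts = {
    "clean.R": 'd <- read.csv("data.csv")\nwrite.csv(d, "data2.csv")\n',
    "indirect.R": 'name <- "data.csv"\nd <- read.csv(name)\nsaveRDS(d, "model.rds")\n',
}
makefile = makefile_rules(scripts)
print(makefile)
```

In this sketch the rule for clean.R lists data.csv as a prerequisite, while the rule for indirect.R does not, because the file name is hidden behind a variable. That is the kind of miss assumption 1 is meant to rule out.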
They sound reasonable to me.
Here are some small example pipelines. You could see if easyMake recreates these Makefiles:
Not sure if this is the right place for this. But I put together a very simple R package which detects dependencies between R files, and then generates a Makefile. I think it could be a useful piece in getting less command line-savvy R users to start using Makefiles, but it's still a ways away from that goal. If you have any thoughts or advice please let me know.
https://github.com/GShotwell/easyMake/tree/master