Revisiting the research compendium: testing, automation, and review #5

Open noamross opened 7 years ago

noamross commented 7 years ago

In the 2015 unconf, one output was a document of best practices for research compendia, a self-contained repository for a research analysis. Many of the ideas in this, and similar work, derive from best practices from R packages. In the past few years, there have been advances and wider adoption of a number of R package development practices, notably in package testing, automation of testing/checking/building, and best practices for code review. R-based research compendia have not coalesced around a similar set of practices yet. I would aim to build tools that would help address this, with these questions.

(I have thoughts on most of these that I'll add below or in a linked document for these in a bit)

Possible outputs include

njtierney commented 7 years ago

I would be really interested in building tools to help facilitate the process of creating reproducible research compendia.

Recently I adopted the approaches described in rrrpkg, and it was great! (rough example here). I managed to have a bit more time on my hands, and was really disciplined with how I built the analysis. This was really great when the input data needed to change, or I needed to update one model. Very very satisfying to just type make paper, and then take a lunch break and your paper is all shiny and new when you get back.

However, recently I've been more pressed for time, and have started to do things in a way that "works for now", in an attempt to claw back some of the time that I need to get the analysis done. Then, a day, week, or a few months later, I'm suffering at my own hand because now I need to untangle the reproducibility issues that I've created.

I've heard it said that one of the benefits of doing things in a reproducible way is that it slows you down, and makes you think. This is a good point, but in my experience you don't always have the time to think, when deadlines are fast approaching.

Personally, my biggest problem with being reproducible is that the awesome frameworks can be fragile in the face of a scary deadline.

So, I really want to work on a "pit of success" for reproducibility, so that the default option is to fall into the right thing, rather than struggle doing it.

Two further thoughts:

noamross commented 7 years ago

One of the tensions I've found in this area is the difference between a "working" repo where one is developing ideas and doing exploratory work, and a fully formed workflow and analysis where you know what the outcomes should look like. I'd like to reduce the friction of moving from the former to the latter.

Pakillo commented 7 years ago


I think having a template (folder structure, etc) to use for all your projects helps a lot to work reproducibly. You always start from it and just have to follow the design; no more extra time once you have designed everything to your liking. E.g. my template is here. See also @benmarwick research compendium here.

We do all exploratory analyses as Rmarkdown documents in an analyses folder. Once we are clear what we want to include in the paper/report, we work in another Rmd document in a different folder (manuscript or report or whatever).

Of course large outputs are always a problem. We choose not to git track them, and share them through dropbox, figshare, or similar.

Hope this helps, and always interested to hear other ideas!

njtierney commented 7 years ago

@noamross I think you're right on about the tension between exploratory analysis and final report.

Looking back, that is why the first paper was so much easier for me. I already had all the exploratory analysis done, and so when I changed over to the other template, I just filled in a template, shoved all the functions into an R/ folder and away I went. Mostly.

Having a template is definitely a great idea, @Pakillo, and like you say, it saves a lot of cognitive load, so you can spend more time thinking about the analysis. I think this is why the idea of using the R package workflow catches on nicely. People can rely on the workflow outlined by @hadley's devtools, and move through the steps.

This is a big topic! It's kind of hard to know where to start. But one thought is that R packages are generally a growing product - many (most?) will continue iterating over what their purpose is, and will improve over time. Whereas a paper or a report has a finite ending, and the iterative process of writing, analysing, and re-writing is different to package development. I wonder if perhaps it might be useful to compare and contrast package development to analysis development?

cboettig commented 7 years ago

Great thread, something I struggle with regularly as well.

My default is to stick with the R package structure for a new project, e.g. any exploration that's grown bigger than a single self-contained .Rmd and for which I can think of a functional name for the repo. Like @njtierney says, this skeleton is quick to create with devtools or a template (I have one, slightly dated).

I tend to omit formal tests; I refactor code a lot early on and having to refactor the unit tests is just a pain. Maybe that's not ideal but oh well. I do tend to write at least minimal roxygen docs with an example; these provide easier-to-maintain tests and can be helpful; and mean I can leave travis-ci turned on to do at least some sanity checking without being a burden. Functions I don't want to document / test I just don't export so CHECK doesn't complain.

Like @Pakillo , I do exploratory analysis in separate .Rmds (in a /inst/notebook dir for me), and then start writing the manuscript in /manuscript (if it's messy/long running) or /vignette (if it can run quickly).

Caching for large data &/or long-running outputs is a special problem. I think remake is a good general-purpose strategy here, but I tend to make sure my functions output some standard, text-based format which I store either in the package or on a separate GitHub repo (neither is ideal; but these are strictly intermediate objects which can be regenerated with enough cpu effort).

I think both papers and packages have finite life cycles with similar challenges. Early on it's figuring out where to draw the box -- stuff starts in one repo and then gets split out into multiple repos for different packages / papers as the project grows and becomes better defined. Knowing when to split is always hard for either research or software; I wish someone would give me some rules-of-thumb. Since we don't sweat software 'releases' I agree that there's no sharp delineation point in software the way there is in publishing, but after submitting a paper there's still the iteration of peer review, and after acceptance there's still some lifetime of the repo where I'm at least continuing to re-use and maybe tweak things on that project. And I think there comes a time for both software package or paper/compendium where one transitions from active maintenance to sunset & archive, though perhaps that happens more immediately after publication for most papers/compendia than for most software packages.

noamross commented 7 years ago

I'm thinking of testing with a bit of a broader scope, including model diagnostics and data validation as done with assertr or validate. These tests may have boolean, text, or visual outputs, but I think it's important to have a workflow that (a) separates them from the main products, and (b) ensures they are run and viewed with updates.
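A minimal sketch of the kind of data-validation test described above. It uses only base R rather than assertr or validate, so no package API is assumed; the helper `run_data_checks()`, the example data, and the check names are all hypothetical.

```r
# Hypothetical helper: run named validation checks on a data frame and
# return the results as a table, separate from the main analysis products.
run_data_checks <- function(data, checks) {
  passed <- vapply(checks, function(check) isTRUE(check(data)), logical(1))
  data.frame(check = names(checks), passed = unname(passed),
             stringsAsFactors = FALSE)
}

# Example data and checks (made up for illustration).
measurements <- data.frame(site = c("a", "b", "c"),
                           mass_g = c(12.1, 9.8, NA))

checks <- list(
  no_missing_mass  = function(d) !anyNA(d$mass_g),
  mass_is_positive = function(d) all(d$mass_g > 0, na.rm = TRUE),
  sites_are_unique = function(d) !anyDuplicated(d$site)
)

results <- run_data_checks(measurements, checks)
print(results)
```

Returning a table instead of stopping at the first failure means the full set of results could be written to a report or CI artifact and viewed after every update.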

One paradigm I was considering was saving the environments of each of the notebook/ and manuscript/ knitr documents, and having scripts or Rmds in a test directory that load these environments and run standard tests on the objects in them. These could be run on CI and be available as artifacts, or pushed back to github, perhaps on a gh-pages branch.
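One way this save-then-test paradigm might look, sketched in base R. It uses a plain script as a stand-in for a knitr document, and the file names and directory layout are assumptions made for illustration.

```r
# Hypothetical sketch: run an analysis script in its own environment,
# save that environment to disk, then test the saved objects separately.
work_dir <- tempfile("compendium-")
dir.create(work_dir)

# A stand-in for an analysis script or the code chunks of a notebook.
script <- file.path(work_dir, "01-fit-model.R")
writeLines(c(
  "fit <- lm(mpg ~ wt, data = mtcars)",
  "n_obs <- nrow(mtcars)"
), script)

# Step 1: run the script and save everything it created.
analysis_env <- new.env(parent = globalenv())
sys.source(script, envir = analysis_env)
env_file <- file.path(work_dir, "01-fit-model.RData")
save(list = ls(analysis_env), envir = analysis_env, file = env_file)

# Step 2 (would live in a test directory): load the saved objects
# into a fresh environment and run checks on them.
test_env <- new.env()
load(env_file, envir = test_env)
stopifnot(
  inherits(test_env$fit, "lm"),
  test_env$n_obs == 32,
  coef(test_env$fit)[["wt"]] < 0
)
```

Because the tests only read the saved file, they can run on CI without repeating the analysis itself.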

I've had trouble with remake, because of the overhead of moving from script-based organization that characterizes many projects to the functional approach. make is easier to transition to, and I sometimes help long runtimes by keeping my knitr cache on CI. One potential unconf idea is working on some options to run scripts and save their environments as targets, which I think might help more people get aboard. We had an open issue discussing this but I think it disappeared in the renaming of "maker" to "remake".

On the lifecycle issue, I think something like rrtools::use_docker to put in local rocker infrastructure with a fixed R version, or rrtools::use_packrat/use_checkpoint, would be helpful for "freezing" an analysis in place.
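As a rough illustration of the "fixed R version" idea, a use_docker-style helper could simply write a Dockerfile that starts from a versioned rocker image. The function `write_dockerfile()` below is hypothetical, and the sketch assumes a `rocker/r-ver` tag exists for the chosen R version.

```r
# Hypothetical sketch of what a use_docker()-style helper could do:
# write a Dockerfile that pins the R version via a versioned rocker image.
write_dockerfile <- function(project_dir, r_version = "3.4.0") {
  lines <- c(
    paste0("FROM rocker/r-ver:", r_version),
    "COPY . /home/analysis",
    "WORKDIR /home/analysis"
  )
  path <- file.path(project_dir, "Dockerfile")
  writeLines(lines, path)
  invisible(path)
}

# Demo in a temporary directory.
project_dir <- tempfile("compendium-")
dir.create(project_dir)
dockerfile <- write_dockerfile(project_dir, r_version = "3.4.0")
cat(readLines(dockerfile), sep = "\n")
```

This pins only the R version; package versions would still need packrat, checkpoint, or a similar mechanism.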

benmarwick commented 7 years ago

I'm following this discussion with great interest, especially the comments about testing and re/make, since I haven't found a natural place for those in my work, and I'm curious to see how others are using them.

I think there's scope for some easy wins with tooling to increase the convenience of using CI for research-compendia-packages. There was a nice paper on using CI for research pipelines recently:

Related to CI, I like @noamross's idea of rrtools::use_docker. I once proposed something vaguely similar for devtools, and there are a few related projects mentioned in that thread that may be relevant here.

Seems like @stephlocke's excellent new could be a good prototype for rrtools

stephlocke commented 7 years ago

If y'all want to make use of pRojects I'm happy to do work to fold it into rOpenSci - I've made a lot of the Issues etc up for grabs and am keen to get other's opinions and requirements built in :)

noamross commented 7 years ago

One idea that we might be able to make progress on here is to figure out what subset of package-checks from various tools (R CMD check, lintr, goodpractice, etc.) can be easily applied to non-package projects, and possibly create some lightweight wrappers, extensions, and/or docs for using them on directories at various levels of rigor (e.g., something with full-blown build system and package-style DESCRIPTION, something with standalone scripts, data, and Rmds).

A possible home for this would be @stephlocke's

hadley commented 7 years ago

@noamross random aside: rigger could be a fun name for this package (sounds like both rigger and rigour)

noamross commented 7 years ago

@hadley +1, as in a former career, I actually had the title of chief rigger. ⛵️

njtierney commented 7 years ago

It seems to me that there should be some sort of golden path to tread for reproducibility, sort of like

@hadley Tolstoy's (approximate) quote:

Tidy data are all alike; Untidy data is untidy in its own way

It seems that the same could mostly be said for reproducibility. There are very many ways that you can do reproducibility wrong, but my sense is that there should be a small set of ways to do it right.

That said, I think talking about what went wrong in the past could still be helpful, sort of like Etsy's "Guilt-Free Post Mortems" that I have heard about on NSSD.

So, do you think it might be useful to collect anti-patterns to reproducibility? I, ahem, may have a few horror stories to share. We could also, of course, share success stories, and things that went well. My thoughts are that this would help identify common problems, strong good patterns for reproducibility, and common anti patterns.


batpigandme commented 7 years ago

@njtierney I hadn't heard the guilt-free post-mortems, but I definitely think the repro research autopsies would fit well with/be of interest for #64.

benmarwick commented 7 years ago

There's a nice recent collection of success stories in, with a few chapters from R users (disclaimer: including me).

I'm not aware of any surveys of anti-patterns, that sounds like it would be very useful, and help to give some priority to the items in various how-to lists for reproducible research.

If we could identify the 20% of patterns or behaviors that are responsible for 80% of irreproducibility in research (if such a thing is quantifiable), and target those with some tooling to change those behaviors, there could be a big win.

noamross commented 7 years ago

Let me try to summarize some of the stuff above as well as my own evolving thoughts on this. I realize that this project has several components, any one of which could be an unconf project on its own:

Bootstrapping compendia

The "project template" approach has been tackled repeatedly, and well, but beyond data/ and R/ (functions) directories and maybe DESCRIPTION, there's lots of heterogeneity to analysis project structure. Some projects consist of a series of R notebooks, some a series of scripts which may or may not be linked together. Output types vary a lot. There are several options for build systems, including Make and remake. For this reason, it's hard to automatically bootstrap projects that are already underway with use_* functions - you can't overlay a build system unless you know the project logic.

A good start might be creating a set of project templates (make_project, remake_project, notebook_project?) that match a few common idioms. @gaborcsardi's mason may provide a good approach here, providing project setup with a small set of questions. With the choice of idiom set, functions to enhance a repo's reproducibility with use_docker or use_packrat/checkpoint and associated CI templates should be easier. Maybe information on the idiom can be stored in a dotfile or DESCRIPTION so that other bootstrapping and CI functions can use it.
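A small sketch of how the idiom could be recorded in a dotfile for other functions to read. The file name `.compendium`, the field name `Idiom`, and both helper functions are hypothetical; the same field could equally live in DESCRIPTION, which uses the same DCF format.

```r
# Hypothetical sketch: record the project idiom in a small DCF dotfile
# so that other bootstrapping or CI helpers can look it up later.
set_idiom <- function(project_dir, idiom) {
  config <- matrix(idiom, nrow = 1, dimnames = list(NULL, "Idiom"))
  write.dcf(config, file.path(project_dir, ".compendium"))
}

get_idiom <- function(project_dir) {
  config <- read.dcf(file.path(project_dir, ".compendium"))
  unname(config[1, "Idiom"])
}

# Demo in a temporary directory.
project_dir <- tempfile("compendium-")
dir.create(project_dir)
set_idiom(project_dir, "make_project")
get_idiom(project_dir)
```

A later helper that adds CI or Docker support could call `get_idiom()` first and pick a matching template.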

Code analysis and testing

Training and review guides

stephlocke commented 7 years ago

I'm really interested in this and have started tackling the project setup and the testing / review of packages. I'd love to make these more community developed so it's not limited by time as I think the community at large can benefit.

ateucher commented 7 years ago

I really love this idea. In my work it is my mission to get people using tools to make their work more reproducible - it can be overwhelming to beginners to learn best practices and tools so having a starting point is great. I think we have to be careful to make this approachable enough so that it doesn't scare new users off (as @njtierney mentioned talking about the 'pit of success').

+1 to building on @stephlocke's pRojects package. I wonder too if there is something that can be done to help users create their own templates, perhaps built off some boilerplates? I've often heard people say that they would use ProjectTemplate etc but it's not quite the template they need. In government, we often have to do very similar projects repeatedly so custom templates would be very helpful.

As an aside, another build system I've recently heard about is drake - I've never tried it but heard an interview with the author on the r-podcast.

stephlocke commented 7 years ago

So I'm aiming to add a bunch of flags to the starter projects so folks can set exactly what they want in them. At the moment, each function has parameters and these can also be tuned in the add-in. Of course, someone has to know about add-ins to be able to get the benefit of the GUI which might be a catch-22.

I'm hoping to extend the list of options to include, for instance, Makefiles. My attempts at Makefiles have been dismal failures.

The package can also be wrapped so if someone wants a bespoke project function with internal standards, they can depend on pRojects, and use the createBasicProject as a starter like the Analysis and Training functions do.

batpigandme commented 7 years ago

@ateucher “it can be overwhelming to beginners to learn best practices and tools so having a starting point is great” Yes, a thousand times yes-- and as someone who got overwhelmed by learning so many "best practices" in the beginning, this can lead to bad habits and/or just wasted time because the workflow you've got going satisfices.

@stephlocke It's interesting that you mention Makefiles because when I tweeted a "Reproducible workflows in R" post I'd found a few weeks ago which included make-related things, one of the responses was that there's a related conceptual obstacle of some sort common to R users. Orig tweet: Unfortunately, some quote-tweeting and whatnot has me unable to find the rest of the thread, but I'm wondering if there's an opportunity here to help fill in that conceptual gap too (which could even be through really good documentation).

ateucher commented 7 years ago

@batpigandme interesting that post is by Will Landau, as he is the author of drake. Maybe that was the start of his inspiration for it.

batpigandme commented 7 years ago

Oh, cool! I'll have to give drake a closer look. It was released pre-tweet, but I'll have to take a closer look at drake with the concepts he mentioned in mind.

stephlocke commented 7 years ago

My memory is hazy of trying to make the Makefile work, but I had a shell script that worked and I was trying to convert over with no luck whatsoever. And this was my trivial case!

noamross commented 7 years ago

A possible Makefile idiom might be this example of a set of Rmd notebook files from Lincoln Mullen:

Another possible Makefile idiom would be a set of scripts that should be run in-order. If the scripts have numbered filenames (01-clean-data.R, 02-fit-model.R) that might help for an easy template setup. I'm not sure how to make an easy setup for the output/intermediate files, though, as they'll vary so much project-to-project.
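The numbered-scripts idiom can be sketched in plain R without a Makefile. The function names below are hypothetical, and the sketch assumes zero-padded prefixes so that alphabetical order matches the intended run order.

```r
# Hypothetical sketch: find scripts whose names start with digits and
# run them in filename order, ignoring everything else.
numbered_scripts <- function(dir = ".") {
  sort(list.files(dir, pattern = "^[0-9]+.*\\.R$", full.names = TRUE))
}

run_in_order <- function(dir = ".") {
  scripts <- numbered_scripts(dir)
  env <- new.env(parent = globalenv())
  for (script in scripts) {
    message("Running ", basename(script))
    sys.source(script, envir = env)
  }
  invisible(env)
}

# Demo in a temporary directory.
demo_dir <- tempfile("scripts-")
dir.create(demo_dir)
writeLines("x <- 1:10", file.path(demo_dir, "01-clean-data.R"))
writeLines("total <- sum(x)", file.path(demo_dir, "02-fit-model.R"))
writeLines("stop('not ready yet')", file.path(demo_dir, "scratch.R"))

result <- run_in_order(demo_dir)
result$total
```

All scripts share one environment here, so later scripts see objects created by earlier ones; this does nothing about output or intermediate files, which remain project-specific.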

As for testing, I think I would separate an analysis-repo testing package from a package reviewing package, though the package reviewing task is something I'm interested in, too!

cboettig commented 7 years ago

This thread has been great. I'm particularly interested in the testing theme, though it seems the emphasis is primarily on the side of raising the bar above and beyond R CMD check. For R package-style compendia / shared research code, I'd be really interested in seeing some more light-weight tooling that can be easily deployed by folks & projects where satisfying all of R CMD check and goodpractice and the like is too high a bar.

One of our main motivations in promoting the R package structure for research compendia is the ability to take advantage of all the existing tooling (and I still stand by this). For instance, the package structure + devtools significantly lowers the barrier around adding continuous integration. I think it would be great to have some similar tooling that helped a user quickly and easily deploy continuous integration on a project that just had .Rmd files, or one that used the R package layout for dependencies etc, but perhaps didn't conform to the potentially more tedious parts of check (e.g. maybe skipping all those documentation related checks; I dunno but would love to hear input on what parts of check are more onerous than helpful in the compendium space). Thoughts?

noamross commented 7 years ago

Yes, @cboettig, I think our thoughts pretty much align here. I wasn't thinking of raising the bar above R CMD Check, but figuring out the subset of relevant checks across R CMD Check, lintr, goodpractice, etc., that would be relevant to compendia, especially beginner ones. I think you can turn on and off almost everything in R CMD Check individually via environment variables.

One thing that throws off using R package structure for research compendia is that most people always have scripts and .Rmds as top-level files or folders, and hate inst/whatever or vignettes because they're not the intuitive place to put the most important parts of your work. R/, data/ tend to be more consistently used. (Anecdotally). I think this should be accommodated, rather than forced.

The tooling aspect is what I tried to describe above in the obscurely named "Bootstrapping Compendia" section. Adding a build system + CI easily, (use_makefile(), use_travis()) is just what I'm getting at. The challenge is that you need to define your build logic in order to do so, and this will vary across projects. Some templates for common idioms (all Rmd files, scripts run in numbered order), can be tractable.

hadley commented 7 years ago

My feeling is that most of R CMD check is not relevant for research compendia. I think it's better to start from scratch and build up a useful set of checks, rather than starting from an existing series of checks and turning them off. If the structure of the compendia is designed correctly, you'll still be able to run R CMD check if you want to.
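To make "build up a useful set of checks" concrete, here is a sketch of three from-scratch checks for a non-package project. The particular checks and function names are hypothetical choices for illustration, not a proposed standard.

```r
# Hypothetical sketch: a small set of compendium checks built from scratch.
# Each check takes a project directory and returns TRUE or FALSE.
r_scripts <- function(dir) {
  list.files(dir, pattern = "\\.R$", recursive = TRUE, full.names = TRUE)
}

has_readme <- function(dir) {
  length(list.files(dir, pattern = "^README", ignore.case = TRUE)) > 0
}

scripts_parse <- function(dir) {
  all(vapply(r_scripts(dir), function(f) {
    !inherits(try(parse(file = f), silent = TRUE), "try-error")
  }, logical(1)))
}

no_setwd_calls <- function(dir) {
  code <- unlist(lapply(r_scripts(dir), readLines, warn = FALSE))
  !any(grepl("setwd\\(", code))
}

check_compendium <- function(dir) {
  checks <- list(has_readme = has_readme,
                 scripts_parse = scripts_parse,
                 no_setwd_calls = no_setwd_calls)
  vapply(checks, function(check) check(dir), logical(1))
}

# Demo: a project with a README and one script that calls setwd().
demo_dir <- tempfile("compendium-")
dir.create(demo_dir)
writeLines("# My analysis", file.path(demo_dir, "README.md"))
writeLines("setwd('/home/me/project')", file.path(demo_dir, "analysis.R"))
check_results <- check_compendium(demo_dir)
print(check_results)
```

New checks are added by appending a function to the list, which keeps the set small and deliberate.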

gaborcsardi commented 7 years ago

@hadley FWIW IIRC you can turn on goodpractice checks one by one. (Most of them are coming from R CMD check, though.)

hadley commented 7 years ago

@gaborcsardi I was more meaning that it would be better to start from scratch and carefully consider what tests would be maximally useful (rather than starting from a somewhat related list and deciding what's not useful)

noamross commented 7 years ago

Useful bootstrapping idiom: single Rmd with Makefile + CI, though in this case the Makefile may be unnecessary:

noamross commented 7 years ago

The above made me think of a possible project-level idiom/template:

Maybe project setup options or some variables at the top of a Makefile let the user set directory names, allow for top-level files, or opt out of tests. Also, we could only run scripts that start with numbers (e.g. 01-clean-data.R), as testthat does, to allow for throw-away or in-development scripts. It's a coarse way of doing dependency management but would allow for a simple project template including CI.

stephlocke commented 7 years ago

I think getting some good Makefiles is key - at the mo I use a combo of R and bash scripts in my continuous docs build process

I still like the R manifest / package structure and I think it's good practice for any R functions used in the analysis to be tested so I'm happy for them to go into an R directory and get unit tested etc

batpigandme commented 7 years ago

Depending on the target-user-level, it might be worthwhile to consider how to integrate with Makefiles 101 &/or how to guide the user through the automation. [Read: I just learned a bunch of stuff from that link Noam posted, which made me realize how confused I would have been about this had I not read that.]

Another really helpful resource (at least at my level of competence) has been @jennybc's "All the automation things" stash.

stephlocke commented 7 years ago

My preference is to automate away a load of the complexity by having it just happen out of the box for people, with tuning knobs for people who know what they're doing. I'd like to see automated standard document generation happen like magic.

stephlocke commented 7 years ago

So for instance, out of the box with pRojects I set up travis & packrat

Then we're building on the basic foundations to create project types like a training project, where you can select different packages and we'll also make sure you have those packages available (inc. in DESCRIPTION) and where possible, we're adding demo content.

The next step is this automation discussion - how do we robustly incorporate Makefiles or something else to be included in the Travis-CI process? What is the right approach?

noamross commented 7 years ago

A note on CI and Travis. I think we should include, possibly by default, a CI service that provides a usable free private option and document this well. CircleCI is a much better option here than Travis. (1 concurrent instance v. 100 total builds). I think encouraging people to do reproducible work is easier when they are shown they don't have to do it all in public, especially at early stages.

stephlocke commented 7 years ago

Aye - being able to build private repos is something that'd be neat to do.

batpigandme commented 7 years ago

@stephlocke Hey, I never say no to a magic box that gets things right every time!

hadley commented 7 years ago

@noamross the disadvantage of CircleCI is that it's much less battle tested than travis, so you have to weigh up the potential downsides of newcomers hitting more problems.

noamross commented 7 years ago

@hadley Good point 🤔. On the plus side, (besides private builds) CircleCI is mostly Docker-based and is a natural fit for rocker images (or project Dockerfiles). Also Travis's package testing is battle tested, but a good bit of the analysis testing might end up being quite different in any case. Not sure where I land on this, but would love a solution that works for a student using free GitHub private accounts.

cboettig commented 7 years ago

+1 for supporting other CI platforms as well. Circle-CI also allows ssh into builds, lets users specify private keys via the web interface (no convoluted & sometimes vulnerable encrypt and put in .travis.yml), and has decent realtime support people. Getting CI to work in the less structured context of research scripts is gonna be tricky regardless.

Personally I think we want to avoid getting into super complex project architectures though -- I still think the R package model is the option for such needs, where common tasks get broken into functions in R/, keeping .Rmd scripts concise and focused on communication. Complex combinations of lots of scripts always make me nervous (and I have yet to understand what can be done in a Makefile that it isn't easier to teach an R user to do in R -- but I may be all alone on that one).

I think a good first pass would be a simple way to get a single .Rmd script running on CI, then maybe add some basic appropriate testing (a la #38) so we get beyond "did it run without errors." I'd also like to see a clear & natural path for how such a project can transition into a more fully-fledged R package if/as the analysis grows.

Just figuring out the mechanism to install dependencies on the CI system in this context will be a good challenge, some options I see are

A. Packrat
B. user writes a DESCRIPTION file, or
C. we make a utility which parses all scripts for library/require calls (& instances of pkg::)?
D. .travis template starts with a pretty kitchen-sink install; e.g. something like rocker/ropensci image or possibly a complete CRAN environment,

or maybe some combination of these.
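Option C above could be prototyped with a few regular expressions. The sketch below is a hypothetical helper, not an existing utility; being regex-based, it misses packages loaded dynamically or written with extra whitespace, and it does not skip comments.

```r
# Hypothetical sketch: scan R code for library()/require() calls and
# pkg:: prefixes, returning the package names it finds.
find_dependencies <- function(code) {
  extract <- function(pattern) {
    unlist(regmatches(code, gregexpr(pattern, code, perl = TRUE)))
  }
  # Names that directly follow "library(" or "require(", optionally quoted.
  attached <- extract("(?<=library\\(|require\\()[\"']?[A-Za-z][A-Za-z0-9.]*")
  attached <- gsub("[\"']", "", attached)
  # Names that directly precede "::" or ":::".
  namespaced <- extract("[A-Za-z][A-Za-z0-9.]*(?=:::?)")
  sort(unique(c(attached, namespaced)))
}

# Demo on a few lines of code (made up for illustration).
code <- c(
  "library(dplyr)",
  "require(\"ggplot2\")",
  "tidy <- tidyr::gather(raw)",
  "x <- 1 + 1"
)
deps <- find_dependencies(code)
print(deps)
```

For a whole project, `code` could be built by reading every script with `readLines()` and concatenating the results.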

noamross commented 7 years ago

As for (C), both packrat and checkpoint have functions for parsing the scripts to get these. I think using these to populate a DESCRIPTION or a Dockerfile with dependencies is good if one is not using Packrat. But I don't think having the user populate these themselves is that bad.
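A sketch of the "populate a DESCRIPTION" step, using base R's `write.dcf()`. The helper `write_description()` and the field values are hypothetical, and the package list is assumed to have been gathered already, whether by hand or by a script parser.

```r
# Hypothetical sketch: write a minimal DESCRIPTION listing the packages
# an analysis depends on, so CI tooling can install them.
write_description <- function(project_dir, name, packages) {
  fields <- data.frame(
    Package = name,
    Title = "Research compendium",
    Version = "0.0.1",
    Imports = paste(sort(packages), collapse = ", "),
    stringsAsFactors = FALSE
  )
  path <- file.path(project_dir, "DESCRIPTION")
  write.dcf(fields, path)
  invisible(path)
}

# Demo in a temporary directory.
project_dir <- tempfile("compendium-")
dir.create(project_dir)
desc_path <- write_description(project_dir, "myanalysis",
                               packages = c("ggplot2", "dplyr"))
cat(readLines(desc_path), sep = "\n")
```

A real compendium would want more fields (authors, license), which the user could fill in by hand afterwards.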

Personally I think if we don't make something that's capable of handling multiple scripts then it won't be useful for many people. Again, this is personal perspective, but nearly every project I'm involved in ends up involving multiple scripts and .Rmd files. I tend to think functionalizing everything requires much higher activation energy than something that will just run them in order.

karthik commented 7 years ago

@cboettig @noamross

C. we make a utility which parses all scripts for library/require calls (& instances of pkg::)?
D. .travis template starts with a pretty kitchen-sink install; e.g. something like rocker/ropensci image

This is an idea I've been noodling around for a long time and started as a package last month. I'll start a new issue rather than clutter this thread.

noamross commented 7 years ago

OK, let me try to re-summarize. I think at the unconf we will try to accomplish prototyping at least one of these three things:

stephlocke commented 7 years ago

I'm happy to donate my in-dev projects to this. At the mo, they can undergo very breaking changes because of their prototype nature so if naming conventions etc have to shift that isn't a problem.

benmarwick commented 7 years ago

When considering standards, it might be useful to take a look at what's already out there in the wild. Here's a table of some of the public package-as-compendia examples I know of (taken from an extended version of the rrrpkg essay that's in the works). It's not encyclopedic, but it shows that the use of a non-standard directory is only slightly more popular than using the vignettes directory for the main Rmd document:

| location of main document | n compendia | n unique first authors | n templates |
|---|---|---|---|
| non-standard directory | 5 | 3 | 2 |
| vignette directory | 4 | 4 | 2 |

The upside of using the vignettes directory is that the pkg can be installed using install.packages and just work. The downside is that it buries the most important content (the main Rmd document) in a place where a novice user may struggle to find it when browsing the file structure.

My preference is for the non-standard directory because I think browse-ability should take priority over install-ability (I don't think these compendia pkgs should be on CRAN, but instead should be archived with a DOI on figshare or zenodo, etc.). But this is an unresolved question, and I can imagine situations where install-ability would be preferable. So it may be tricky to settle on a standard or convention here.

Here are the templates referred to in the table; as far as I know, each has been used for at least one publicly accessible scholarly publication:

I see that @stephlocke's pRojects also favours a non-standard directory for the analysis Rmd (via the pRojects::createAnalysisProject function)

noamross commented 7 years ago

I agree with all that @benmarwick! We can have default directories for part of the project but give the option to designate directory names/paths. For Rmds, though, one might just be able to take the simple tack of knitting all the Rmds in the directory structure. This assumes that knitting order does not matter. (My projects tend to have scripts for which order matters, and Rmds which depend on the scripts' outputs but not each other, but that's just me.)

hadley commented 7 years ago

@noamross I think generally you want to lean towards convention over configuration: you want an opinionated tool for making compendia, not a flexible tool that allows each person to design their own workflow.

@benmarwick if the main problem with vignettes/ is discoverability, it seems easier to fix the discoverability problem rather than use a different directory. One advantage of using vignettes/ is that you can basically get a project website for "free" with pkgdown.

Generally, I'd say if there's a package directory that is a reasonably close (if not perfect) match to a compendia directory, you're better off sticking with the package convention because it gives you many tools (install.packages(), devtools, pkgdown, ...) for no additional cost.

cboettig commented 7 years ago

@hadley "convention over configuration" ❤️.

I think the combination of a concise README telling people what's where and then building around existing tools is probably the way to go.

Personally what I most want out of a light-weight alternative to the standard Package compendium is an easy convention my students can follow to (a) keep a tidy-looking repository of .Rmd scripts for a related project / series of assignments, and (b) have an easy way to enable travis builds of said .Rmd files (i.e. locate the .Rmds, install required libraries, tell me if any of them throw an error on rmarkdown::render). Other stuff could be added to this, but I think this would be a great start.
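The "locate the .Rmds and tell me if any of them throw an error" step could look roughly like the sketch below. The function `render_all()` is hypothetical; the renderer is passed as an argument (defaulting to `rmarkdown::render`) so the demo can run with a stand-in and does not need rmarkdown or pandoc.

```r
# Hypothetical sketch: locate every .Rmd under a directory, try to render
# each one, and report which ones failed.
render_all <- function(dir = ".", render = rmarkdown::render) {
  rmds <- list.files(dir, pattern = "\\.Rmd$", recursive = TRUE,
                     full.names = TRUE)
  ok <- vapply(rmds, function(rmd) {
    tryCatch({
      render(rmd)
      TRUE
    }, error = function(e) {
      message("Failed: ", rmd, " (", conditionMessage(e), ")")
      FALSE
    })
  }, logical(1))
  names(ok) <- basename(rmds)
  ok
}

# Demo with a stand-in renderer that fails on one file.
demo_dir <- tempfile("rmds-")
dir.create(demo_dir)
writeLines("good", file.path(demo_dir, "a.Rmd"))
writeLines("bad", file.path(demo_dir, "b.Rmd"))
fake_render <- function(path) {
  if (identical(readLines(path), "bad")) stop("chunk error")
  invisible(path)
}
status <- render_all(demo_dir, render = fake_render)
print(status)
```

On a CI service, a build script could call `render_all()` with the default renderer and exit with a non-zero status if any element of the result is FALSE.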

I also really like @stephlocke 's project template packages and think it would be awesome to build around those.

noamross commented 7 years ago

Speaking of students, @cboettig, maybe you have some grading rubrics that could be input for thinking about the review checklist?

cboettig commented 7 years ago

ha, that's yet another topic in which I could greatly use the wisdom of @jennybc. My students start from such different backgrounds that my grading tries to reflect the amount of effort &/or learning much more than what is objectively accomplished...