ropensci / software-review

rOpenSci Software Peer Review.

drake (R package) #156

Closed wlandau-lilly closed 6 years ago

wlandau-lilly commented 6 years ago

Summary

The drake package is an R-focused pipeline toolkit. It reproducibly brings results up to date and automatically arranges computations into successive parallelizable stages. It has a Tidyverse-friendly front-end, powerful interactive visuals, and a vast arsenal of multicore and distributed computing backends.

Package: drake
Title: Data Frames in R for Make
Version: 4.4.1.9000
Authors@R: c(
  person(
    family = "Landau",
    given = c("William", "Michael"),
    email = "will.landau@lilly.com",
    role = c("aut", "cre")),
  person(
    family = "Axthelm",
    given = "Alex",
    email = "aaxthelm@che.IN.gov",
    role = "ctb"),
  person(
    family = "Clarkberg",
    given = "Jasper",
    email = "jasper@clarkberg.org",
    role = "ctb"),
  person(
    family = "Eli Lilly and Company",
    role = "cph"))
Description: A solution for reproducible code and 
  high-performance computing.
License: GPL-3
Depends:
  R (>= 3.2.0)
Imports:
  codetools,
  crayon,
  eply,
  evaluate,
  digest,
  formatR,
  future,
  grDevices,
  igraph,
  knitr,
  lubridate,
  magrittr,
  parallel,
  plyr,
  R.utils,
  rprojroot,
  stats,
  storr (>= 1.1.0),
  stringi,
  stringr,
  testthat,
  utils,
  visNetwork,
  withr
Suggests: 
  abind,
  DBI,
  future.batchtools,
  MASS,
  methods,
  RSQLite,
  rmarkdown,
  tibble
VignetteBuilder: knitr
URL: https://github.com/wlandau-lilly/drake
BugReports: https://github.com/wlandau-lilly/drake/issues
RoxygenNote: 6.0.1

Similar work

Remake

Drake overlaps with its direct predecessor, remake. In fact, drake owes its core ideas to remake and @richfitz, and explicit acknowledgements are in the documentation. However, drake surpasses remake in several important ways, including but not limited to the following.

  1. High-performance computing. Remake has no native parallel computing support. Drake, on the other hand, has a vast arsenal of parallel computing options, from local multicore computing to serious distributed computing. Thanks to future, future.batchtools, and batchtools, it is straightforward to configure a drake project for most popular job schedulers, such as SLURM, TORQUE, and the Sun/Univa Grid Engine, as well as systems contained in Docker images.
  2. A friendly interface. In remake, the user must manually write a YAML configuration file to arrange the steps of a workflow. In drake, the configuration is a data frame, and built-in wildcard templating makes it easy to generate large plans programmatically (see the sketch after this list).
  3. Thorough documentation. Drake contains eight vignettes, a comprehensive README, examples in the help files of user-side functions, and accessible example code that users can generate with drake::example_drake().
  4. Active maintenance. Drake is actively developed and maintained, and issues are usually solved promptly.
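
To make the data-frame interface concrete, here is a minimal sketch, assuming the drake API of this era (drake_plan(), evaluate_plan(), and the parallelism argument of make(); exact signatures vary across versions):

library(drake)

# A plan is just a data frame of targets and commands.
# The size__ wildcard below is a placeholder to be templated out.
plan <- drake_plan(
  dataset = head(mtcars, 32),
  fit = lm(mpg ~ wt, data = dataset),
  summ = summary(head(dataset, n = size__))
)

# Expand the wildcard into one concrete target per value:
# summ_8, summ_16, and summ_32.
plan <- evaluate_plan(plan, wildcard = "size__", values = c(8, 16, 32))

# Build all targets, two at a time where the dependency graph allows.
# ("mclapply" forks processes and is not available on Windows.)
make(plan, parallelism = "mclapply", jobs = 2)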

Factual's drake

Factual's drake is similar in concept, but the development effort is completely unrelated to the R package of the same name.

Other pipeline toolkits

There are many other successful pipeline toolkits, and the drake package distinguishes itself with its R-focused approach, Tidyverse-friendly interface, and parallel computing flexibility.

Requirements

Confirm each of the following by checking the box. This package:

Publication options

I plan to submit to JOSS in the future, but the manuscript is not currently ready.

Detail

maelle commented 6 years ago

@jules32 recruited a technical reviewer! :tada: Thanks a ton! Welcome, @gothub, and thank you for agreeing to review! Is 2018/01/22 a good deadline for your review?

Here are our review template and our reviewing guide.

wlandau-lilly commented 6 years ago

Yes, thank you @gothub!

By the way: @benmarwick, do you think I sufficiently addressed your review? I made several improvements since my first response on December 9.

gothub commented 6 years ago

@maelle yes, I'll have my review in by 2018/01/22.

maelle commented 6 years ago

Fantastic, thanks for being so flexible!

gothub commented 6 years ago

@maelle here is my review:

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide.

Documentation

The package includes all the following forms of documentation:

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • [ ] A short summary describing the high-level functionality of the software
  • [ ] Authors: A list of authors with their affiliations
  • [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Final approval (post-review)

Estimated hours spent reviewing: 5


Review Comments

The documentation makes several statements that this package supports "reproducibility", and even mentions that the reproducibility supported is different from scientific replicability. The meaning of this term as it relates to the package functionality is not clear to me. In my experience at NCEAS, this term generally means that the source data, processing artifacts, and documentation of a workflow are persisted and available so that they can be obtained by a researcher who could replicate any processing products on their own, without any guidance from the original researcher. Does 'drake' somehow assist in this type of reproducibility?

An alternative use of reproducibility might be for an individual researcher (or team) to always expect consistent results even when any of their source data or processing steps have been modified. Is this the claim that the documentation is making? If a processing workflow is configured in drake, then consistent results will always be generated? If that is the case, then how is this more reliable than running all scripts manually, even though this may be inefficient? I'm not trying to play devil's advocate here, just trying to clarify and understand the statements made in the documentation. In working with environmental scientists and ecologists for several years, I have found that the definition of reproducibility is fairly specific, so elaborating on the use of this term for the package may bring more clarity.

gothub commented 6 years ago

@wlandau-lilly the example descriptions in drake.Rmd and README look great.

wlandau-lilly commented 6 years ago

Thanks, Peter! I hope to have a proper response to your review soon.

wlandau-lilly commented 6 years ago

Response to @gothub's review

I sincerely appreciate your timely and thorough review! I found your feedback extremely helpful, and I have responded to specific comments below.

There were quite a few 'suggested' packages

Yes, most of these optional packages support the vignettes, examples, custom caches, and advanced high-performance computing options.

The link in the 'drake' vignette to the 'quickstart' points to the documentation website, although the equivalent 'quickstart' vignette is included with the R package.

I have tweaked the beginning of the drake.Rmd vignette so it links to both the quickstart vignette source and the rendered online version.

The 'basic example' (i.e. from 'load_basic_example') is used throughout the documentation, so the basic setup of a 'template' workflow is not described fully in the vignettes or R command documentation.

I know it is buried very deep, but the example from load_basic_example() is actually paired with its own saved workflow. The code files are located here, referenced here, and generated by drake_example("basic"). The R script sets up the code and carries out a heavily-commented tutorial that walks through some of drake's main features. I do agree that I should make all this more available, so I have added extra references to the vignettes and examples. Now, wherever load_basic_example() is called, there is a comment that says # Get the code with drake_example("basic").

there are also other examples available that are more detailed, and these can be listed with 'drake_examples()'. It would be very useful to provide a list of these examples in this vignette with a very brief description of each. I found these additional examples very helpful (more helpful than the basic example) as they used actual data with a processing goal explained.

There is a listing of examples here, but again, it is buried and far from obvious. I have replicated this list in a new section of the drake.Rmd vignette, and the documentation section of the README now describes the beginner-oriented examples.

The function documentation is complete and detailed with examples. Some of the examples are not run (by roxygen). Does devtools::check() evaluate these examples?

I chose to suppress many of the examples because some are time-consuming and others require the optional packages. And yes, devtools::check() runs the examples, and it skips the code enclosed in \dontrun{}. You can run all the code, even the code inside \dontrun{}, with the run argument of devtools::run_examples(). I routinely check devtools::run_examples(run = TRUE) and devtools::run_examples(run = FALSE).

The parallel execution functionality appears to be powerful and is clearly explained, but I have not run tests to confirm processing time gains when using the parallel backends vs serial processing. If the parallel execution functionality is important to rOpenSci for inclusion of this package, then it may be useful to include a long-running example for use as a benchmark, with execution times shown for each backend and for serial processing. I'm not suggesting that this is necessary, but this could be suggested by the editor if deemed a requirement.

You bring up an important point, and I do plan to do serious performance studies in the future. @maelle, do you think we need a long-running example for rOpenSci?

The documentation makes several statements that this package supports "reproducibility", and even mentions that the reproducibility supported is different from scientific replicability. The meaning of this term as it relates to the package functionality is not clear to me. In my experience at NCEAS, this term generally means that the source data, processing artifacts, and documentation of a workflow are persisted and available so that they can be obtained by a researcher who could replicate any processing products on their own, without any guidance from the original researcher. Does 'drake' somehow assist in this type of reproducibility?

I am so glad you brought this up. Yes, the data need to be available and the documentation is important. But without a tool like drake, we don't know if a project is actually reproducible until we try to recompute the results from the beginning. That usually takes a ton of time, effort, and digging. On the other hand, if outdated() says that all the targets are up to date, we get tangible evidence for free. We know someone else could reproduce the results from the inputs because drake says they are synced. We know that the code and data have not changed meaningfully since the last time the results were generated. This quick automated check is extremely valuable when the alternative is so painstaking and difficult.
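
A small sketch of that check, assuming the basic example and the config-object interface drake converged on around version 5 (earlier versions passed the plan to outdated() directly):

library(drake)
load_basic_example() # Get the code with drake_example("basic").
make(my_plan)        # Build every target once.

# Later, or on another machine with the same code, data, and cache:
config <- drake_config(my_plan)
outdated(config)     # character(0) is tangible evidence that all is in sync.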

Other reproducibility advantages:

An alternative use of reproducibility might be for an individual researcher (or team) to always expect consistent results even when any of their source data or processing steps have been modified. Is this the claim that the documentation is making?

Not exactly. Drake is about detecting when the data/processing has been modified, when it has not, and what to do in each eventuality.

If a processing workflow is configured in drake, then consistent results will always be generated? If that is the case, then how is this more reliable than running all scripts manually, even though this may be inefficient?

For large projects, running all the scripts manually is difficult enough for drake to make a huge practical difference. I strongly believe that checking internal consistency early and often is super important for reproducibility.

You brought up another kind of consistency that I find interesting. If we repeat the analysis from scratch a large number of times, will we always get the same final answer? For stochastic methods like MCMC, this is another important consideration for reproducibility. However, it is outside the scope of drake. Drake does make some effort to preserve random seeds (see wlandau-lilly/drake#56 and wlandau-lilly/drake#109) but that hardly scratches the surface of the issue.

HenrikBengtsson commented 6 years ago

My $0.02 comment on:

"... include a long running example for use as a benchmark with execution times shown for each backend and for serial processing ..."

It'll be impossible to find a one-size-fits-all use case that is representative and that can produce benchmark statistics to serve as guidance for a random user. Each user/project will have its own use case/pipeline to benchmark - I think that's what should be benchmarked. And after the first round of parallelization comes tweaking/optimization of how to best structure the parallelization of your analysis - and there are often multiple ways. Factors such as the amount of RAM, the size of the files processed, other tenants running on the same system at the same time, and so on, also come into play.

So, I would be careful making benchmark claims for large workflows, especially since such "results" tend to stick in the community as the ultimate facts/truth about a particular tool. It's easier to benchmark a very specific problem, but drake is in the business of being a generic workflow manager with automatic parallelization as a feature.

wlandau-lilly commented 6 years ago

Your comments are gratifying, Henrik. I still plan to study performance and optimization at some point, but for right now, you are making my job seem less daunting.

When it comes to parallel computing, I claim that drake processes and caches targets in parallel (in stages) and in the correct order. I also claim that the work deploys to the user's parallel backend of choice. With the function in_progress() (or equivalently, progress() or vis_drake_graph()) and your system's job monitoring software, I think this should all be straightforward to verify for one of the built-in examples if you think there is still a need. The basic example is probably easiest because of load_basic_example(). You may need to artificially slow down the workflow, maybe by appending "; Sys.sleep(60)" to all the commands or making the input datasets large enough.
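
Here is a sketch of that verification, assuming the plan's command column holds character strings and that in_progress() and progress() read the default cache:

library(drake)
load_basic_example()

# Artificially slow down every command so the stages are observable.
my_plan$command <- paste0(my_plan$command, "; Sys.sleep(60)")

# Session 1: build with two parallel jobs.
make(my_plan, jobs = 2)

# Session 2, same working directory: watch the stages in real time.
in_progress() # Targets currently being built.
progress()    # Build status of every target so far.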

Drake's contribution is figuring out the parallel arrangement of targets automatically. For example, a machine learning expert need not worry that the neural nets depend on the training data. Drake knows that the data must be processed first, and it knows which neural nets can be computed at the same time. Beyond that, it borrows its power from other tools. The storr, parallel, future, future.apply, future.batchtools, and batchtools packages do the heavy lifting.

Late last year, @kendonB heavily used drake to deploy a massive project to his organization's SLURM cluster (via make(..., parallelism = "future_lapply")). He uncovered a huge batch of performance issues, and drake is a much better package because of his on-target feedback. I fully expect more bottlenecks to arise as more power users use the package in earnest.

wlandau-lilly commented 6 years ago

@gothub, I realized that I did not adequately address your point about the lack of a template workflow. It is difficult to have a template when drake focuses on a project's R session rather than its file system. You need a workflow plan data frame with targets and commands, plus any supporting functions and files, but otherwise, there is a lot of flexibility. The built-in examples are probably the best reference.

wlandau-lilly commented 6 years ago

Also, I rephrased my justification of reproducibility. I struggle to explain it because it is a nontraditional interpretation of the idea, at least in the statistics and broader scientific communities.

Yes, the data need to be available and the documentation is important. But without a tool like drake, we don't know if a project is actually reproducible until we try to recompute the results from the beginning. That usually takes a ton of time, effort, and digging. On the other hand, if outdated() says that all the targets are up to date, we get tangible evidence for free. We know someone else could reproduce the results from the inputs because drake says they are synced. We know that the code and data have not changed meaningfully since the last time the results were generated. This quick automated check is extremely valuable when the alternative is so painstaking and difficult.

maelle commented 6 years ago

Thanks a lot @gothub!

Rich discussion here! @wlandau-lilly please make sure to insert all clarifications from this thread into the documentation 😉

When you're done, update this thread with a summary of your answers to reviewers (a short one per review) in order to make it easier for them to sign off on your changes even if they have little time. ☺ Good job on using their comments so far!

wlandau-lilly commented 6 years ago

@maelle, in the above two commits, I have inserted the clarifications about reproducibility and high-performance computing in the documentation. From re-reading the thread, I think these are the major clarifications. Are you thinking of any more?

wlandau-lilly commented 6 years ago

Also, I realized that I totally missed the point of one of your earlier questions.

Very naive question: does each command need to be something as basic as summary, or could it be sourcing a larger script containing several regression calls?

Commands can be arbitrary code chunks. For example:

my_plan <- drake_plan(
  a = 1 + 1,
  b = {
    x <- pi + a
    y <- sqrt(x)
    rand <- rnorm(10, sd = y)
    mean(rand)
  },
  c = a - 5,
  d = c(b, c)
)

However, a command should NOT just source('large_command.R'). Drake is function-oriented, not file-oriented. I am beginning to see widespread confusion here (for example, wlandau-lilly/drake#193), and I am doing my best to fix it. I have a new best practices guide that spells out the pitfalls, and I plan to add more over the course of future development.
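
For contrast, here is a minimal sketch of the function-oriented style (analyze_data() is a hypothetical user-defined function, not part of drake):

library(drake)

# Wrap the work of "large_command.R" in a function instead of sourcing it.
# Drake tracks the function itself and rebuilds results when its body changes.
analyze_data <- function(dataset) {
  fit <- lm(mpg ~ wt + cyl, data = dataset)
  summary(fit)
}

my_plan <- drake_plan(
  dataset = head(mtcars, 16),
  results = analyze_data(dataset)
)
make(my_plan)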

wlandau-lilly commented 6 years ago

Good point about summarizing my responses to reviewers. This is a massive thread.

wlandau-lilly commented 6 years ago

Summary of @wlandau-lilly's responses to reviewers

Drake is large and difficult to review, and I am grateful for all your hard work. Drake is a much better package because of your feedback.

@benmarwick

Changes to drake

Comments

@jules32

Changes to drake

Comments

@gothub

Changes to drake

Comments

maelle commented 6 years ago

@jules32 @benmarwick @gothub are you happy with the changes made?

Thanks @wlandau-lilly for the summary, and great on having used all feedback! 😸

benmarwick commented 6 years ago

Yes, thanks, I'm happy. You've done lots of work to address the concerns in my review, that's excellent to see. The package is much more accessible now, and it's easier to see how to get started using it.

wlandau-lilly commented 6 years ago

Thank you, @benmarwick! I am so glad you think the changes are making a difference.

By the way, I just updated the review summaries in light of wlandau-lilly/drake#195.

gothub commented 6 years ago

All of the points I mentioned have been fully addressed with clear and complete documentation, and I have no other issues, so thumbs up from me.

I have one remaining question that is not really an issue for the review; I'm just curious. If I were to develop a workflow with drake, what drake-specific artifacts would need to be preserved for a researcher to reproduce my analysis (other than the R scripts and data that may have existed before even starting to build the drake workflow)? I'm assuming that it's simply a CSV from the workflow itself, e.g. 'my_plan.csv' from the basic example. If more than that is required, then explaining that in the docs or having an export function would be a good idea.

jules32 commented 6 years ago

Yes, I'm happy too. You've addressed the ideas I had, and I think this is good to continue moving forward. Nice work!

wlandau-lilly commented 6 years ago

Thank you so much, @gothub and @jules32! It is exciting to have confirmation from all three reviewers!

@gothub, I really like your question. I will try to break down the answer.

R scripts

As you say, we need any code that defines functions, generates workflow plans, loads packages, and calls make().

The workflow plan

In the way I use drake (and the way I picture others using it), the workflow plan does not actually need to be saved as a CSV file. Typically, it is super quick to generate programmatically in an R script before make() is called. I usually keep a plan.R file that uses drake_plan(), plan_analyses(), plan_summaries(), expand_plan(), and evaluate_plan() to generate the plan. The advantage here is that you can typically spin up thousands of targets with as few as 10-20 lines of human-readable code. (With remake or Make, you would have to write it all out by hand, which is often intractable.) Then, I source("plan.R") before calling make().
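
A sketch of such a plan.R, assuming the plan-generation helpers behave as in drake 5.x (simulate() stands in for a user-defined function):

# plan.R: a few lines that spin up many targets.
library(drake)

datasets <- drake_plan(
  small = simulate(32),
  large = simulate(128)
)

# One analysis template, applied to every dataset via the dataset__ wildcard.
analyses <- plan_analyses(
  drake_plan(fit = lm(y ~ x, data = dataset__)),
  datasets = datasets
)

my_plan <- rbind(datasets, analyses)

Then source("plan.R") defines my_plan in the session, and make(my_plan) builds everything.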

Input files and other imported objects

Examples include optional report.Rmd files, other knitr reports, user-defined CSV data files, and functions created in the R scripts. It's always a good idea to check for missing input files and imported functions/objects. You could either use the missed() function, or you could use vis_drake_graph() and look for the purple nodes, which indicate missing objects or files.
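
For example, assuming the config-object API:

config <- drake_config(my_plan)
missed(config)           # Names any missing imports; should come back empty.
vis_drake_graph(config)  # Missing objects or files show up as purple nodes.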

R packages

Packages are trickier. Maintaining a reproducible package environment is beyond the scope of drake, and the documentation recommends packrat for this purpose (mentioned in the quickstart and caution vignettes). But if you do not use packrat, there is still hope. By default, each make() saves the sessionInfo() to the cache (use the session_info argument to deactivate this). An independent researcher just needs to call drake_session() to learn about the required versions of R, packages, and other system requirements.
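
A minimal sketch of that lookup, assuming drake_session() reads the default cache in the working directory:

make(my_plan)   # sessionInfo() is saved to the cache as a side effect.

# Later, even in a fresh R session with access to the cache:
drake_session() # R version, platform, and loaded package versions.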

The cache

Including the drake cache (usually a hidden .drake/ folder) lets independent researchers use drake_session() to learn about the package environment, and it allows them to call make() to check that everything is up to date (and thus re-creatable). This can be tricky: by design, the drake cache (which is just a storr::storr_rds()) divides its data over lots of tiny files, which is essential to avoid race conditions in parallel computing. However, this setup behaves poorly under version control, and it may even be a bit annoying on a shared file system like Dropbox or Google Drive.

For the version control bit, @noamross recently suggested that drake supply a flat text log file to give a list of targets and their fingerprints. That way, the git commit history of the log tracks changes to the targets over time. Right now, drake supports make(..., cache_log_file = "my_log_file.txt"), make(..., cache_log_file = TRUE), and drake_cache_log_file().
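
A usage sketch of those options (assuming drake_cache_log_file() writes to a default file name when called with no arguments):

# Write a flat text log of target fingerprints alongside the build.
make(my_plan, cache_log_file = "my_log_file.txt")

# Or generate the log from an existing cache on demand.
drake_cache_log_file()

# Commit the log to git: its diff history shows which targets changed
# from commit to commit, without versioning the whole .drake/ cache.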

wlandau-lilly commented 6 years ago

@benmarwick, @jules32, @gothub, and @maelle, would any of you like to be listed in the DESCRIPTION as reviewers? (person(..., role = "rev")).

maelle commented 6 years ago

Editors shouldn't be listed :-) but thanks for asking!

gothub commented 6 years ago

@wlandau-lilly Thanks for the explanation of drake persistence.

Yes, please include me as a listed reviewer of the package, and thanks!

wlandau-lilly commented 6 years ago

Done, thanks.

wlandau-lilly commented 6 years ago

By the way, an AppVeyor build failed just now because listenv is in the middle of an update on CRAN.

  There are binary versions available (and will be installed) but the
  source versions are later:
        binary source
listenv  0.6.0  0.7.0
tibble   1.4.1  1.4.2
Error in download.file(url, destfile, method, mode = "wb", ...) : 
  cannot open URL 'https://cran.rstudio.com/bin/windows/contrib/3.4/listenv_0.6.0.zip

In my experience, this kind of error just blows over if I wait a bit. Would that be okay? I ask because my sense is that drake is about to be approved, so the timing is not ideal.

maelle commented 6 years ago

No problem, I'll test it on Windows later this week when I do the last checks!

maelle commented 6 years ago

Approved! Thanks everyone for a productive and (I hope) enjoyable review process!

To-dos

Welcome aboard! We'd also love a blog post about your package, either a short-form intro to it (https://ropensci.org/tech-notes/) or a long-form post with more narrative about its development (https://ropensci.org/blog/). Let me know if you are interested; I'd ping rOpenSci's community manager.

maelle commented 6 years ago

I forgot to mention that R CMD check passed without any issue on my laptop on Windows. You might just need to justify having "rev" roles in cran-comments.md (but then it gets accepted, because that's a valid use of the role).

wlandau-lilly commented 6 years ago

Wow, @maelle! Thank you so much! This has been an extremely rewarding and gratifying review process, and I am super grateful for your hard work!

One note before I begin the transition: now that drake is moving to rOpenSci, I want to take this opportunity to shift my open source work to my personal account, @wlandau. My colleagues and I agree this is for the best. Would you invite @wlandau to rOpenSci?

maelle commented 6 years ago

Done!

wlandau commented 6 years ago

Wonderful! I transferred the repo, which was super exciting. One small thing: I do not have access to the "Settings" tab, and I cannot change the link to the documentation website at the top. Am I just over-eager, or is there another step?

wlandau commented 6 years ago

I think I found the problem, and it is probably a common issue. @maelle, would you add me as a maintainer at https://github.com/orgs/ropensci/teams/drake/members?

maelle commented 6 years ago

Done! 😸

wlandau commented 6 years ago

Fantastic! Now, all the links reference rOpenSci rather than @wlandau-lilly. Before I activate Zenodo and submit to JOSS, I will make my final preparations for the release of 5.0.0 (soon to go to CRAN).

I have prepared a narrative-form blog post, and I am excited to share it. Is there somewhere I can submit a PR?

Also, could I append an rOpenSci footer to the README?

maelle commented 6 years ago

Pinging @stefaniebutland

wlandau commented 6 years ago

Also, before I create a release for Zenodo, I think I should confirm with @jules32 and @benmarwick about being listed as reviewers in the package itself.

jules32 commented 6 years ago

Hi All,

This is exciting to see happen! I am fine not being listed as a reviewer; I hope that my suggestions were helpful but, as they were minimal, I don't think I need to be included in the DESCRIPTION.

But thank you for asking!

wlandau commented 6 years ago

Thanks for letting me know, Julie. Your suggestions absolutely helped, and I am glad you were part of this. Please let me know if you change your mind.

maelle commented 6 years ago

@jules32 as an editor I found your review very good and will most certainly try and recruit you again!

stefaniebutland commented 6 years ago

Hello @wlandau-lilly. I'm rOpenSci's Community Manager.

I have prepared a narrative-form blog post, and I am excited to share it. Is there somewhere I can submit a PR?

We'd love to have a post about this. Please submit a PR according to: https://github.com/ropensci/roweb2#contributing-a-blog-post. I will review it, provide some feedback and a suggested date to publish it.

Thank you!

wlandau commented 6 years ago

Thanks, @stefaniebutland! I just submitted ropensci/roweb2#140. Really looking forward to the next steps.

wlandau commented 6 years ago

@maelle, I have added @benmarwick as a reviewer in the DESCRIPTION (permission granted via email), and I have completed the 5 to-dos from here. I also submitted drake 5.0.0 to CRAN, added an rOpenSci footer to the README, and volunteered as a reviewer. This is all super exciting!

Besides the separate processes for the blog entry and the JOSS paper, there are a couple minor loose ends that will probably resolve themselves.

  1. AppVeyor CI is still deactivated.
  2. The rOpenSci badge still says "Under Review" in the GitHub README, but the same badge says "Peer Reviewed" on the documentation website.

maelle commented 6 years ago

  1. Now activated! Here is the badge code: [![Build status](https://ci.appveyor.com/api/projects/status/4ypc9xnmqt70j94e?svg=true)](https://ci.appveyor.com/project/ropensci/drake)
  2. Yes, they take a while to propagate, but tell us if it's still the case on Monday, for instance.

wlandau commented 6 years ago

Thanks, @maelle. The AppVeyor builds appear to be working. I will check back on the badge in a few days.

wlandau commented 6 years ago

Another minor thing: I am getting emails like this one on every commit:

Build ropensci/drake@810b97d5ba successful: https://ropensci.ocpu.io/drake/

NEW COMMITS: [810b97d5ba] Time: 2018-01-26 08:39:16 ...

Is there a way to attenuate them? Are the other watchers receiving these emails, or just the maintainers? I do not see an OpenCPU webhook for drake, so I am guessing it is at the org level.

jeroen commented 6 years ago

Those emails are from the CI system, they are sent to the pusher only. You can ignore/filter them.

wlandau commented 6 years ago

Good enough for me, thanks @jeroen.