Thanks for your submission @wlandau-lilly ! Editors are discussing now
Thanks, @sckott.
Edit: I think @richfitz would be an excellent reviewer due to the similarity of remake, but I understand if a potential conflict of interest precludes his participation.
I just ran goodpractice::gp() on wlandau-lilly/drake@ee475f103f514905d3ed9c3a5dd7b2cacbc46021:
It is good practice to
x avoid the attach() and detach() functions, they are
fragile and code that uses them will probably break sooner than
later.
tests/testthat/test-Makefile.R:165:3
tests/testthat/test-Makefile.R:167:3
tests/testthat/test-Makefile.R:192:3
tests/testthat/test-Makefile.R:194:3
x avoid calling setwd(), it changes the global environment.
If you need it, consider using on.exit() to restore the working
directory.
tests/testthat/test-cache.R:43:3
tests/testthat/test-cache.R:153:3
tests/testthat/test-cache.R:216:3
tests/testthat/test-cache.R:225:3
tests/testthat/test-cache.R:241:3
I can explain these idiosyncrasies.
detach()
in tests/testthat/test-Makefile.R
Drake
promises to load the user's packages, which is especially important for distributed computing across multiple nodes on a cluster. To test, I occasionally need to call detach()
to remove packages from search()
. Unfortunately, unloadNamespace()
does not have the desired effect.
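For readers unfamiliar with the distinction, here is a minimal sketch of what detach() does to the search path. It is not taken from drake's test suite; splines (a package that ships with R) is only a stand-in for a user's package, and the variable names are illustrative.

```r
# Attach a package, then remove it from the search path again.
# `splines` ships with R and is only a stand-in for a user's package.
library(splines)
attached_before <- "package:splines" %in% search()

detach("package:splines")
attached_after <- "package:splines" %in% search()

# By default, detach() only edits search(); the namespace stays loaded.
namespace_still_loaded <- isNamespaceLoaded("splines")
```

After this runs, the package is gone from search() even though its namespace is still loaded, which is the state a test needs in order to check that packages get re-attached.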
setwd()
in tests/testthat/test-cache.R
By default, drake
searches through parent directories to find the current drake
project's storr cache. To test, I need to change directories. But rest assured: every test is wrapped in a call to test_with_dir(), which uses withr::with_dir()
to ensure that the original working directory is restored. Nested calls to withr::with_dir()
give me errors.
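The pattern goodpractice recommends can be sketched in base R as follows. This is an illustration only: `in_dir` is a hypothetical helper, not drake's test_with_dir() and not part of withr.

```r
# Run `code` in directory `dir`, restoring the original working directory
# afterwards even if `code` throws an error.
# `in_dir` is a hypothetical helper, not part of drake or withr.
in_dir <- function(dir, code) {
  old <- setwd(dir)               # setwd() invisibly returns the previous directory
  on.exit(setwd(old), add = TRUE) # restore on exit, whether normal or via error
  force(code)                     # evaluate the lazy argument while inside `dir`
}

start <- getwd()
scratch <- tempfile("scratch-")
dir.create(scratch)
seen <- in_dir(scratch, basename(getwd()))
```

Because the restore is registered with on.exit(), the caller's working directory is unchanged once in_dir() returns.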
Thanks for your submission @wlandau-lilly. Running goodpractice::gp()
is actually my role but now we have your comments (and I get the same flags) so all is good. :wink:
devtools::spell_check()
identified:
I'm now looking for reviewers.
Reviewers: @jules32 @benmarwick @gothub Due date: 2017-01-04
@wlandau-lilly I forgot to mention you can now add this review badge to the README
[![](https://badges.ropensci.org/156_status.svg)](https://github.com/ropensci/onboarding/issues/156)
Thanks, @maelle! I appreciate your forgiveness regarding goodpractice::gp()
, and I just fixed the spelling mistake in wlandau-lilly/drake@5c9388a1a7873277332de26a3f8dc0de5bd94104. I will add the badge soon, and I am excited for the review process!
@wlandau-lilly, good news, the reviewers are now assigned!
@jules32 and @benmarwick thanks a lot for accepting to review this package! 😸 Your reviews are due on the 2017-12-04.
Yes, thank you @jules32 and @benmarwick! I look forward to your feedback.
@jules32 and @benmarwick friendly reminder that your reviews are due on the 2017-12-04 😉
@jules32 and @benmarwick, could we touch base about timing? Drake is large and developing fast, so I do understand that reviews may be more difficult than is typical.
I forgot to update the thread when @jules32 contacted me to say she'd get the review in before Dec the 11th, sorry.
@benmarwick any update?
Thanks to both reviewers and thanks @wlandau-lilly for your patience. :-)
Hi @wlandau-lilly et al,
I am going to get started on Thursday and this weekend since I was out of the office last week. Looking forward to getting to know this package!
--
Julia Stewart Lowndes, PhD Ocean Health Index National Center for Ecological Analysis and Synthesis (NCEAS) University of California, Santa Barbara (UCSB) website http://jules32.github.io/ • ohi-science http://ohi-science.org/ • github https://github.com/jules32 • twitter https://twitter.com/juliesquid
Thanks for the reminders, here's my review:
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
p = partial x = complete
URL
, BugReports
and Maintainer
(which may be autogenerated via Authors@R
).
BM: Could CONTRIBUTING.md be located at the top level of the repo for better visibility?
For packages co-submitting to JOSS
- [p] The package has an obvious research application according to JOSS's definition
BM: I see that you do not plan to submit to JOSS at the moment, so this is just an incidental comment: It is easy to imagine research applications for drake, it is a very solid contribution to an active area of workflow and provenance tracking tools. However, the research application would be more obvious if the readme referred to some actual real-world uses of the package. For example, a list of domain-specific research project repos where drake is used (i.e. by biologists, economists, whatever), or a list of publications reporting results that were generated or enabled using drake. Currently it looks like drake has great potential, but hasn't actually been used in any real-world applications. Perhaps it has, but it's not clear from the pkg docs. Examples of use would help potential users better understand how drake can help them.
The package contains a
paper.md
matching JOSS's requirements with:
- [ ] A short summary describing the high-level functionality of the software
- [ ] Authors: A list of authors with their affiliations
- [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
- [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).
Estimated hours spent reviewing: 2
@benmarwick, your advice is exactly what I needed! Even before the inception of drake
, I have had a strong ambition to challenge the R community's current conventions around reproducible data analysis and research. I want to make drake
more relatable, understandable, usable, and widespread, and it will be gratifying to elaborate on the practical niche. Thank you for steering me back on course.
I have already started working on the changes you requested. I expect to have ample time next week, but I will be on vacation from December 16 through January 3, and I will be totally off the grid and unreachable from December 25 through January 3. I will eagerly resume work on January 4.
It was gratifying to work on this response, and the changes to drake
were straightforward. For the past year, I have been struggling to find the best way to talk about drake
. Your feedback allowed me to make substantial progress.
It could be clearer what drake does that no other package already does, or why a user should use drake rather than make/remake/code chunks in Rmd/etc.
I agree. Please see the new "Similar work" section of the README (as well as the end of the new application.Rmd vignette). I now compare drake
to Make, remake, and knitr.
The four points in 'similar work' section of the ropensci submission should also be in the readme.
Done. I added these points and expanded on them in the README.
That said, some of these are a matter of style, e.g. YAML vs data frames for config, rather than outright novel function. Are there any projects using drake that a user can inspect to see real-world applications? Publications that can be cited?
I have tried to search for real-world examples of drake
in the wild, but I have not had success so far. I think it may be too early to see publicly-released projects and publications that use drake
. However, I do know that @kendonB, @dapperjapper, and @AlexAxthelm are heavily using drake
for their projects. I cannot share any of my own because they are all company confidential.
In the new "Similar work" section of the README, I now refer to real-world applications of Make and remake, and I argue that drake
improves on both these tools for R users. I hope that helps. In the coming years, I will continue to search for publicly-available drake
-powered projects, and I will keep a running list in the README.
My sense is that many packages aimed at workflow management or improving reproducibility are quite idiomatic. This makes it hard for the common-or-garden variety R-user to see how they fit into their own ways of working. If you can help to bridge that gap between your idioms and the average user, that would make this pkg much more useful and valuable to the community.
I completely agree! I have been struggling with communication this entire time. The first three sections of the current README and drake.Rmd vignette are new, and I think they are substantial improvements. I try to introduce drake
using plain language, and I argue that it makes life easier.
I had no problem running these. However, I found the quickstart and examples difficult to relate to. For example, who writes their Rmd report in the console and passes it to an object as a character vector? That seems unnatural, to me at least, where the Rmd file is my main notebook and workbench. It would be easier to follow if the substance of the analysis was narrated in a little more detail. Perhaps a tiny actual research question with actual data would make this example more accessible? Perhaps also a comparison with a simple makefile to show makefile users (the main audience for this pkg) how to accomplish the same with drake (and why drake would be preferable). This would help a reader see how their existing workflow could be translated to the drake system. The drake system is a very comprehensive universe of functions, and new users will need a bit more guidance to see analogues between what drake does and what they're already using.
Absolutely! I was too entrenched in the details to realize this. I have added a new application.Rmd vignette for exactly this purpose, and I paired it with example code files that the user can generate with drake_example("application")
. Here, I define a research question and use real data to address it. I also comment on how Make would be unwise for that particular use case.
Regarding the use of *.Rmd
reports and knitr
, please see the new knitr subsection of the README.
Could CONTRIBUTING.md be located at the top level of the repo for better visibility?
Done.
I see that you do not plan to submit to JOSS at the moment, so this is just an incidental comment: It is easy to imagine research applications for drake, it is a very solid contribution to an active area of workflow and provenance tracking tools. However, the research application would be more obvious if the readme referred to some actual real-world uses of the package. For example, a list of domain-specific research project repos where drake is used (i.e. by biologists, economists, whatever), or a list of publications reporting results that were generated or enabled using drake. Currently it looks like drake has great potential, but hasn't actually been used in any real-world applications. Perhaps it has, but it's not clear from the pkg docs. Examples of use would help potential users better understand how drake can help them.
I absolutely do plan to submit to JOSS in the future. Now is not the right time for me, however, and I am especially glad I received your feedback on the package itself first. And as I mentioned before, I am currently having trouble finding real-world applications of drake
in the wild. I will continue searching, and I will gather and list them in the README when I see them.
Now I have a question: @maelle, may I return to this thread at a later date to fast-track a JOSS submission?
Thanks for your review, @benmarwick! Are you happy with the changes?
@wlandau-lilly, thanks for answering @benmarwick's review promptly. Three points from me:
The JOSS submission would only need a paper.md and archived version of the repository. We do not need that before the end of onboarding, which will be at a later date. 😉
Why not create a website for the package using pkgdown
? It'll make the vignettes easier to browse. See http://enpiar.com/2017/11/21/getting-down-with-pkgdown/
Very naive question, does each command need to be something as basic as summary
or could it be sourcing a larger script containing several regression calls?
Another argument in favour of pkgdown
: you could create a grouping of functions as in this example which is what you have in your README now but without the documentation of each function accessible by one click.
I've also just noticed this phrasing "Most people think that means". Even if you have data underlying this, I think it looks a bit aggressive here, maybe replace it with "It does not only mean". :-)
@maelle,
The JOSS submission would only need a paper.md and archived version of the repository. We do not need that before the end of onboarding, which will be at a later date. :wink:
Very much appreciated!
Why not create a website for the package using pkgdown? It'll make the vignettes easier to browse. See http://enpiar.com/2017/11/21/getting-down-with-pkgdown/
Good suggestion. Drake
heavily relies on its vignettes, and pkgdown
is a community standard for documentation. I expect to begin work on this soon.
Very naive question, does each command need to be something as basic as summary or could it be sourcing a larger script containing several regression calls?
I am glad you asked! Drake
commands can be arbitrary R code (although I would avoid unquoted formulas because they may throw off the static code analysis that detects dependencies. This would not break make()
, but it may create false positive messages about missing import objects or link spurious imported dependencies). So yes, you could have a large script containing several regression calls separated by ;
or \n
. This is yet another advantage over remake, which requires all commands to be single function calls with no nesting (except for I()
, which declares string literals).
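To illustrate why formulas can confuse a static dependency scan, here is a minimal sketch using base R's all.vars(). It is not drake's actual code analysis, which the thread does not show; the command string and variable names are made up for the example.

```r
# A multi-statement command, as it might appear in a workflow plan.
command <- "fit <- lm(y ~ x4, data = d); summary(fit)"
exprs <- parse(text = command)  # two expressions, split on `;`

# Every non-function symbol the command mentions.
symbols <- all.vars(exprs)

# `y` and `x4` are columns of `d`, not objects in the workspace, yet a purely
# syntactic scan cannot tell them apart from genuine imports such as `d`.
```

A scan of this kind reports `y` and `x4` alongside `d`, which is the sort of spurious dependency or missing-import message described above.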
Inside drake
, each command is wrapped in a protective function call in order to quarantine the side effects, so in general, only the return value of the code block should have an effect on the rest of make()
(see wlandau-lilly/drake#39).
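The idea of quarantining a command can be sketched in base R like this. It is an assumption-laden illustration, not drake's internal implementation: `run_command` is a hypothetical helper, and only local assignments are contained (files, options, and `<<-` would still escape).

```r
# Evaluate a command string inside a throwaway function so that variables it
# assigns do not leak into the calling environment.
# `run_command` is a hypothetical helper, not drake's internal code.
run_command <- function(command, envir = parent.frame()) {
  wrapper <- eval(parse(text = paste0("function() {\n", command, "\n}")), envir)
  wrapper()  # the value of the last statement is the only thing returned
}

d <- data.frame(x = 1:10, y = (1:10)^2)
result <- run_command("scratch_fit <- lm(y ~ x, data = d); coef(scratch_fit)")
leaked <- exists("scratch_fit", inherits = FALSE)
```

Here `result` holds the coefficients, while `scratch_fit` never appears in the caller's environment.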
Large commands are not always good practice because they can make workflow plan data frames difficult to print properly. (Full disclosure: gather_plan()
creates super long commands, so I am guilty.) I remember submitting a feature request to tibble to allow individual columns to be truncated, but I cannot seem to find the issue.
Every project needs a balance between having too many targets and assigning too much work to any individual target. The new application.Rmd vignette implicitly hints at a possible explosion in the number of targets for massive studies with crushing combinatorics. There is no one-size-fits-all solution.
I've also just noticed this phrasing "Most people think that means". Even if you have data underlying this, I think it looks a bit aggressive here, maybe replace it with "It does not only mean". :-)
You are right. In wlandau-lilly/drake@6bcfca7a12aafdcdb50cf5bb11904a8c3eaaac52, I just changed the sentence to "The R community likes to emphasize reproducibility, which one could interpret to mean..."
FYI: the pkgdown site is now live. I love how it shows the vignettes!
Cool! I also like the grouping in the reference! Some suggestions from me as a naive user (I am being a bit of a reviewer here, but feedback is feedback 😉):
could you make the order of groups, and of vignettes, from most important to least important/most complex (e.g. not starting with caching) instead of alphabetical?
could you make the readme more minimal? To be honest I find it overwhelming bc it is so dense, now with the site in place you can shorten it a bit. "Where to begin" and "handy functions" can disappear in favor of saying something like "drake has a documentation website. you can find a quick start in the quickstart vignette and more specific details about aspects such as parallel computing in the different articles listed" etc.
I know the title of the readme is the origin of the name but it does not describe your package very well for newcomers. How about drake, a package to ensure reproducibility while saving you time?
Now to help you present drake to naive users 😀 I think you should start with a "why use drake" section with content from the first 2 sections and less code to convey the big message before code (convince newcomers at a glance). I know what reproducibility is (hopefully 😀) but I could choose not to ever learn a new tool and have a makefile.R which is a script with source calls to other scripts and knitting in the right order. I would re-run the entire thing if the data change. This is how I would present internal consistency in the beginning of the readme. You can write why it is worth taking the time to learn drake (because ultimately potential users would need to make that decision and this while feeling too busy and/or not expert enough to learn a new tool): saving time in the future by not re-running everything from scratch, by being able to use high performance computing (link to vignette), not too much learning time or frustration because great docs, and why drake vs other make tools (link to the related work section). Really, phrasing the readme in a short way with these arguments is IMO a good marketing strategy because more experienced users of make-like tools can just scroll down to related work while you catch beginners' interest. I do not use any such tool yet and this is how I'd choose to stay on this website. User-friendliness/beginner-friendliness.
Hope this helps while waiting for the second review which might be postponed a bit. I think this is also consistent with what @benmarwick said.
Also ask the three current users you mentioned how they got introduced to the package but I imagine it was by discussing it with you since they are in the acknowledgements.
And thanks for your prompt answer to all feedback until now!
Yes, the more feedback like this, the better! Super helpful!
could you make the order of groups, and of vignettes, from most important to least important/most complex (e.g. not starting with caching) instead of alphabetical?
Done.
could you make the readme more minimal? To be honest I find it overwhelming bc it is so dense, now with the site in place you can shorten it a bit. "Where to begin" and "handy functions" can disappear in favor of saying something like "drake has a documentation website. you can find a quick start in the quickstart vignette and more specific details about aspects such as parallel computing in the different articles listed" etc.
I removed the "Where to begin" and "handy functions" sections, and I explained the pkgdown
site in the "Documentation" section.
I know the title of the readme is the origin of the name but it does not describe your package very well for newcomers. How about drake, a package to ensure reproducibility while saving you time?
Yes. I changed the title to "drake: stay reproducible and save time".
Now to help you present drake to naive users :grinning: I think you should start with a "why use drake" section with content from the first 2 sections and less code to convey the big message before code (convince newcomers at a glance). I know what reproducibility is (hopefully :grinning:) but I could choose not to ever learn a new tool and have a makefile.R which is a script with source calls to other scripts and knitting in the right order. I would re-run the entire thing if the data change. This is how I would present internal consistency in the beginning of the readme. You can write why it is worth taking the time to learn drake (because ultimately potential users would need to make that decision and this while feeling too busy and/or not expert enough to learn a new tool): saving time in the future by not re-running everything from scratch, by being able to use high performance computing (link to vignette), not too much learning time or frustration because great docs, and why drake vs other make tools (link to the related work section). Really, phrasing the readme in a short way with these arguments is IMO a good marketing strategy because more experienced users of make-like tools can just scroll down to related work while you catch beginners' interest. I do not use any such tool yet and this is how I'd choose to stay on this website. User-friendliness/beginner-friendliness.
Please see the top of the new README. I added a "why use drake" section and kept the subsequent three sections the same. The README has changed so fast over the past few months that I forgot I no longer had an abstract-like overview at the top.
Also ask the three current users you mentioned how they got introduced to the package but I imagine it was by discussing it with you since they are in the acknowledgements.
drake's first ever pull request. How did you find out about the package?
drake from our Indy useR Meetup?
drake. Is that how you found out?
Thanks! I'll now be travelling until Wednesday so will have a look then. ☺
Hi @wlandau-lilly et al,
Sorry for the delay here. I'm based in Santa Barbara, California, and with the huge wildfire that is ongoing, we've left town in the last few days.
I did have a look, and had a lot of the same thoughts as @benmarwick in trying to understand how the intended users of drake would know that it was the right tool for them. I know you are addressing some of Ben's comments before going on holiday; I'll have more comments for you in the New Year too.
Cheers, Julie
Julie, I am sorry to hear that the fire came your way. I hope you, your family, and your friends are all safe and comfortable. Your feedback can wait as long as it needs to. Please be well.
Thanks so much!
Thanks @jules32!
I assigned a third reviewer, @jeroen, to have a look at the implementation, not the interface. ☺
Welcome, @jeroen. @maelle, this means we have a new timeline, right?
Yes! After discussing with @jeroen and @jules32 the new deadline is Jan the 4th after your vacation. Sorry about the process length but it'll have been worth it I think with such a reviewers dream team! 🦄🦄🦄
Will you soon have time for JOSS paper.md? See their instructions, it's really a short paper.
That works for me. I will return refreshed and ready to respond.
I just read http://joss.theoj.org/about#author_guidelines, and it turns out that I completely misunderstood JOSS! I assumed that I would need to write a full-length journal article and that the expectations and process would be similar to JSS, etc.
I think a JOSS submission should be possible early next year, but it will take some time. Given all the great feedback I am about to receive through rOpenSci, I would rather wait. The paper.md
will be quick to write, but my company requires a disclosure process for official academic journals, and there is not enough time left in the year to initiate a new disclosure. Also, each iteration of paper.md
will need to be reviewed and approved all over again. But I can minimize the bureaucratic red tape if I am overprepared.
Ok great! I had a feeling you thought it was a long article. Have a great vacation!
I have been thinking more about drake
's accessibility to new users, especially @benmarwick's comment that it is difficult to relate to the vignettes. I did some expanding and refactoring, and as of now, two of the vignettes concentrate on more down-to-earth examples. Both run quickly to avoid bottlenecking the package quality checks, and the statistical methodology is elementary to keep things clear and simple. Each of these vignettes is paired with a set of example code files (available via drake_example("packages") and drake_example("gsp")).
drake brings the project up to date without restarting everything from scratch.
drake easily scales up with the number of targets but GNU Make does not.
Cool -- I was also wondering (and have not checked myself sorry) whether the R podcast episode about drake is listed in the documentation? It provides some useful context & history.
Good question, @maelle. I have not mentioned it in the documentation, and I am still trying to decide whether I will.
FYI: I just mentioned the podcast episode in the documentation section of the README (see the commit referenced above).
:wave: @jeroen and @jules32, friendly reminder that your review is due on Jan the 4th.😺
😅
I'm asking some help from @HenrikBengtsson to review the parallelism components.
Sorry folks I'm not going to make the deadline. Can we push it back 1 month? 😯
Hi All!
Sorry for the silence over break! I've just had the chance to review drake, but it is just a partial review at this point, partly because I'm approaching the 3 hour mark, and partly because I am getting an error installing the current version from GitHub (note: I know this is because you've been working on it in my silence! You are probably already fixing it, but the error is below).
My partial review is focused on the README and website, which have greatly improved following @benmarwick's comments and how @wlandau-lilly addressed them. I will put my suggestions in the following comment. I can plan to coordinate further edits on @jeron's timeline if that's easiest.
In December (for the original deadline) I did begin to look through drake v4.4.1.9000, and installed it from GitHub with no problem.
However, tonight, January 3 I installed devtools::install_github("wlandau-lilly/drake", build = TRUE)
, which I believe is 4.4.1.9002 (not yet a GitHub release). I got the following install error due to Ecdat
:
...
* preparing ‘drake’:
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ... ERROR
Quitting from lines 17-25 (example-gsp.Rmd)
Error: processing vignette 'example-gsp.Rmd' failed with diagnostics:
there is no package called 'Ecdat'
Execution halted
Installation failed: Command failed (1)
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
JL: I am reviewing drake
as an interface reviewer, and from a beginner-friendly angle. I played around with drake
prior to @benmarwick's comments above, and have also followed the work that @wlandau-lilly has put towards addressing them.
This review is in progress, as I've just focused on the documentation for now.
The package includes all the following forms of documentation:
make
and drake_config
, but that's after a lot of other drake magic has gone on behind the scenes. Would it be possible/desirable to make a list (and linking to the website's reference page)? I know you're trying to cut down the README so some of this could go on the website's Get Started page perhaps.
# install drake from CRAN
install.packages("drake")
# Or the development version from GitHub:
# install.packages("devtools")
devtools::install_github("wlandau-lilly/drake", build = TRUE)
JL: the following review is to be completed with drake v. 4.4.1.9002 or greater
URL
, BugReports
and Maintainer
(which may be autogenerated via Authors@R
).
For packages co-submitting to JOSS
JL: from the above thread it seems like this is in progress so I will wait to evaluate this
- [ ] The package has an obvious research application according to JOSS's definition
The package contains a
paper.md
matching JOSS's requirements with:
- [ ] A short summary describing the high-level functionality of the software
- [ ] Authors: A list of authors with their affiliations
- [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
- [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).
JL: the following review is to be completed with drake v. 4.4.1.9002 or greater
Estimated hours spent reviewing: 3.5
In December, I was running drake
on my machine, and had no problems running the examples. But I had trouble seeing how to move past the examples to see how drake could be used from the ground up, and how *I* would use it. @wlandau-lilly has since made this a lot more clear, and I can see how this would be a good tool for more beginner-types to know about. These comments are kind of fine-tuning some of the work you've already done to help it resonate.
README suggestions
What gets done stays done
Having the Sisyphean loop example 1-4 is great: it's really helpful. It seems that with drake
, this turns into something like:
Is that true, and would that be worth itemizing like that in the README? And then here are some suggestions for commenting the example, which is a bit obvious but can be a bit easier to follow:
Example: `my_plan` lists 15 targets (analysis steps that have specific commands), and `drake` will evaluate them with its `make` function.
# Load drake's basic example and examine my_plan's analysis
library(drake) # install.packages("drake")
load_basic_example(verbose = FALSE)
head(my_plan)
## target command
## 1 'report.md' knit('report.Rmd', quiet = TRUE)
## 2 small simulate(5)
## 3 large simulate(50)
## 4 regression1_small reg1(small)
## 5 regression1_large reg1(large)
## 6 regression2_small reg2(small)
# First round: drake builds all 15 targets.
make(my_plan)
## target large
## target small
## target regression1_large
## target regression1_small
## target regression2_large
## target regression2_small
## target coef_regression1_large
## target coef_regression1_small
## target coef_regression2_large
## target coef_regression2_small
## target summ_regression1_large
## target summ_regression1_small
## target summ_regression2_large
## target summ_regression2_small
## target 'report.md'
# Then, you change the reg2 function; this will affect all regression2 targets.
reg2 <- function(d){
  d$x4 <- d$x ^ 4
  lm(y ~ x4, data = d)
}
# Second round: drake only builds what was updated.
make(my_plan)
## target regression2_large
## target regression2_small
## target coef_regression2_large
## target coef_regression2_small
## target summ_regression2_large
## target summ_regression2_small
## target 'report.md'
# And if nothing was updated, drake doesn't try to rebuild.
make(my_plan)
## All targets are already up to date.
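The "what gets done stays done" behaviour can be imitated with a toy cache in base R. This is only a sketch of the general idea, under the assumption that a target is out of date when its command's code or inputs change; `build` and `cache` are hypothetical names, and drake's real bookkeeping is not shown in this thread.

```r
# A toy build cache: a target is rebuilt only when the code of its command or
# its inputs have changed. `build` and `cache` are hypothetical names.
cache <- new.env()

build <- function(name, fun, ...) {
  key <- list(code = deparse(fun), inputs = list(...))
  hit <- cache[[name]]
  if (!is.null(hit) && identical(hit$key, key)) {
    return(hit$value)                   # up to date: reuse the stored value
  }
  message("target ", name)
  value <- fun(...)
  cache[[name]] <- list(key = key, value = value)
  value
}

d <- data.frame(x = 1:20, y = (1:20)^4)
reg2 <- function(d) lm(y ~ x, data = d)
fit1 <- build("regression2", reg2, d)   # built
fit2 <- build("regression2", reg2, d)   # skipped, nothing changed

reg2 <- function(d) {
  d$x4 <- d$x^4
  lm(y ~ x4, data = d)
}
fit3 <- build("regression2", reg2, d)   # rebuilt, because reg2 changed
```

Unlike this sketch, a real tool also has to propagate the rebuild to every downstream target, which is what the second make(my_plan) call above does for the coef_ and summ_ targets.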
website: wlandau-lilly.github.io/drake
Mostly, these are small things that might be fixed with v. 4.4.1.9002 or greater.
Something is wrong on the Reference page; instead of a short descriptor of the function, it repeats the function name after the word "Function". This was also the case in the R help for v. 4.4.1.9000.
Get Started page:
load_basic_example(verbose = FALSE); my_plan
on this page. Seeing inside the my_plan
variable is when I could really see myself using drake in my own workflow. It also lets us see reg2 before the example's second round. The packages example also really hit drake home for me. I like seeing drake_plan() being used to assign the targets and commands that we've seen in load_basic_example().
Small thing, but when I look at the website on Chrome, in the tab it is labeled "Data Frames in R for Make — Drake". I know that that is the origin of the name, but I agree with @maelle's comment above that it's not intuitive (especially as a person who has too many tabs open all the time). If it is possible to have it say "drake" that would be awesome.
Happy New Year everyone in this thread!
@jules32 Thanks a lot for your review!
I'm trying to find a technical reviewer who'd have time to review drake
before Jeroen's available again, I'll update this thread once I know more. Sorry for the long process, @wlandau-lilly , and thanks again for your work on the package since the submission!
Thank you, @jules32! I am eager to address your comments, and I expect to have a proper response within the next couple of days. For now, I will comment on the installation issue you mentioned, which I believe I addressed just now via https://github.com/wlandau-lilly/drake/commit/16a4ea5dcd8017dbe2c756e3e4d301565f3fd4ee. The Ecdat package is only required for a vignette, so I listed it in the Suggests: field of the DESCRIPTION. Rather than move it to Imports: or Depends:, I simply removed build = TRUE from the call to install_github(). (In hindsight, build = TRUE seems excessive anyway.) I also added special instructions for building the vignettes.
- If you want to build the vignettes when you install the development version, you must
  - Install all the packages in the Suggests: field of the DESCRIPTION file, including cranlogs and Ecdat. All these packages are available through the Comprehensive R Archive Network (CRAN), and you can install them with install.packages().
  - Set the build argument to TRUE in install_github().
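As a minimal sketch of the vignette-building steps above, the base R code below reads the Suggests: field from a DESCRIPTION file and works out which of those packages are not yet installed. The DESCRIPTION excerpt and the helper suggested_packages() are hypothetical and written only for illustration; the final install calls are wrapped in if (FALSE) so that nothing is downloaded when the snippet runs.

```r
# Hypothetical DESCRIPTION excerpt, for illustration only (not drake's real file).
desc_path <- tempfile()
writeLines(c(
  "Package: example",
  "Version: 0.0.1",
  "Suggests:",
  "    cranlogs,",
  "    Ecdat,",
  "    knitr (>= 1.0)"
), desc_path)

# Hypothetical helper: read Suggests: and strip whitespace and version constraints.
suggested_packages <- function(path) {
  field <- read.dcf(path, fields = "Suggests")[1, 1]
  if (is.na(field)) return(character(0))
  pkgs <- strsplit(field, ",")[[1]]
  pkgs <- gsub("\\s*\\(.*\\)", "", pkgs)
  trimws(pkgs)
}

pkgs <- suggested_packages(desc_path)
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))

# Not run here: install what is missing, then install drake with build = TRUE.
if (FALSE) {
  install.packages(missing_pkgs)
  devtools::install_github("wlandau-lilly/drake", build = TRUE)
}
```

For a real checkout, desc_path would point at the package's own DESCRIPTION file instead of the temporary one used here.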
@wlandau-lilly I've contacted several potential technical reviewers to see if they could review this package rapidly, without success, which is maybe not surprising at this time of the year after the holidays. A non-rapid review would take the usual 3 weeks, which is not much shorter than one month. I therefore propose we wait for @jeroen's review, with a new due date, 2018-02-04. I'm very sorry about that!
@maelle I was hoping to complete this before rstudio::conf(2018), but I guess it can't be helped. And I realize that the winter holidays are not the right time to do work. Thank you for trying.
I think you may have notified me accidentally. I am not involved in this project.
Sorry, @jeronjacob. Feel free to unsubscribe from this thread.
Your feedback and advice are extremely helpful, and your encouragement is gratifying. I want to reach as many new users as possible, so I care a lot about outside feedback on the documentation. Thank you for your efforts.
I agree with all your suggestions from January 3, and I believe I addressed them all in https://github.com/wlandau-lilly/drake/commit/16a4ea5dcd8017dbe2c756e3e4d301565f3fd4ee through https://github.com/wlandau-lilly/drake/commit/f76f6e2127ac8003a8c5417a667ae9b9141ae15a. Please let me know if you think I missed anything.
A big question I'm still left with after the README and the (super-helpful) website is which functions users will use as they get started, and over and over again. We have seen make and drake_config, but that's after a lot of other drake magic has gone on behind the scenes. Would it be possible/desirable to make a list (and link to the website's reference page)? I know you're trying to cut down the README so some of this could go on the website's Get Started page perhaps.
The Documentation section of the README now includes a list of the 10 most important functions, given roughly in the order I expect a user to call them, and it also refers to the reference section of the documentation website. The documentation website's main page has an identical section.
JL: I don't think it's necessary to include options to install different tags/releases from GitHub. I think if a user wants that, they'll know where to look.
Done. Like the rest of the changes to the README, this change is also reflected on the documentation website.
For packages co-submitting to JOSS
JL: from the above thread it seems like this is in progress so I will wait to evaluate this
I have just begun my company's scientific disclosure process to release my JOSS manuscript. It is essentially the "Why use drake?" section of the README with some added metadata and references.
Having the Sisyphean loop example 1-4 is great: it's really helpful. It seems that with drake, this turns into something like:
- Launch the code
- Drake evaluates and rebuilds anything that has changed since the last run through
Is that true, and would that be worth itemizing like that in the README?
Yes, absolutely. The continuity of the format highlights the contrast between approaches. I made the change.
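To make the "rebuild only what changed" idea concrete, here is a toy sketch in base R. It is not how drake works internally: it compares only the deparsed text of a command function between runs, and the names cache and build_if_changed() are hypothetical. The first definition of reg2 is assumed from the basic example; the second is the one shown earlier in this thread.

```r
# Toy illustration only: this is NOT drake's implementation.
cache <- new.env()

# Rebuild `name` only if the text of its command function changed since the last run.
build_if_changed <- function(name, fn, ...) {
  fingerprint <- paste(deparse(fn), collapse = "\n")
  entry <- get0(name, envir = cache, inherits = FALSE)
  if (!is.null(entry) && identical(entry$fingerprint, fingerprint)) {
    return(invisible(FALSE))  # already up to date, so skip the build
  }
  assign(name, list(fingerprint = fingerprint, value = fn(...)), envir = cache)
  message("target ", name)
  invisible(TRUE)
}

# Small made-up dataset, for illustration only.
d <- data.frame(x = 1:6, y = c(2.1, 3.9, 9.2, 15.8, 25.3, 35.9))

reg2 <- function(d) {
  d$x2 <- d$x ^ 2
  lm(y ~ x2, data = d)
}
first <- build_if_changed("regression2", reg2, d)   # first round: builds
second <- build_if_changed("regression2", reg2, d)  # nothing changed: skipped

reg2 <- function(d) {
  d$x4 <- d$x ^ 4
  lm(y ~ x4, data = d)
}
third <- build_if_changed("regression2", reg2, d)   # reg2 changed: rebuilds
```

The return values record whether a build happened on each call, mirroring the two-step loop of launching the code and letting the tool decide what is out of date.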
And then here are some suggestions for commenting the example, which is a bit obvious but can be a bit easier to follow: ...
Narration is definitely helpful here. I have added very similar comments to the code.
Something is wrong on the Reference page; instead of a short descriptor of the function, it repeats the function name after the word "Function". This was also the case in the R help for v. 4.4.1.9000.
That was just my own laziness. I went back and changed all the titles to be informative. The new reference page and help files are now improved.
Get Started page:
- I'd suggest starting this page with the "Where to Begin" part since the rest of it is on the homepage.
Come to think of it, the "Where to begin" section is the only unique part of the Get Started page. I have now removed all the sections that were repeated on the main page and README. You can also see the changes in the underlying drake.Rmd vignette.
Small thing, but when I look at the website on Chrome, in the tab it is labeled "Data Frames in R for Make — Drake". I know that that is the origin of the name, but I agree with @maelle's comment above that it's not intuitive (especially as a person who has too many tabs open all the time). If it is possible to have it say "drake" that would be awesome.
Done. The title you saw was automatically generated by pkgdown. The change required some more post-processing, but I agree that it was necessary.
FYI: I just added the JOSS submission docs in https://github.com/wlandau-lilly/drake/commit/1c9c67b6495330d5674315bc019e5473d0e7a4ab. You can compile the pdf with pandoc --bibliography paper.bib paper.md -o paper.pdf. Since a DOI for a GitHub repo needs a tag/release, I will wait to generate one for drake until all the reviewers here approve the rest of the package.
Also, I would like the DOI to correspond to a version of drake whose logo and links refer to https://github.com/ropensci/drake (see also https://github.com/wlandau-lilly/drake/pull/176) and the CRAN release of version 5.0.0.
Summary
The drake package is an R-focused pipeline toolkit. It reproducibly brings results up to date and automatically arranges computations into successive parallelizable stages. It has a Tidyverse-friendly front-end, powerful interactive visuals, and a vast arsenal of multicore and distributed computing backends.
URL: https://github.com/wlandau-lilly/drake
Fit: drake falls easily within reproducibility and high-performance computing.
Target audience: anyone who uses R for medium-to-long computations for which the results need to stay up to date with the dependencies.
Similar work
Remake
Drake overlaps with its direct predecessor, remake. In fact, drake owes its core ideas to remake and @richfitz, and explicit acknowledgements are in the documentation. However, drake surpasses remake in several important ways, including but not limited to the following.
drake::example_drake().
Factual's drake
Factual's drake is similar in concept, but the development effort is completely unrelated to the R package of the same name.
Other pipeline toolkits
There are many other successful pipeline toolkits, and the drake package distinguishes itself with its R-focused approach, Tidyverse-friendly interface, and parallel computing flexibility.
Requirements
Confirm each of the following by checking the box. This package:
Publication options
paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
I plan to submit to JOSS in the future, but the manuscript is not currently ready.
Detail
R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings: