jdiaz4302 / ds-pipelines-targets-1

https://lab.github.com/USGS-R/intro-to-targets-pipelines

Why use a dependency manager? #5

Closed: github-learning-lab[bot] closed this issue 3 years ago

github-learning-lab[bot] commented 3 years ago

We're asking everyone to invest in the concepts of reproducibility and efficiency of reproducibility, both of which are enabled via dependency management systems such as remake, scipiper, drake, and targets.

Background

We hope that the case for reproducibility is clear - we work for a science agency, and science that can't be reproduced does little to advance knowledge or trust.

But, the investment in efficiency of reproducibility is harder to boil down into a zingy one-liner. Many of us have embraced this need because we have been bitten by issues in our real-world collaborations, and found that data science practices and a reproducibility culture offer great solutions. Karl Broman is an advocate for reproducibility in science and is faculty at UW Madison. He has given many talks on the subject and we're going to ask you to watch part of one of them so you can be exposed to some of Karl's science challenges and solutions. Karl will be talking about GNU make, which is the inspiration for almost every modern dependency tool that we can think of. Click on the image to kick off the video.

reproducible workflows with make

:computer: Activity: Watch the above video on make and reproducible workflows up until the 11 minute mark (you are welcome to watch more)

Use a GitHub comment on this issue to let us know, in no more than 300 words, what you thought was interesting about these pipeline concepts.


I'll respond once I spot your comment (refresh if you don't hear from me right away).

jdiaz4302 commented 3 years ago

This video is mostly about tools and concepts that I had already read or heard about, either earlier in this training or in previous experience, but I appreciated the context of the video: this scientist was given space among scientific talks to make a case and provide resources for these open science practices. Of the 7 topics he touched on (starting at about 6:45), I am least familiar with the GNU Make-like tools (never used) and licensing code (no strong understanding of how to make the best license choice), so I appreciate that he provides a further-reading link that gives those topics more time. While there's still a lot to learn beyond that reading, it was nice to have a first introduction to Makefiles and to gain a better appreciation of how they can inspire you to fully automate your workflow, making it even more reproducible. Likewise, it was good to learn that code without a license technically is not to be used by others, and that certain licenses do not protect you from others' uses of your code.

github-learning-lab[bot] commented 3 years ago

Great comments @jdiaz4302! :sparkles:

You could consider GNU make to be a great grandparent of the packages we referred to earlier in this lesson (remake, scipiper, drake, and targets). Will Landau, the lead developer of targets, has added a lot of useful features to dependency management systems in R, and has a great way of summarizing why we put energy into using these tools: "Skip the work you don't need."
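
As a rough illustration of that idea (not part of the original lesson), a `_targets.R` file might look like the hypothetical sketch below. The data file and column names are invented, but the pattern is the core of the tool: you declare targets and their dependencies, and `tar_make()` only rebuilds a target when something upstream of it has changed.

```r
# _targets.R -- a hypothetical sketch; the data file and column names are invented
library(targets)

list(
  # track the raw data file itself; downstream targets rebuild only if its contents change
  tar_target(site_data_file, "data/site_data.csv", format = "file"),

  # read the tracked file into a data frame
  tar_target(site_data, readr::read_csv(site_data_file, col_types = readr::cols())),

  # summarize; on later runs this step is skipped unless site_data changed
  tar_target(site_summary, dplyr::summarise(site_data, mean_temp = mean(temperature, na.rm = TRUE)))
)
```

Running `tar_make()` builds everything the first time; running it again right away reports each target as skipped, which is exactly the "skip the work you don't need" idea in practice.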

Next, we'd like you to check out a short part of Will's video on targets.

reproducible workflows with R targets

:tv: Activity: Watch the video on targets from at least 7:20 to 11:05 (you are welcome to watch the full talk if you'd like)

Use a GitHub comment on this issue to let us know what contrasts you identified between the solutions in make and what is offered by R-specific tools like targets. Please use fewer than 300 words. Then assign this issue to your onboarding cohort team member so they can read what you wrote and respond with any questions or comments.


When you are satisfied with the discussion, you can close this issue and I'll direct you to the next one.

jdiaz4302 commented 3 years ago

Based on the specified timestamps, it sounds like make and targets do many similar things, with targets being inspired by make: both use a template (a Makefile, in make's case) to automatically track a workflow/pipeline, know which steps need to be rerun, and perform those runs. My understanding is that targets may be better suited to our line of work because it is highly integrated with R/RStudio; this means you can manage and run the pipeline from the same R interface you already work in, and intermediate outputs can easily be called back as R objects. Skipping ahead through the video, it appears that there are also some nice visual tools, such as a graph of the pipeline indicating its current status, and (this was less clear to me, but...) it appears it may also facilitate parallel computation for some of your more costly code.
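
As a concrete (and hedged) illustration of that R-side integration, the typical interactive calls look roughly like this, reusing the hypothetical pipeline sketched earlier in this thread:

```r
library(targets)

tar_make()              # build outdated targets and skip up-to-date ones
tar_visnetwork()        # interactive graph of the pipeline showing which targets are current
tar_read(site_summary)  # return a completed target as an ordinary R object
tar_load(site_data)     # or load a target by name into the global environment
```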

lindsayplatt commented 3 years ago

Great observations. I've been personally drawn to targets as an R user and R instructor. I can see more people adopting pipelining approaches when it means learning a new R package rather than adopting an entirely new language (as with make and the world of YAML files).

I am also excited about the parallel computing capabilities built into targets, but must admit that I haven't yet tried them!
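
For anyone curious, one way to turn that on looks roughly like the sketch below. This assumes the `future` backend that targets supported around this time; newer versions of the package may prefer other backends, so treat the details as illustrative.

```r
# _targets.R additions -- a sketch assuming the future backend
library(targets)
library(future)
plan(multisession)  # run targets in parallel R sessions on the local machine

# then, in the console, instead of tar_make():
# tar_make_future(workers = 2)  # builds independent targets concurrently
```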

github-learning-lab[bot] commented 3 years ago


When you are done poking around, check out the next issue.