jesse-ross / ds-pipelines-targets-1

https://lab.github.com/USGS-R/intro-to-targets-pipelines

Why use a dependency manager? #5

Closed by github-learning-lab[bot] 3 years ago

github-learning-lab[bot] commented 3 years ago

We're asking everyone to invest in the concepts of reproducibility and efficiency of reproducibility, both of which are enabled via dependency management systems such as remake, scipiper, drake, and targets.

Background

We hope that the case for reproducibility is clear - we work for a science agency, and science that can't be reproduced does little to advance knowledge or trust.

But the investment in efficiency of reproducibility is harder to boil down into a zingy one-liner. Many of us have embraced this need because we have been bitten by issues in our real-world collaborations, and found that data science practices and a reproducibility culture offer great solutions.

Karl Broman is an advocate for reproducibility in science and is on the faculty at UW-Madison. He has given many talks on the subject, and we're going to ask you to watch part of one of them so you can be exposed to some of Karl's science challenges and solutions. Karl will be talking about GNU make, which is the inspiration for almost every modern dependency tool that we can think of. Click on the image to kick off the video.

[Video: reproducible workflows with make]

💻 Activity: Watch the above video on make and reproducible workflows up to the 11-minute mark (you are welcome to watch more).

Use a GitHub comment on this issue to let us know what you thought was interesting about these pipeline concepts using no more than 300 words.


I'll respond once I spot your comment (refresh if you don't hear from me right away).

jesse-ross commented 3 years ago

targets is specifically designed for data science / analysis workflows in R, whereas make is a general tool for automating series of commands which might change or have their inputs change. Landau also seems to be saying that targets has abstractions for intermediate data products, and that it encourages a functional programming style.

hcorson-dosch-usgs commented 3 years ago

Yes, exactly. One of the coolest capabilities of targets as a tool designed for R workflows is how it tracks functions. With make, you specify the file (e.g., .R files or a data file) that a target depends on, and make tracks the file. If the timestamp of a .R file has changed, make considers downstream targets that depend on that file to be out of date, even if the function steps defined within that file haven't changed. targets (like other R-specific pipelining tools) goes one step further, and actually tracks the operations of each function defined within a given .R file. It only marks targets that use a given function as 'out of date' if the actual operations of that function have changed. It ignores any edits to comments, formatting, or whitespace. So that means if you go back into a script to add comments or parameter definitions, it won't trigger a rebuild! 🎉
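To make the distinction concrete, here is a rough sketch of the idea in Python. It is only an analogy, not how targets is implemented: it fingerprints a function by hashing its parsed syntax tree, so comments and formatting drop out while a real change to the operations does not. The helper `function_fingerprint` and the sample function `clean` are hypothetical names invented for this illustration.

```python
import ast
import hashlib


def function_fingerprint(source: str, name: str) -> str:
    """Hash the parsed form of function `name`, ignoring comments and layout."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == name:
            # ast.dump leaves out line/column attributes by default, and the
            # parser has already discarded comments, so only structure is hashed.
            return hashlib.sha256(ast.dump(node).encode()).hexdigest()
    raise KeyError(name)


# Hypothetical pipeline step, written three ways.
original = """
def clean(x):
    return [v * 2 for v in x]
"""

commented = """
# Double every value.
def clean(x):
    return [ v*2   for v in x ]   # formatting changed only
"""

changed = """
def clean(x):
    return [v * 3 for v in x]
"""

# Comment and whitespace edits leave the fingerprint unchanged ...
same = function_fingerprint(original, "clean") == function_fingerprint(commented, "clean")
# ... while changing the actual operation produces a new fingerprint.
different = function_fingerprint(original, "clean") != function_fingerprint(changed, "clean")
print(same, different)
```

A timestamp-based tool such as make would treat all three versions as "changed" as soon as the file is saved, whereas a fingerprint like this one only flags the third.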

jesse-ross commented 3 years ago

That's pretty great!

github-learning-lab[bot] commented 3 years ago


When you are done poking around, check out the next issue.