Running minimal code with changed inputs

OmaymaS commented 6 years ago

I am not sure whether this topic has been tackled from all sides, but I thought I'd share it.

Problem: When one runs a script, it may contain computationally heavy parts, but not all of them need to be re-run. The inputs to these parts may be the same, the imported data may not have changed since the last run, etc.

In R Markdown, one can enable caching on chunks to avoid repeating expensive steps such as data import. In scripts, one could write conditions that control whether some code runs.
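As a rough illustration of such a hand-written condition, here is a minimal sketch. It is written in Python purely for illustration (the discussion is about R), and it assumes the heavy step depends on a single input file and saves its own results to disk. `run_if_changed` and the JSON stamp file are hypothetical names, not part of any tool mentioned in this thread.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def run_if_changed(data_path: Path, stamp_path: Path, heavy_step) -> bool:
    """Hypothetical helper: call heavy_step(data_path) only when the contents of
    data_path differ from the digest recorded in stamp_path after the last run.
    Assumes heavy_step saves its own results. Returns True if it ran."""
    digest = file_digest(data_path)
    if stamp_path.exists():
        previous = json.loads(stamp_path.read_text()).get("digest")
        if previous == digest:
            return False  # input unchanged: skip the expensive step
    heavy_step(data_path)
    # Record the digest only after the step has finished without raising.
    stamp_path.write_text(json.dumps({"digest": digest}))
    return True
```

Every such condition has to be written and maintained by hand, one per expensive step, which is exactly the overhead an automatic approach would remove.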

It would be more efficient if there were a simple way to detect changes and automatically decide which parts need to be re-run.

If there are existing solutions for this, let me know. If not, we might think about functions or settings to help with this.

maelle commented 6 years ago

I think https://github.com/ropensci/drake could help with this?

wlandau commented 6 years ago

Yes, I designed drake explicitly for this purpose. See drake's main example for a proof of both concept and implementation.

It is so gratifying to see people independently discover the need for such a tool.

batpigandme commented 6 years ago

In lieu of the unconf, we'll be having a two-day drake intensive! 😂 Seriously though, @wlandau, you should just post a list of all the things people have suggested for which drake provides a solution!

wlandau commented 6 years ago

Thanks! It's exciting how many of them are in this issue tracker alone.

I cannot physically be in Seattle on May 21-22, but I will do my absolute best to be present online.

noamross commented 6 years ago

As much as I like drake, I see space here for a solution that is a little more lightweight. Almost all R build systems force users to refactor their code from scripts into functions. There are good reasons for this, but it adds a lot of overhead for people who work with scripts.

I think there could be a useful solution here in which one caches an R script with a function like source_cached(). In this solution, one would treat every line of code in the script like a knitr chunk. Then one could either

Hash and save the script's environment after every line (high storage), or
Hash and save every object in the environment after every line (more overhead, less storage)

One could probably use storr::driver_environment() for fast in-session storage, or a regular on-disk storr.
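A minimal sketch of the first option (snapshot the whole environment after every line), written in Python rather than R for illustration only. Python's `ast` module stands in for splitting the script into chunk-like statements, and a plain dict of pickled snapshots stands in for a storr. `source_cached_sketch` is a hypothetical name. Each cache key chains the previous key with the current statement's normalized source, so editing one statement invalidates it and every statement after it. It assumes every value the script creates is picklable (so no imported modules or script-defined functions), and it does not notice changes to external files.

```python
import ast
import hashlib
import pickle


def source_cached_sketch(script: str, cache: dict) -> dict:
    """Hypothetical helper: run `script` one top-level statement at a time,
    restoring a cached snapshot of the whole environment instead of executing
    a statement whenever it and every statement before it are unchanged.
    Assumes all values the script creates are picklable."""
    env: dict = {}
    key = hashlib.sha256(b"").hexdigest()
    for node in ast.parse(script).body:
        # Normalized source, so formatting-only edits don't invalidate the cache.
        code = ast.unparse(node)
        # Chain keys: this key depends on this statement and all earlier ones.
        key = hashlib.sha256((key + code).encode()).hexdigest()
        if key in cache:
            env = pickle.loads(cache[key])  # cache hit: skip execution
            continue
        exec(compile(ast.Module(body=[node], type_ignores=[]), "<script>", "exec"), env)
        env.pop("__builtins__", None)  # added by exec; keep only script variables
        cache[key] = pickle.dumps(env)  # snapshot of the full environment
    return env
```

Calling it twice on the same script with the same `cache` executes nothing the second time, and editing only the last line re-runs only that line. The price is one full snapshot per statement, the "high storage" trade-off of the first option.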

This wouldn't be the most efficient solution, but it would probably work very well for a script that has grown a bit unwieldy and that someone is developing interactively.

Since this is meant to be a convenience tool, one would probably throw in an RStudio add-in, so one could run source_cached() on the current script with a hotkey.

wlandau commented 6 years ago

@noamross I think source_cached() is absolutely possible, maybe even straightforward, with memoise + CodeDepends. And I agree, its bare simplicity would fill a niche.
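A sketch of a more targeted variant, again in Python as an illustration only: each top-level statement is keyed on its source plus fingerprints of the variables it mentions, so a statement re-runs only when its own code or one of those values changes. The name scan with `ast` is a crude stand-in for the kind of static input/output analysis CodeDepends performs on R code, and the dict plays the role a memoise cache would. `run_with_dependency_cache` is a hypothetical helper; it assumes plain assignments and expressions over picklable values, and it misses hidden dependencies such as a function body reading a global or a statement reading a file.

```python
import ast
import hashlib
import pickle


def _fingerprint(obj) -> str:
    # Hash of the pickled value; a simplification that assumes picklable,
    # deterministically pickled values.
    return hashlib.sha256(pickle.dumps(obj)).hexdigest()


def run_with_dependency_cache(script: str, cache: dict) -> tuple:
    """Hypothetical helper: execute each top-level statement only if its source
    or the values of the variables it mentions changed since a cached run.
    Returns (environment, number of statements actually executed)."""
    env: dict = {}
    executed = 0
    for node in ast.parse(script).body:
        code = ast.unparse(node)
        # Every variable name the statement mentions (read or written).
        # Over-approximating inputs keeps `x += 1` and `lst.append(...)` correct.
        names = sorted({n.id for n in ast.walk(node) if isinstance(n, ast.Name)})
        inputs = [(name, _fingerprint(env[name])) for name in names if name in env]
        key = _fingerprint((code, inputs))
        if key in cache:
            env.update(pickle.loads(cache[key]))  # restore this statement's effects
            continue
        exec(compile(ast.Module(body=[node], type_ignores=[]), "<script>", "exec"), env)
        executed += 1
        # Store post-execution values of the mentioned names that now exist.
        cache[key] = pickle.dumps({name: env[name] for name in names if name in env})
    env.pop("__builtins__", None)
    return env, executed
```

Changing `a = 2` to `a = 3` in a script like `a = 2; b = 5; c = a * 10; d = b + 1` re-runs only the two statements that mention `a`. This reuses more work than invalidating everything after an edit, at the cost of trusting a name scan that cannot see every dependency.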

wlandau commented 6 years ago

At one point, I thought maybe there could be an automatic way to turn an arbitrary R script into a drake_plan(), but this turns out to be extremely difficult to even think about.

OmaymaS commented 6 years ago

Summary: Track changes in an R script and re-run only the minimal code needed. According to the replies, this is already covered by existing solutions such as drake.

ropensci / unconf18

Running minimal code with changed inputs #53