ropensci / drake

An R-focused pipeline toolkit for reproducibility and high-performance computing
https://docs.ropensci.org/drake
GNU General Public License v3.0

Question: Recursive memoization #1335

Closed kdkavanagh closed 3 years ago

kdkavanagh commented 3 years ago

Question

I've been using drake for a while now and have started digging into the internals to understand how it works. One piece I'm stuck on at the moment is how recursive memoization is designed. The docs note that drake differs from memoize in that it recurses through a function's dependencies, so that they are tracked in addition to the immediate function body and arguments.

From playing around with drake_deps and deps_code, I can see how individual functions are analyzed, but it's not clear to me how functions are hashed recursively. Example below. Specifically, I'm curious how drake knows to rebuild a target that calls g if the global y changes:

f = function(x) x + y

g = function(x) f(x)

> drake:::drake_deps(g)
drake_deps
 $ globals   : chr "f"  #No 'y' here?
 $ namespaced: chr(0) 
 $ strings   : chr(0) 
 $ loadd     : chr(0) 
 $ readd     : chr(0) 
 $ file_in   : chr(0) 
 $ file_out  : chr(0) 
 $ knitr_in  : chr(0) 

> drake:::drake_deps(f)
drake_deps
 $ globals   : chr "y"
 $ namespaced: chr(0) 
 $ strings   : chr(0) 
 $ loadd     : chr(0) 
 $ readd     : chr(0) 
 $ file_in   : chr(0) 
 $ file_out  : chr(0) 
 $ knitr_in  : chr(0)

wlandau commented 3 years ago

Good question. The key piece is here:

https://github.com/ropensci/drake/blob/0ea2998ce8d2bbb822cf5a78d2466e67e18da997/R/store_outputs.R#L159-L165

standardize_imported_function() takes the function and returns a text representation of its body and signature. Then the dependency hash, meta$dependency_hash, gets tagged on, and the whole thing gets hashed again when it is stored with the $set() method of config$cache (an RDS storr). So the overall hash depends on meta$dependency_hash, which in turn depends on the overall hashes of the upstream functions and global objects. That's why a change to an upstream function or global object triggers a chain reaction that invalidates the downstream functions and targets.
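
To make the chain reaction concrete, here is a minimal sketch of the idea using the f, g, and y from the question. It is not drake's actual code: hash_text() and recursive_hash() are hypothetical helpers invented for illustration, codetools::findGlobals() stands in for drake's own dependency analysis, and an MD5 of a temporary file stands in for the hashing done by the cache. Each object's hash covers its own text plus the hashes of the globals it refers to.

```r
# Hypothetical helper: hash a character vector with base R only,
# by writing it to a temporary file and taking the file's MD5.
hash_text <- function(text) {
  path <- tempfile()
  on.exit(unlink(path))
  writeLines(text, path)
  unname(tools::md5sum(path))
}

# Hypothetical helper: hash the object called `name` in `envir`.
# For a function, the hash covers its deparsed text plus the
# recursive hashes of every global it references that is defined
# in `envir`. Assumption: no (mutually) recursive functions,
# which would make this naive version loop forever.
recursive_hash <- function(name, envir) {
  obj <- get(name, envir = envir, inherits = FALSE)
  if (!is.function(obj)) {
    return(hash_text(deparse(obj)))
  }
  globals <- codetools::findGlobals(obj, merge = TRUE)
  defined <- vapply(
    globals, exists, logical(1),
    envir = envir, inherits = FALSE
  )
  deps <- sort(globals[defined])
  dep_hashes <- vapply(deps, recursive_hash, character(1), envir = envir)
  hash_text(c(deparse(obj), dep_hashes))
}

# The example from the question, kept in its own environment.
env <- new.env()
env$y <- 1
env$f <- function(x) x + y
env$g <- function(x) f(x)

before <- recursive_hash("g", env)
env$y <- 2 # g's own text and direct dependencies are unchanged
after <- recursive_hash("g", env)

identical(before, after) # FALSE: the change to y propagates through f to g
```

So even though y never shows up in the direct dependencies of g, the hash of g changes because it includes the hash of f, which in turn includes the hash of y.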

kdkavanagh commented 3 years ago

Thanks Will!