ropensci / drake

An R-focused pipeline toolkit for reproducibility and high-performance computing
https://docs.ropensci.org/drake
GNU General Public License v3.0

How can I reproduce a drake plan without `.drake`? #1353

Closed: shirdekel closed this issue 3 years ago

shirdekel commented 3 years ago

Question

I cloned the repo that contains my drake plan, but various targets re-run in the clone even though the plan was up to date in its original location. I have .drake in .gitignore, as per your suggestion. Is this why the targets re-run? I assume this is not the cause and that I left out a relevant file somewhere. If so, can you suggest ways to debug this?

Reproducible example

The code is in a private repo, and I'm not sure how to create a reprex for this question.

shirdekel commented 3 years ago

A related but separate issue: I get the following message for parts of my code in which I call loadd() on a target (e.g., target_y):

diagnose(target_x)$error$message:
  object 'target_y' not found

I assume that .drake is needed for those, so does this mean that I shouldn't use loadd within the plan?

wlandau commented 3 years ago

> I cloned the repo that contains my drake plan, but various targets re-run in the clone even though the plan was up to date in its original location.

As you suspected, that is most likely because the .gitignore prevents drake output from being uploaded to GitHub. Output lives in the .drake/ cache, so if the cache is missing or empty in the clone, drake has no record of earlier builds and treats your targets as out of date. The targets package, drake's successor, has a lighter storage system and allows the _targets/ data store for small projects to be committed to GitHub without as many issues.
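The effect of a missing cache can be reproduced locally. The following is a minimal sketch, not taken from the original project: the two-target plan and the temporary directory are hypothetical, and deleting .drake/ stands in for a fresh clone in which the cache was excluded by .gitignore.

```r
library(drake)

# Work in a throwaway directory so the sketch does not touch a real project.
project_dir <- tempfile("drake-demo-")
dir.create(project_dir)
old_wd <- setwd(project_dir)

# Hypothetical two-target plan, for illustration only.
plan <- drake_plan(
  values = c(1, 2, 3),
  total = sum(values)
)

# Build both targets. This writes the .drake/ cache in the working directory.
make(plan)

# With the cache in place, no targets are reported as outdated.
up_to_date <- outdated(plan)

# Simulate a fresh clone: the code is present, but .drake/ is not.
unlink(".drake", recursive = TRUE)

# Without the cache, drake has no build history, so every target is outdated.
after_clone <- outdated(plan)

setwd(old_wd)
```

If `outdated()` lists every target right after cloning, the missing cache is the likely explanation rather than a forgotten source file.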

> does this mean that I shouldn't use loadd within the plan?

In most cases, yes: loadd() is not meant for use inside the plan. The only probable exceptions are plans run with memory_strategy = "none" or "unload"; see https://books.ropensci.org/drake/memory.html
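A common alternative to calling loadd() in a command is to mention the upstream target by name. The sketch below assumes a hypothetical plan that reuses the target names from the error message above; the commands themselves are invented for illustration.

```r
library(drake)

# Work in a throwaway directory so the sketch does not touch a real project.
project_dir <- tempfile("drake-demo-")
dir.create(project_dir)
old_wd <- setwd(project_dir)

# Hypothetical plan: target_x depends on target_y.
plan <- drake_plan(
  target_y = c(1, 2, 3),
  # Refer to target_y as an ordinary symbol instead of calling loadd().
  # drake detects the dependency from the command, builds target_y first,
  # and makes its value available when target_x runs.
  target_x = mean(target_y)
)

make(plan)

# Outside the plan, readd() and loadd() are the usual way to inspect results.
result <- readd(target_x)

setwd(old_wd)
```

Here loadd() and readd() stay in interactive or reporting code, while the plan expresses dependencies through symbols alone.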