How can I reproduce a drake plan without `.drake`?

ropensci / drake

An R-focused pipeline toolkit for reproducibility and high-performance computing

https://docs.ropensci.org/drake

GNU General Public License v3.0


How can I reproduce a drake plan without `.drake`? #1353

Closed: shirdekel closed this issue 3 years ago.

shirdekel commented 3 years ago

Prework

[x] Read and agree to the code of conduct and contributing guidelines.
[x] If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
[ ] For any problems you identify, write a minimal reproducible example so the maintainer can troubleshoot. A reproducible example is:
- [ ] Runnable: post enough R code and data so any onlooker can create the error on their own computer.
- [ ] Minimal: reduce runtime wherever possible and remove complicated details that are irrelevant to the issue at hand.
- [ ] Readable: format your code according to the tidyverse style guide.

Question

I cloned the repo that contains my drake plan, but various targets run even though it was updated in its normal location (that is, all targets are up to date in the original copy of the repo). I have `.drake` in `.gitignore`, as you suggest. Is this why the targets re-run? I assume it is not, and that I left out a relevant file somewhere. If so, can you suggest ways to debug this?

Reproducible example

The code is in a private repo, and I'm not sure how to create a reprex for this question.

shirdekel commented 3 years ago

A separate but related issue: I get the following error message in parts of my code where I call loadd() on a target (e.g., target_y):

diagnose(target_x)$error$message:
  object 'target_y' not found

I assume that `.drake` is needed for those calls, so does this mean that I shouldn't use loadd() within the plan?

wlandau commented 3 years ago

> I cloned the repo that contains my drake plan, but various targets run even though it was updated in its normal location.

As you suspected, that is most likely because the .gitignore prevents drake output from being uploaded to GitHub. Output lives in the .drake/ cache, so if the cache is missing or empty, as it is in a fresh clone, your targets will not be up to date and make() will rebuild them. The targets package, drake's successor, has a lighter storage system, and for small projects its _targets/ data store can be committed to GitHub with fewer issues.
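One way to confirm this is to ask drake which targets it considers outdated. The sketch below is a minimal illustration, assuming a hypothetical two-target plan named `plan` (substitute your own) run from the project root: `outdated()` lists the targets that `make()` would rebuild, so in a fresh clone with no `.drake/` cache every target should appear, and after `make()` populates the cache the list should be empty.

```r
library(drake)

# Hypothetical two-target plan standing in for the real project plan.
plan <- drake_plan(
  raw = data.frame(x = 1:10),
  total = sum(raw$x)
)

# No .drake/ cache yet (as in a fresh clone): expect both targets listed.
before <- outdated(plan)
print(before)

# Build the targets; this stores their values in the .drake/ cache.
make(plan)

# The cache is now populated, so nothing should be outdated.
after <- outdated(plan)
print(after)
```

If `outdated()` still lists targets in the original copy of the repo, the cause is something other than the missing cache, such as a changed function or input file.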

> does this mean that I shouldn't use loadd within the plan?

In most cases, yes: loadd() is not meant for use inside the plan. The exceptions are probably memory_strategy = "none" and memory_strategy = "unload": https://books.ropensci.org/drake/memory.html.
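With the default memory strategy, the usual alternative is to mention the upstream target by name in the command. drake's static code analysis then sees the symbol, records the dependency, builds the upstream target first, and loads it into memory before running the command. The sketch below reuses the names `target_x` and `target_y` from this thread, but the commands themselves are hypothetical placeholders.

```r
library(drake)

plan <- drake_plan(
  # Hypothetical upstream target.
  target_y = data.frame(value = c(2, 4, 6)),
  # Refer to target_y directly instead of calling loadd(target_y).
  # drake detects the symbol, builds target_y first, and makes it
  # available in memory when this command runs.
  target_x = mean(target_y$value)
)

make(plan)

# Outside the plan (e.g. in an interactive session), loadd()/readd()
# are the intended way to retrieve targets from the cache.
readd(target_x)
```

Because the dependency is explicit in the command, drake also knows to rebuild `target_x` whenever `target_y` changes, which it can only track reliably when the reference is visible in the plan.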