ropensci / drake

An R-focused pipeline toolkit for reproducibility and high-performance computing
https://docs.ropensci.org/drake
GNU General Public License v3.0

Import cache from same plan on different computer #1354

Closed: a1785487 closed this issue 3 years ago

a1785487 commented 3 years ago

I have a drake plan that I use across several machines (a PC, a Mac, and a laptop PC). I'm a master's candidate working on a big data project. I have to use a Mac at uni (I have no choice), but I run a high-end Windows desktop in my study at home. I use my laptop as a portable device when I have spare time at work (or wherever).

Part of the drake plan is too memory-intensive for the laptop. When I run that part on it, the machine runs out of physical memory, thrashes the paging file, and then crashes.

What I want to do is run the same plan on my higher-end machines, then copy the results up to that point onto the laptop. That way I can work on the remaining sections of the plan without having out-of-date targets that would be rebuilt on the laptop and crash the plan/computer.

I'd like to keep the plan the same across machines, because managing two plans would introduce version control issues.

Is this possible, and if so, how do I do it? Or do I need to split the plan in two and hope I keep everything up to date between the two plans?

Note: I use GitHub for version control of the code (across all of the machines), and I rely on the plan to keep the targets up to date if/when the input data changes or the data processing is modified.

Thanks for your help, Trev

wlandau commented 3 years ago

Unfortunately, there is not a whole lot you can do in drake with that setup. One alternative is to request access to your university's computing cluster and SSH in from whichever machine you are currently using.

targets, the successor to drake, is a lot better at this. Its data store is much more portable and easier to ship over Git LFS or Dropbox. And if you can get student rates on Amazon Web Services, targets can automatically upload and download data to and from S3 buckets: https://wlandau.github.io/targets-manual/cloud.html (soon to be https://books.ropensci.org/targets/cloud.html). Switching from drake is not all that bad: https://wlandau.github.io/targets-manual/drake.html#transitioning-to-targets (soon to be https://books.ropensci.org/targets/drake.html#transitioning-to-targets).