Closed DRJP closed 3 years ago
For multi-contributor workflows, there is unfortunately not a good way to do this in `drake` unless you both use the same file system on the same physical machine. `drake` caches are too heavy to be portable, and there is nothing I can do about it without breaking the whole package. That is one of the primary reasons why I created `targets`, the long-term successor to `drake`. In `targets`, the data store is much lighter, more portable, more resistant to accumulating garbage, and more resilient when files are corrupted. All of this makes it easier to ship the `_targets/` data folder to GitHub (for small projects) or OSF/OneDrive/Box/Dropbox/Google Drive (for large projects). Even better, you can store everything in one or more AWS S3 buckets. Details on cloud storage are at https://wlandau.github.io/targets-manual/cloud.html. None of this will ever be possible in `drake` due to permanent design limitations.
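As a rough sketch of what S3-backed storage looks like in `targets`: the pipeline definition in `_targets.R` can point the data store at a bucket via `tar_option_set()`. The bucket name, prefix, and target names below are hypothetical, and the exact resource arguments may differ across `targets` versions, so treat this as an illustration rather than a copy-paste recipe.

```r
# _targets.R — minimal sketch of an S3-backed pipeline.
# "my-project-bucket" and the target names are made up for illustration.
library(targets)

tar_option_set(
  repository = "aws",  # store target output in S3 instead of local _targets/
  resources = tar_resources(
    aws = tar_resources_aws(
      bucket = "my-project-bucket",
      prefix = "_targets"
    )
  )
)

list(
  tar_target(raw_data, read.csv("data.csv")),
  tar_target(model_fit, fit_model(raw_data))  # fit_model() is a user-defined function
)
```

With a setup like this, collaborators who share credentials for the same bucket can each run `tar_make()` and skip targets that are already up to date in S3, which is exactly the multi-contributor sharing that the `drake` cache cannot support.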
So for your case, I definitely recommend `targets`.

Related: I have an example `targets` workflow to validate a small Bayesian model. That particular example does not use AWS S3, but it would be straightforward to add.
Hi,

I am collaborating on a project with a colleague via `drake` and git. On my side of the project I must run some very heavy MCMCs (written with the R package nimble, which compiles R-like code to C++ for speed) that run for many days. I am currently not running these inside `drake`, because I don't see how we can avoid my colleague having to rerun all my MCMCs. The situation seems paradoxical, and we can't figure out a reasonable workflow.

Do you have any recommendations for how a team can work with `drake` in situations involving lengthy calculations/simulations?

Cheers,
David