ropensci / drake

An R-focused pipeline toolkit for reproducibility and high-performance computing
https://docs.ropensci.org/drake
GNU General Public License v3.0

Why outdated when results of deps profile are all false #1363

Closed jennysjaarda closed 3 years ago

jennysjaarda commented 3 years ago

Question

Unfortunately I do not have a reproducible example, but I am just wondering if you have any thoughts on why a target would be outdated but the results of deps_profile are all false?


This is long running code, so it's quite time-consuming to have it run again, especially when I have no idea why it became outdated.

wlandau commented 3 years ago

I suspect baseline_gwas_figures_input is actually up to date with its immediate dependencies but that something farther upstream is not. If you keep searching upstream in the dependency graph, I bet you will find something that shows a change in deps_profile(). It's likely to be one of the left-most outdated nodes you see in vis_drake_graph(). outdated() not only returns immediately outdated targets, but also their reverse dependencies all the way downstream.
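The upstream effect described above can be reproduced on a toy plan. This is a minimal sketch rather than anything from the original project: `f`, `a`, and `b` are made-up names, and a throwaway cache from `new_cache()` keeps the example away from any real `.drake` folder. The `make(plan, skip_targets = TRUE)` step follows the example on the `deps_profile()` help page, which refreshes the cached hashes of imports so that `deps_profile()` can detect changes to functions.

```r
library(drake)

# Throwaway cache so the example does not touch an existing .drake/ folder.
cache <- new_cache(path = tempfile())

# Hypothetical imported function and plan.
f <- function(x) x + 1
plan <- drake_plan(
  a = f(1),   # depends on the imported function f()
  b = a * 2   # depends only on the target a
)
make(plan, cache = cache)

# Change something upstream of a.
f <- function(x) x + 2

# Refresh the cached import hashes without building any targets.
make(plan, skip_targets = TRUE, cache = cache)

# Collect the outdated targets and the dependency profile of each target.
stale <- outdated(plan, cache = cache)
profile_a <- deps_profile(a, plan, cache = cache)
profile_b <- deps_profile(b, plan, cache = cache)

print(stale)
print(profile_a)
print(profile_b)
```

If drake behaves as described in the comment above, both `a` and `b` should be reported as outdated, while a change should be flagged only in the profile of `a`, the target farthest upstream.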

To prevent these kinds of surprises, I recommend writing a _drake.R file and using functions like r_make() and r_outdated() as described at https://books.ropensci.org/drake/projects.html#safer-interactivity. This is kind of what targets does by default, and targets can tell you which functions/globals are outdated, while drake cannot: https://books.ropensci.org/targets/drake.html#show-status-of-functions-and-global-objects.
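For orientation, a `_drake.R` file along the lines of the drake manual might look like the sketch below. The three sourced file paths are assumptions about the project layout and should be replaced with wherever the packages, functions, and plan actually live.

```r
# _drake.R, placed at the root of the project.
# The sourced paths below are assumed; adjust them to the real project layout.
source("R/packages.R")   # library() calls
source("R/functions.R")  # custom functions used by the plan
source("R/plan.R")       # creates the object `plan` with drake_plan()

# r_make() and r_outdated() expect the script to end with a drake_config() call.
drake_config(plan)
```

With this file in place, `drake::r_outdated()` and `drake::r_make()` run in a fresh R process, so stray objects in the interactive session cannot silently change which targets are considered up to date.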

jennysjaarda commented 3 years ago

Thanks for the advice. I just wanted to quickly follow up because I continue to run into these problems (I may just migrate over to targets for this reason). I have a plan where only two targets are outdated (see below), and deps_profile shows nothing changed for either of them, so I really don't know why they are outdated.

image image

jennysjaarda commented 3 years ago

I did a little bit more digging and I may have found the source of the problem, but I still have not found a good solution. The left-most outdated target (bgenie_out) is a dynamic target with a file output. There are ~3000 subtargets and ~70 don't produce a file, so a warning is thrown. The next target (bgenie_unzip), as the name suggests, unzips the files from the previous target, and also has a file output, so a warning is similarly thrown because no unzipped file is created. When I make the plan, only these 70 targets are run. But the warnings prevent the plan from ever completing it seems. Is there any way to modify my plan without rerunning all the other nearly 3000 targets which take about 5 days to run?


wlandau commented 3 years ago

This will require some modification to your code to make sure every dynamic sub-target produces a file, which would ordinarily invalidate all the sub-targets. But one workaround you could try is to use cancel_if() to cancel the sub-target if the file already exists. Could be trouble if there are good reasons to update existing files, but it may save you 5 days of runtime in the short term.
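A rough sketch of that workaround is shown below. It is not the plan from this thread: `run_chunk()`, the output paths, and the three-element `chunk` target are hypothetical stand-ins for the real BGENIE step, and the block only constructs the plan without running it.

```r
library(drake)

# Hypothetical stand-in for the real, long-running BGENIE call.
# It is assumed to write its result to out_path.
run_chunk <- function(chunk, out_path) {
  dir.create(dirname(out_path), showWarnings = FALSE, recursive = TRUE)
  writeLines(paste("result for chunk", chunk), out_path)
}

plan <- drake_plan(
  chunk = seq_len(3),
  bgenie_out = target(
    {
      out_path <- file.path("bgenie_out", paste0("chunk_", chunk, ".txt"))
      # Cancel this sub-target if its output file is already on disk.
      cancel_if(file.exists(out_path))
      run_chunk(chunk, out_path)
      out_path  # dynamic file targets return the path(s) they wrote
    },
    dynamic = map(chunk),
    format = "file"
  )
)

print(plan)
```

According to drake's documentation for `cancel()`, a cancelled target keeps its previous value and metadata in the cache, so sub-targets skipped this way may continue to show up as outdated. That is worth testing on a small subset before relying on it for the full run.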

llalluan commented 1 year ago

Hi! Where did you get the installation package for BGENIE? I clicked the link https://www.dropbox.com/home/Oxford/software/software_webpages/bgenie, but it showed that the package does not exist. If you know anything about that, please let me know. Thanks!