laderast opened this issue 6 years ago
Drake can do a lot of what you are asking for (cc @wlandau-lilly). vis_drake_graph() on your drake plan should show you the steps that failed (in red). Fixing those should only re-run those steps, not everything from scratch. You might take a look at https://github.com/ropensci/drake and see if it matches your needs.
+1 to that! Suppose we're working with this data analysis workflow and one of our functions does not work.
create_plot <- function(data) {
  ggplot(data, aes(x = Petal.Width, fill = Species)) +
    geom_histogram(binwidth = 0.25) +
    theme_gray(20) +
    BAD_LAYER
}
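For context, the rest of the workflow is a drake plan. A minimal sketch of what it might look like (the data source, model, and commands here are illustrative assumptions, not the original plan):

library(drake)
library(dplyr)
library(ggplot2)

plan <- drake_plan(
  raw_data = datasets::iris,                                     # illustrative data source
  data = mutate(raw_data, Species = forcats::fct_inorder(Species)),
  fit = lm(Petal.Width ~ Species, data = data),                  # illustrative model
  hist = create_plot(data)                                       # this target will fail
)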
We run our workflow and see:
> make(plan)
target raw_data
target data
target fit
target hist
fail hist
Error: Target `hist` failed. Call `diagnose(hist)` for details. Error message:
object 'BAD_LAYER' not found
Diagnostics include warnings, errors, messages, and other context.
> diagnose(hist)
$target
[1] "hist"
$messages
NULL
$error
<simpleError in create_plot(data): object 'BAD_LAYER' not found>
You can list the failed targets programmatically.
> failed()
[1] "hist"
As Karthik mentioned, these failures are shown in the dependency graph.
config <- drake_config(plan)
vis_drake_graph(config, targets_only = TRUE, full_legend = FALSE)
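Once create_plot() is fixed, calling make(plan) again should rebuild only the failed target and skip everything that is already up to date, so (assuming nothing else changed) the console output would look roughly like:

> make(plan)
target hist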
Ah, very cool. I didn't know about Drake!
@laderast, would purrr::safely() and purrr::possibly() help you?
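For instance, here is a rough sketch of using safely() to flag the files whose processing step errors out instead of aborting the whole batch (the file paths and the process_file() helper are hypothetical):

library(purrr)

# Hypothetical per-file processing step that may fail for some inputs
process_file <- function(path) {
  # ... read the file, transform it, write the output ...
  invisible(path)
}

files <- c("input/a.fcs", "input/b.fcs")       # hypothetical file manifest
results <- map(files, safely(process_file))    # errors are captured, not raised

# TRUE where the step errored, so those files can be excluded downstream
failed <- map_lgl(results, ~ !is.null(.x$error))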
Thanks @maurolepore - purrr::safely() is a nice approach.
Hi Everyone,
I've been kicking this idea around for a little bit. Our group does a lot of batch processing of input files when we run our pipeline for flow cytometry data. Sometimes the output of a step will fail, and we have to flag the files that fail so they aren't passed through further steps in the pipeline.
When I do this currently, I basically build file manifests (location of files with relevant metadata) and run some sort of processing in R. I was thinking that by incorporating data assertions (like with assertr), we could have a workflow that shows when files pass a step and flags the files that fail a processing step. In the end, we can show users of the pipeline which files passed, which files didn't, and at which steps.
Maybe there's a little germ of an idea here that might work for the unconf. I'm not sure, so I'm putting it out there.
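A minimal sketch of the assertion idea, assuming one CSV per file listed in a manifest (the paths, column name, and bounds are all hypothetical):

library(assertr)
library(dplyr)
library(purrr)

# Hypothetical manifest: one row per input file plus relevant metadata
manifest <- tibble::tibble(path = c("fcs/sample_01.csv", "fcs/sample_02.csv"))

# Run the assertions for one file and return TRUE/FALSE instead of erroring out
check_file <- function(path) {
  tryCatch({
    readr::read_csv(path) %>%
      verify(has_all_names("FSC_A")) %>%      # hypothetical channel column must exist
      assert(within_bounds(0, Inf), FSC_A)    # and must contain no negative values
    TRUE
  }, error = function(e) FALSE)
}

# Flag which files passed this step so later steps can skip the failures
manifest <- mutate(manifest, passed_step1 = map_lgl(path, check_file))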