uptake / updraft

R package for building flexible workflows

Clean Up Interface to Build a Workflow Graph #20

Open cwschultz88 opened 6 years ago

cwschultz88 commented 6 years ago

Building connection and module objects can be a pain. We want to rework the object interfaces so that building and executing workflows is as simple as possible.

This is a major project that needs to be done on the road to a 1.0 release.

jayqi commented 6 years ago

One way to simplify the interface is to have many of the methods return invisible(self), so that calls can be chained.
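As a minimal sketch of the pattern (assuming the R6 package is installed, and using a hypothetical `ToyWorkflow` class that stands in for the real workflow class rather than reproducing it), each mutating method ends with `invisible(self)` so the next call can be made directly on its result:

```r
library(R6)  # assumption: R6 is available

# Hypothetical toy class for illustration only; not updraft's DAGWorkflow.
ToyWorkflow <- R6Class(
    "ToyWorkflow",
    public = list(
        name = NULL,
        modules = list(),
        connections = list(),
        initialize = function(name) {
            self$name <- name
        },
        addModules = function(modules) {
            self$modules <- c(self$modules, modules)
            # Return the object itself (without auto-printing) to allow chaining
            invisible(self)
        },
        addConnections = function(connections) {
            self$connections <- c(self$connections, connections)
            invisible(self)
        }
    )
)

# Each call returns the workflow, so the next method can be applied directly.
# Plain strings stand in for module and connection objects here.
workflow <- ToyWorkflow$new(name = "workflow1")$addModules(list("module1", "module2"))$addConnections(list("connection1"))
```

Because the return value is wrapped in `invisible()`, calling a method on its own line at the console does not print the object, while assignment and chaining still receive it.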

So concretely, here's the setup from test_execution.R:

workflow1 <- DAGWorkflow$new(name="workflow1") # Dependency -- assumes working WorkflowDAG, etc.
module1_1 <- PackageFunctionModule$new(name = "module1_1", fun = "rnorm", package = "stats")
module2_1 <- PackageFunctionModule$new(name = "module2_1", fun = "rnorm", package = "stats")
module3_1 <- PackageFunctionModule$new(name = "module3_1", fun = "rnorm", package = "stats")
module4_1 <- CustomFunctionModule$new(name = "module4_1", fun = function(a, b, c) {cat(a + b + c, file = file.path(workingDir, file = "workflow1_output.txt"))})
connection1_1 <- DirectedConnection$new(name = "connection1_1", headModule = module1_1, tailModule = module4_1, inputArgument = c('a'))
connection2_1 <- DirectedConnection$new(name = "connection2_1", headModule = module2_1, tailModule = module4_1, inputArgument = c('b'))
connection3_1 <- DirectedConnection$new(name = "connection3_1", headModule = module3_1, tailModule = module4_1, inputArgument = c('c'))
workflow1$addModules(list(module1_1
                          , module2_1
                          , module3_1
                          , module4_1))
workflow1$addConnections(list(connection1_1
                              , connection2_1
                              , connection3_1))

If methods can be chained, you could write it like this instead. You could still use intermediate variables partway through, but you would no longer need to keep writing workflow1 over and over. Note that the connections below refer to modules by name rather than by object, since the modules are no longer bound to variables.

workflow1 <- (DAGWorkflow$new(name="workflow1")
              $addModules(list(
                  PackageFunctionModule$new(name = "module1_1", fun = "rnorm", package = "stats"),
                  PackageFunctionModule$new(name = "module2_1", fun = "rnorm", package = "stats"),
                  PackageFunctionModule$new(name = "module3_1", fun = "rnorm", package = "stats"),
                  CustomFunctionModule$new(name = "module4_1", fun = function(a, b, c) {cat(a + b + c, file = file.path(workingDir, file = "workflow1_output.txt"))})
              ))
              $addConnections(list(
                  DirectedConnection$new(name = "connection1_1", headModule = "module1_1", tailModule = "module4_1", inputArgument = c('a')),
                  DirectedConnection$new(name = "connection2_1", headModule = "module2_1", tailModule = "module4_1", inputArgument = c('b')),
                  DirectedConnection$new(name = "connection3_1", headModule = "module3_1", tailModule = "module4_1", inputArgument = c('c'))
              ))
)

jameslamb commented 6 years ago

I am merely an observer here, but want to be on record as saying that I like this.