trandles-lanl closed 3 years ago
Hmmm, I'm going to have to think about that. You're basically modifying the workflow with metadata and not the CWL description. If we assume that the saved database represents the repeatable workflow, we'd then need to save some kind of package of database files (e.g. including the ones that are subsequently started at `bee_exit`).
Good point about `bee_exit`. Maybe that's not such a good idea. I could imagine wanting to use `bee_exit` to do things like record the exit state of the workflow. If that's less than success (i.e. some task failed), then we could record the failed `task_id` and its exit code and error message. That would make identifying the culprit a lot easier in a complex workflow. It might also make restarting a failed workflow easier. Perhaps the error was caused by a missing Charliecloud container image. The user could put the image in place and say "restart from failed task."
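To make the idea concrete, here is a rough sketch of what such an exit record could look like. Everything in it is hypothetical: `TaskResult`, `ExitRecord`, `summarize_exit`, and `tasks_to_rerun` are illustrative names, not part of BEE, and the tasks are assumed to run in a simple linear order rather than as a general graph.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical record types; none of these names come from BEE itself.
@dataclass
class TaskResult:
    task_id: str
    exit_code: int
    error_message: str = ""


@dataclass
class ExitRecord:
    succeeded: bool
    failed_task_id: Optional[str] = None
    exit_code: Optional[int] = None
    error_message: Optional[str] = None


def summarize_exit(results: List[TaskResult]) -> ExitRecord:
    """Build the record bee_exit might store: the first failed task, if any."""
    for result in results:  # assumed to be in execution order
        if result.exit_code != 0:
            return ExitRecord(False, result.task_id,
                              result.exit_code, result.error_message)
    return ExitRecord(True)


def tasks_to_rerun(results: List[TaskResult], record: ExitRecord) -> List[str]:
    """Return task ids from the failed task onward ("restart from failed task")."""
    if record.succeeded:
        return []
    ids = [r.task_id for r in results]
    return ids[ids.index(record.failed_task_id):]
```

With a record like this, the "missing container image" case would reduce to reading `failed_task_id` and `error_message`, fixing the image, and resubmitting only the ids returned by `tasks_to_rerun`.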
Using `bee_init` for an alarm or timer function is probably fair game because I think it's outside of the scope of CWL itself. It's not changing the workflow in any way. It's only controlling a condition under which the workflow can begin. The same functionality could be implemented for tasks in a workflow running on a Slurm cluster using the `--begin` switch for sbatch.
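For the Slurm case, a minimal sketch of how an alarm or timer could be translated into an `sbatch --begin` argument might look like this. The `sbatch_command` helper and its parameters are made up for illustration; only the `--begin` option and its absolute-time and `now+<seconds>` forms come from sbatch itself.

```python
from datetime import datetime
from typing import List, Optional


def sbatch_command(script: str,
                   alarm: Optional[datetime] = None,
                   timer_seconds: Optional[int] = None) -> List[str]:
    """Build an sbatch argv list (hypothetical helper, not a BEE function).

    alarm         -> absolute start time, passed as YYYY-MM-DDTHH:MM:SS
    timer_seconds -> relative delay, passed as now+<seconds>
    """
    if alarm is not None and timer_seconds is not None:
        raise ValueError("set either an alarm or a timer, not both")
    cmd = ["sbatch"]
    if alarm is not None:
        cmd.append("--begin=" + alarm.strftime("%Y-%m-%dT%H:%M:%S"))
    elif timer_seconds is not None:
        cmd.append(f"--begin=now+{timer_seconds}")
    cmd.append(script)
    return cmd
```

The list returned here could be handed to `subprocess.run`; the sketch only builds the command and does not submit anything.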
The fault-tolerance capability should be highlighted as one of the major features of BEE. Our orchestration has a global view of the state of a workflow (init, waiting, running, etc.). With a properly configured timeout, BEE can decide to kill waiting, unresponsive, or timed-out tasks and go back to the database to restart them.
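As a rough illustration of that timeout logic, something like the following could work. This is only a sketch: `TaskState`, `restart_stalled`, and the `max_restarts` limit are assumptions rather than BEE's actual data model, and "killing" a task is represented here only by resetting its state, not by cancelling a real job.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TaskState:
    task_id: str
    state: str           # e.g. "waiting", "running", "done", "failed"
    last_update: float   # seconds, e.g. from time.time()
    restarts: int = 0


def restart_stalled(tasks: Dict[str, TaskState], now: float,
                    timeout: float, max_restarts: int = 3) -> List[str]:
    """Reset tasks that have not reported within `timeout` seconds.

    Returns the ids of tasks queued for restart. Tasks that have already
    been restarted `max_restarts` times are marked "failed" instead.
    """
    restarted = []
    for task in tasks.values():
        active = task.state in ("waiting", "running")
        if not active or now - task.last_update <= timeout:
            continue
        if task.restarts >= max_restarts:
            task.state = "failed"
        else:
            task.state = "waiting"   # back in the queue for another attempt
            task.last_update = now
            task.restarts += 1
            restarted.append(task.task_id)
    return restarted
```

In practice the `tasks` mapping would be read from and written back to the database, which is what gives the orchestrator its global view.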
The state of the workflow is always captured by the current state of the database (live or archived on disk). I think all of what you want to do with `bee_exit` can be done with the database.
Largely outdated by current state of GDB. Recommend closing and moving discussion to complex workflow upgrade.
Agree on closing this. @Boogie3D or @mcpherson feel free to close once you've captured anything you want to preserve in the new complex workflow plan.
I have some thoughts on how we might try to leverage the `bee_init` and `bee_exit` nodes in the graph.
`bee_init`
Contains metadata controlling the start of workflow execution. For instance, a user can set an "alarm" or a "timer." An "alarm" would indicate when a workflow should start running in the future. A "timer" would delay the start of a workflow.
`bee_exit`
Contains metadata controlling post-execution actions for a workflow. For instance, a user can say "start workflow after this workflow finishes."
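A minimal sketch of how this metadata might be modeled, just to anchor the discussion. The `BeeInit` and `BeeExit` classes, their field names, and `start_time` are all hypothetical; nothing here reflects an existing BEE schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class BeeInit:
    alarm: Optional[datetime] = None    # absolute time at which to start
    timer: Optional[timedelta] = None   # delay relative to submission


@dataclass
class BeeExit:
    next_workflow: Optional[str] = None  # id of a workflow to start afterwards


def start_time(init: BeeInit, submitted: datetime) -> datetime:
    """Earliest time the workflow may begin, given its bee_init metadata."""
    if init.alarm is not None and init.timer is not None:
        raise ValueError("set either an alarm or a timer, not both")
    if init.alarm is not None:
        # An alarm already in the past means "start immediately".
        return max(init.alarm, submitted)
    if init.timer is not None:
        return submitted + init.timer
    return submitted
```

Neither class touches the CWL description; both only describe conditions around the workflow, before it starts and after it finishes.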