Closed jakirkham closed 9 years ago
Preliminary work has been done here (https://github.com/jakirkham/nanshe/tree/ruffus). It is unclear whether we will stick with this solution or pursue an alternative workflow framework. The repeated changing of filenames makes life somewhat difficult and pollutes the directory in use. However, working around that has resulted in workflows having state (module-level variables), which is also not desirable.
As mentioned, this was tried but has since been abandoned.
Ruffus (https://github.com/bunbun/ruffus) provides support for building and running computational pipelines with multiple stages. It supports multiprocessing and DRMAA. It does use `multiprocessing.Pool`, which is a bit of a problem for us when using SPAMS, so we will have to see how best to handle that. However, if we can get it working, it will offload the direct pipeline management and job spawning to a library explicitly designed to do that. Also, it will make it easier to break up steps into multiple split and join operations for easier inspection of results and more accurate results. Finally, it will make documenting the pipeline (or pipelines) easier, as it provides support for generating diagrams of the pipeline workflow.
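To make the split and join idea concrete, here is a minimal sketch in plain Python. It does not use the Ruffus API, and every name in it (`split`, `process_chunk`, `join`, `run_pipeline`, `map_fn`) is hypothetical and chosen only for illustration. The doubling in `process_chunk` is a stand-in for a real processing step.

```python
from typing import Callable, Iterable, List, Sequence


def split(frames: Sequence[float], n_chunks: int) -> List[List[float]]:
    """Split stage: cut the input into contiguous, roughly equal chunks."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    # Ceiling division; max() guards against a zero step for empty input.
    size = max(1, -(-len(frames) // n_chunks))
    return [list(frames[i:i + size]) for i in range(0, len(frames), size)]


def process_chunk(chunk: List[float]) -> List[float]:
    """Per-chunk stage (hypothetical placeholder for a real step)."""
    return [2.0 * x for x in chunk]


def join(chunks: Iterable[List[float]]) -> List[float]:
    """Join stage: concatenate per-chunk results in their original order."""
    return [x for chunk in chunks for x in chunk]


def run_pipeline(frames: Sequence[float], n_chunks: int = 2,
                 map_fn: Callable = map):
    """Run split -> process -> join, keeping each intermediate result."""
    chunks = split(frames, n_chunks)
    processed = list(map_fn(process_chunk, chunks))
    return chunks, processed, join(processed)


if __name__ == "__main__":
    chunks, processed, result = run_pipeline([1.0, 2.0, 3.0, 4.0, 5.0])
    print(chunks)     # output of the split stage
    print(processed)  # per-chunk output, before joining
    print(result)     # joined output
```

Because every stage returns its output, the intermediate results can be inspected without writing extra files or keeping module-level state. The `map_fn` parameter defaults to the built-in `map`; a parallel map such as the `map` method of a `multiprocessing.Pool` could be passed instead, assuming the chunk function is picklable and the process-based pool is compatible with the libraries in use.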