Nextflow is a workflow manager that is widely used in the scientific community and, importantly, has a user base and support here at Fred Hutch.
Converting CFT from SCons to Nextflow would be a large task, but it would make it easy to switch between running the pipeline locally and running it in the cloud or in a Docker container, to run automated tests with continuous integration tools, and to monitor running analyses. (People in the Fred Hutch community are even working on a web interface for non-computationally-focused researchers to configure and launch Nextflow workflows.)
I am dumping my notes here about what would have to happen for this to get done, in case they are useful in the future:
Basic conversion of each SCons build object to a Nextflow process
All the scripts that SConstruct calls could remain more or less the same, but the logic in SConstruct would all have to be converted into a main.nf file.
Input YAML file parsing in SConstruct (Python) would need to become its own Nextflow process(es)
In SCons, we use nestly to handle the nesting of outputs; this could probably be handled with Nextflow channels
In SCons, we use tripl to summarize output in Olmsted-parseable files.
Since we use tripl through nestly wrappers / decorators around SCons Python functions, we would probably have to redesign this to make it work in Nextflow
This would probably be the bulk of the work - figuring out how to summarize all the CFT output in a file that could be consumed by Olmsted.
One idea is to just stick with JSON or YAML and have one output file that summarizes all the inputs and outputs for each step of the pipeline
We would have to decide where we wanted to make tradeoffs between writing some simple (albeit less flexible) Python code to parse CFT output from said JSON or YAML vs trying to use something like tripl.
This would involve asking questions along the lines of: what do we get from using tripl (flexible data model, performance), and if it's not easy to use tripl with Nextflow, how can we get these things in another way?
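To make the "one JSON summary file" idea a bit more concrete, here is a rough Python sketch, not a design. It assumes each pipeline step writes a small JSON record of its inputs, outputs and parameters, and that a final step merges those records into one file that Olmsted-side code could read. The function names, the record layout, the `.step.json` suffix and the step names in the demo are all hypothetical; nothing here reflects what CFT or tripl currently do.

```python
import json
import tempfile
from pathlib import Path


def write_step_record(out_dir, step, inputs, outputs, params=None):
    """Write one small JSON record describing a single pipeline step.

    The record layout (step / inputs / outputs / params) is an assumption
    made for this sketch, not an existing CFT or Olmsted format.
    """
    record = {
        "step": step,
        "inputs": sorted(str(p) for p in inputs),
        "outputs": sorted(str(p) for p in outputs),
        "params": params or {},
    }
    path = Path(out_dir) / f"{step}.step.json"
    path.write_text(json.dumps(record, indent=2, sort_keys=True))
    return path


def merge_step_records(record_paths):
    """Combine per-step records into one summary keyed by step name."""
    summary = {"steps": {}}
    for path in sorted(str(p) for p in record_paths):
        record = json.loads(Path(path).read_text())
        summary["steps"][record["step"]] = record
    return summary


def outputs_of(summary, step):
    """Look up the output files recorded for one step."""
    return summary["steps"][step]["outputs"]


if __name__ == "__main__":
    # Hypothetical step names and file names, for illustration only.
    with tempfile.TemporaryDirectory() as tmp:
        records = [
            write_step_record(tmp, "partition", ["seqs.fa"], ["partition.yaml"]),
            write_step_record(
                tmp, "build_tree", ["partition.yaml"], ["tree.nwk"],
                params={"method": "example"},
            ),
        ]
        summary = merge_step_records(records)
        print(json.dumps(summary, indent=2, sort_keys=True))
```

In a Nextflow version, each process could emit its record as an extra output file and a final process could collect and merge them. The tradeoff mentioned above still applies: this is simple to write and parse, but it is a fixed layout rather than the flexible data model tripl provides.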