Allow multiple operators to start after etl import

allanbolipata commented 4 years ago

At the moment, Beagle imports fastq files through the etl module and then hands them to the operator assigned to those files. We limited it to one operator per recipe, but it is time to consider removing that restriction. If we do, we can run multiple operators from the same starting point.
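As a rough sketch of what lifting the restriction could look like, the recipe could map to a list of operators instead of a single one. Every name below (the registry, the helper functions, the recipe and operator names) is hypothetical and only for illustration; none of it is Beagle's actual API.

```python
from collections import defaultdict

# Hypothetical registry: recipe name -> list of operator names.
# With the current restriction, each list would hold exactly one entry.
OPERATORS_BY_RECIPE = defaultdict(list)


def register_operator(recipe, operator_name):
    """Attach one more operator to a recipe; duplicates are ignored."""
    if operator_name not in OPERATORS_BY_RECIPE[recipe]:
        OPERATORS_BY_RECIPE[recipe].append(operator_name)


def operators_for_import(recipe):
    """Return every operator that should start after an etl import of this recipe."""
    # .get() avoids creating an empty entry for unknown recipes.
    return list(OPERATORS_BY_RECIPE.get(recipe, []))


# Hypothetical setup: one main pipeline plus a smaller health check.
register_operator("ExampleRecipe", "MainPipelineOperator")
register_operator("ExampleRecipe", "BamHealthCheckOperator")
```

After an import, the etl step would loop over `operators_for_import(recipe)` and start each operator independently, so a failure in the small check would not have to block the main pipeline.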

The biggest benefit is that we could run multiple smaller, independent pipelines in addition to the main, larger pipeline. One example is something that checks whether the bams are "viable" and should be runnable through variant calling (see https://github.com/mskcc/roslin-cwl/issues/35). That would give us results we can use to debug bam errors without waiting for the main pipeline to finish.

In the short term, we would be introducing some redundancy in data outputs. For example, if we implemented a new module that runs health checks on the bams and fastqs in addition to the existing roslin-cwl pipeline, we would end up running bam alignment twice. Still, it would be the fastest route forward, and we could put some checks in place to reduce the extra digital footprint.

Other than this concern, the question becomes: in what other ways could this adversely affect current functionality?

aef- commented 4 years ago

This is something ACCESS will need. Let me know if you need someone to work on this.

allanbolipata commented 4 years ago

@aef- For sure, let's discuss this next week so we can add it to the sprint.

allanbolipata commented 4 years ago

Done in #315

mskcc / beagle

Allow multiple operators to start after etl import #242