At the moment, Beagle imports fastq files through the etl module and then passes them to the operator assigned to those files. We limited it to one operator per recipe, but it is time to consider removing that restriction. If we do, we can run multiple operators from the same starting point.
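As a rough illustration of the change, here is a minimal sketch of a recipe-to-operators mapping that allows more than one operator per recipe. Everything in it is hypothetical: `OperatorRegistry`, the two operator functions, and the recipe name `ExampleRecipe` are made up for this example and are not Beagle's actual classes or API.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical stand-in for an operator: takes the imported fastq paths
# and returns a description of the job it would submit.
Operator = Callable[[List[str]], str]


class OperatorRegistry:
    """Maps a recipe to a list of operators instead of a single one."""

    def __init__(self) -> None:
        self._by_recipe: Dict[str, List[Operator]] = defaultdict(list)

    def register(self, recipe: str, operator: Operator) -> None:
        # Appending (rather than overwriting) is what lifts the
        # one-operator-per-recipe restriction.
        self._by_recipe[recipe].append(operator)

    def dispatch(self, recipe: str, fastqs: List[str]) -> List[str]:
        # Every operator registered for the recipe starts from the same fastqs.
        return [op(fastqs) for op in self._by_recipe.get(recipe, [])]


def main_pipeline_operator(fastqs: List[str]) -> str:
    # Placeholder for the main, larger pipeline.
    return f"main pipeline job for {len(fastqs)} fastq files"


def bam_health_check_operator(fastqs: List[str]) -> str:
    # Placeholder for a smaller, independent health-check pipeline.
    return f"bam health check job for {len(fastqs)} fastq files"


registry = OperatorRegistry()
registry.register("ExampleRecipe", main_pipeline_operator)
registry.register("ExampleRecipe", bam_health_check_operator)

jobs = registry.dispatch("ExampleRecipe", ["sample_R1.fastq.gz", "sample_R2.fastq.gz"])
for job in jobs:
    print(job)
```

In this sketch the operators run independently of one another, so a failure or delay in one does not block the other. Whether Beagle should enforce any ordering or deduplication between them is an open design question.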
The biggest benefit is that we could run multiple smaller, independent pipelines alongside the main, larger pipeline; for example, something that checks whether the bams are "viable" and should be runnable through variant calling (such as https://github.com/mskcc/roslin-cwl/issues/35). That would give us results we can use to debug bam errors without waiting for the main pipeline to finish.
In the short term, we would introduce some redundancy in data outputs. For example, if we implement a new module that does health checks of the bam and fastq in addition to the existing roslin-cwl pipeline, we end up running bam alignment twice. Still, it would be the fastest route forward, and we could put some checks in place to mitigate the digital footprint.
Other than this concern, the question becomes: in what other ways could this adversely affect current functionality?