Closed SPPearce closed 3 days ago
I haven't been able to replicate the bottleneck as I understand it from your description.
For some additional context, each MarkDups task must receive all BAMs for a given sample before starting to process and merge into a single output BAM. So blocking in that sense on a per-sample basis is intended and required. However, there should not be blocking/bottlenecking where all alignments must complete before any MarkDups process begins.
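To make the per-sample gating concrete, here is a toy model in plain Python. It is only a sketch of the behaviour described above, not oncoanalyser or Nextflow code: the function name `group_by_sample`, the sample IDs, the file names, and the `expected_counts` mapping are all hypothetical. It assumes the number of BAMs per sample is known up front, and releases each sample as soon as its own BAMs have all arrived, regardless of whether other samples are still aligning.

```python
from collections import defaultdict


def group_by_sample(arrivals, expected_counts):
    """Yield (sample_id, bams) as soon as all expected BAMs for that sample have arrived."""
    pending = defaultdict(list)
    for sample_id, bam in arrivals:
        pending[sample_id].append(bam)
        # Release this sample once its own group is complete;
        # other samples may still be waiting on alignments.
        if len(pending[sample_id]) == expected_counts[sample_id]:
            yield sample_id, pending.pop(sample_id)


# Hypothetical inputs: sampleA is split into two BAMs, sampleB into one.
expected_counts = {"sampleA": 2, "sampleB": 1}

# Arrival order of aligned BAMs; the second sampleA BAM arrives last (the "slow" alignment).
arrivals = [
    ("sampleA", "sampleA.part1.bam"),
    ("sampleB", "sampleB.part1.bam"),
    ("sampleA", "sampleA.part2.bam"),
]

released = list(group_by_sample(arrivals, expected_counts))
for sample_id, bams in released:
    print(sample_id, bams)
```

In this model sampleB is released before sampleA, because its single BAM arrives while sampleA is still waiting for its second one. That is the per-sample blocking that is intended, without any global wait on all alignments.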
I've run oncoanalyser in stub mode and added an artificial 60-second delay to one sample in the bwa-mem2 process to evaluate flow through the NF channels. As expected, each MarkDups task runs as soon as its set of sample BAMs becomes available (see the attached timeline and the expandable section below to replicate).
If you're seeing different behaviour, could you please provide some additional details of your observations and how you're running oncoanalyser?
Attachment: execution_timeline_2024-08-05_12-36-17.html.gz
Closing the issue but please re-open if you'd like to discuss further!
Ah, completely forgot about this one, been busy with other bits ATM.
Description of feature
The pipeline currently seems to have a bottleneck at the alignment -> markdups step, where all the alignment has to be completed before any markdups processes will begin. The pipeline already uses groupKey to determine how many files should be expected from the splitting process, but this happens after the bwamem2 mapping step.