kids-first / kf-alignment-workflow

:microscope: Alignment workflow for Kids-First DRC
Apache License 2.0

BAM sorting optimization suggestion #50

Closed bogdang989 closed 6 years ago

bogdang989 commented 6 years ago

Tools in the workflow

Current function

Sort the aligned, duplicate-marked BAM.

Proposed modification

Use Sambamba Sort+Index for this purpose.

Performance improvement

Picard SortSam time: 3h 1m
Picard MarkDuplicates time: 4h 50m
Sambamba Merge+Sort+Index time: 1h 12m
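To put the figures above side by side, here is a minimal sketch of the arithmetic. It assumes the two Picard steps run back to back and that all three timings come from the same sample. The helper name `to_minutes` is only illustrative.

```shell
#!/usr/bin/env bash
# Compare the wall-clock times reported in this issue.
# The figures are copied from the report; nothing here is measured.
set -euo pipefail

# Convert hours and minutes to total minutes.
to_minutes() {
  local hours=$1 minutes=$2
  echo $(( hours * 60 + minutes ))
}

picard_sort=$(to_minutes 3 1)       # Picard SortSam: 3h 1m
picard_markdup=$(to_minutes 4 50)   # Picard MarkDuplicates: 4h 50m
sambamba_total=$(to_minutes 1 12)   # Sambamba Merge+Sort+Index: 1h 12m

picard_total=$(( picard_sort + picard_markdup ))
saved=$(( picard_total - sambamba_total ))
# Bash has integer arithmetic only, so scale by 100 to keep two decimals.
speedup_x100=$(( picard_total * 100 / sambamba_total ))

printf 'Picard total:   %dh %dm\n' $(( picard_total / 60 )) $(( picard_total % 60 ))
printf 'Sambamba total: %dh %dm\n' $(( sambamba_total / 60 )) $(( sambamba_total % 60 ))
printf 'Time saved:     %dh %dm\n' $(( saved / 60 )) $(( saved % 60 ))
printf 'Speedup:        %d.%02dx\n' $(( speedup_x100 / 100 )) $(( speedup_x100 % 100 ))
```

Under those assumptions the two Picard steps take 7h 51m combined, so the Sambamba chain saves 6h 39m, roughly a 6.5x speedup on this sample.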

Example command lines

Sambamba Merge

/opt/sambamba_v0.6.4 merge -t 31 ae3b4fcd963d404081393b9cf038d4d5.aligned.duplicates_marked.sorted.bam \
  /root_bwa_mem_1_s/2895813008.aligned.unsorted.bam \
  /root_bwa_mem_2_s/2895813030.aligned.unsorted.bam \
  /root_bwa_mem_3_s/2895813316.aligned.unsorted.bam \
  ... \
  /root_bwa_mem_21_s/2895821901.aligned.unsorted.bam

Sambamba Sort

/opt/sambamba_v0.6.4 sort -o ae3b4fcd963d404081393b9cf038d4d5.aligned.duplicates_marked.sorted.bam -t 31 /root_sambamba_merge/ae3b4fcd963d404081393b9cf038d4d5.aligned.duplicates_marked.sorted.bam

Sambamba Index

mv /root_sambamba_sort/ae3b4fcd963d404081393b9cf038d4d5.aligned.duplicates_marked.sorted.bam . && /opt/sambamba_v0.6.4 index -t 31 ae3b4fcd963d404081393b9cf038d4d5.aligned.duplicates_marked.sorted.bam ae3b4fcd963d404081393b9cf038d4d5.aligned.duplicates_marked.sorted.bai

Sambamba Merge is added in place of Picard MarkDuplicates because the separate BAM files were originally merged in that step. Tested on a randomly selected BAM from the pilot set.
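The three commands above could be chained in a single wrapper. The following is only a sketch, not the workflow's actual implementation. The names `run`, `merge_sort_index`, `SAMBAMBA`, `THREADS` and `DRY_RUN`, the `.merged.bam` intermediate name and the shard file names are all hypothetical. Only the `merge`, `sort -o`, `index` and `-t` usages are taken from the command lines in this issue. By default the script prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Sketch: chain the merge, sort and index steps shown in this issue.
# SAMBAMBA, THREADS and DRY_RUN are illustrative variables, not workflow settings.
set -euo pipefail

SAMBAMBA=${SAMBAMBA:-/opt/sambamba_v0.6.4}   # binary path as used in the issue
THREADS=${THREADS:-31}                        # thread count as used in the issue
DRY_RUN=${DRY_RUN:-1}                         # 1 = print commands, 0 = execute them

# Print the command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# Usage: merge_sort_index <output_prefix> <input.bam>...
merge_sort_index() {
  local prefix=$1
  shift
  local merged="${prefix}.merged.bam"   # hypothetical intermediate file name
  local sorted="${prefix}.sorted.bam"

  run "$SAMBAMBA" merge -t "$THREADS" "$merged" "$@"
  run "$SAMBAMBA" sort -o "$sorted" -t "$THREADS" "$merged"
  run "$SAMBAMBA" index -t "$THREADS" "$sorted" "${sorted%.bam}.bai"
}

# Example with hypothetical shard names; in dry-run mode this only prints.
merge_sort_index sample shard_1.bam shard_2.bam
```

Setting `DRY_RUN=0` would execute the commands for real. Because of `set -e`, a failing step stops the chain before the next one starts.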