jajclement / hotSSDS


MarkDuplicatesWithMateCigar does not work without MC tags #14

Open andpet0101 opened 1 year ago

andpet0101 commented 1 year ago

This is a serious bug in the pipeline:

MarkDuplicatesWithMateCigar requires the MC (mate CIGAR) tag on the reads in its input BAM files. Read pairs without an MC tag are skipped by the duplicate marking and simply written to the output (see https://gatk.broadinstitute.org/hc/en-us/articles/360037055692-MarkDuplicatesWithMateCigar-Picard-#--SKIP_PAIRS_WITH_NO_MATE_CIGAR). Since the BAM files in the pipeline have no MC tags, duplicates are not filtered at all.
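To check whether a given BAM file is affected, something like the following rough sketch could be used. It counts paired records whose mate is mapped but which carry no MC:Z: tag. The function name `count_missing_mc` is made up for illustration and is not part of the pipeline; it reads SAM text on stdin, so it assumes samtools is available to decode the BAM.

```shell
# Hypothetical helper (not part of hotSSDS): count paired SAM records
# with a mapped mate that lack the MC (mate CIGAR) tag.
count_missing_mc() {
    awk -F'\t' '
        /^@/ { next }                     # skip SAM header lines
        ($2 % 2) == 0 { next }            # flag 0x1 unset: read is not paired
        (int($2 / 8) % 2) == 1 { next }   # flag 0x8 set: mate unmapped, no mate CIGAR expected
        {
            has_mc = 0
            # optional tags start at column 12
            for (i = 12; i <= NF; i++) if ($i ~ /^MC:Z:/) has_mc = 1
            if (!has_mc) missing++
        }
        END { print missing + 0 }
    '
}
```

Usage would be along the lines of `samtools view -h input.bam | count_missing_mc` (the file name is a placeholder). A non-zero count means MarkDuplicatesWithMateCigar will pass those records through without marking them.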

I suggest adding the following lines:

# FIX: Add mate CIGAR (MC) tags
    picard FixMateInformation I=${tmpNameStem}.unsorted.tmpbam O=${tmpNameStem}.unsorted.mc.tmpbam \
            VALIDATION_STRINGENCY=LENIENT > ${tmpNameStem}.unsorted.mc.picardFM.out 2>&1

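These lines belong in process 5, after SamFormatConverter and before SortSam. SortSam would then need to read ${tmpNameStem}.unsorted.mc.tmpbam as its input.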