Open PedalheadPHX opened 5 years ago
Would something as simple as piping the bam through a view command with filters setup to drop the reads we don’t want to count?
On Sat, Mar 23, 2019 at 12:09 PM Jonathan Keats notifications@github.com wrote:
https://github.com/tgen/phoenix/blob/25049309063e72949bed70ad8b9f8e6b41681832/modules/bam_qc.jst#L23
It would be nice if we could get the exact count of duplicate reads as a metric. It seems like samtools always reports the TOTAL read count in the BAM not the UNIQUE read count, so multi-mappers and supplementary reads get counted multiple time.
Notes for NMTRC_0002_2_AG_Whole_T1_KHSTX: Source Reads Notes Input FASTQ reads 512975558 expected total read count Picard CollectAlignmentSummaryMetrics xxxx PAIR - TOTAL_READS 512975558 matches expectation Samtools flagstat xxxx in total 514772770 + 0 counts all reads in bam not just unique secondary 0 + 0 supplementary 1797212 + 0 duplicates 104680859 + 0 mapped 513884652 + 0 Samtools stats xxxx raw total sequences 512975558 matches expectation filtered sequences 0 sequences 512975558 reads mapped 512087440 reads duplicated 104680859 Samtools stats --remove-dups raw total sequences 513288289 inconsistent does not match flagstat or stats when --remove-dups is not enabled filtered sequences 104680859 same as duplicate reads in flagstat and stats when --remove-dups is not enabled sequences 408607430 reads mapped 407945229 reads duplicated 0 Samtools markdup xxxx READ 514772770 matches flagstat in total count EXCLUDED 2685330 EXAMINED 512087440 PAIRED 511797328 SINGLE 290112 DULPICATE PAIR 103916294 DUPLICATE SINGLE 225917 DUPLICATE TOTAL 104142211 doesn't match others, unique?
What is the true number of mapped reads flagged as duplicates?
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/tgen/phoenix/issues/48, or mute the thread https://github.com/notifications/unsubscribe-auth/ANsPiAYUT4lcKBrqB3QAWTkDzRj1b-91ks5vZnvegaJpZM4cFF5i .
--
This electronic message is intended to be for the use only of the named recipient, and may contain information that is confidential or privileged, including patient health information. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or use of the contents of this message is strictly prohibited. If you have received this message in error or are not the named recipient, please notify us immediately by contacting the sender at the electronic mail address noted above, and delete and destroy all copies of this message. Thank you.
@awchrist Lets resolve this one as you clean up the v0.6 work
Can't turn off Picard tools till this is addressed and resolved
https://github.com/tgen/phoenix/blob/25049309063e72949bed70ad8b9f8e6b41681832/modules/bam_qc.jst#L23
It would be nice if we could get the exact count of duplicate reads as a metric. It seems like samtools always reports the TOTAL read count in the BAM not the UNIQUE read count, so multi-mappers and supplementary reads get counted multiple time.
Notes for NMTRC_0002_2_AG_Whole_T1_KHSTX:
xxxx
xxxx
xxxx
--remove-dups
xxxx
What is the true number of mapped reads flagged as duplicates?
IDXstats Sum = 514772770 . # Matches flagstat
awk '{print $3+$4}' NMTRC_0002_2_AG_Whole_T1_KHSTX.bwa.bam.idxstats.txt | awk '{sum+=$1} END {print sum}'
samtools view -@ 4 -f 0x400 -F 0x80 NMTRC_0002_2_AG_Whole_T1_KHSTX.bwa.bam | cut -f1 | sort | uniq | wc -l
52184064