Open sparthib opened 2 days ago
Our current UMI dedup is rather simplistic: among reads with the same UMI it just keeps the longest one, rather than doing any consensus calling.
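To make the "keep the longest read per UMI" behaviour concrete, here is a minimal sketch of that idea. It is not the tool's actual code: the function name `dedup_keep_longest`, the `(read_id, umi, sequence)` tuple layout, and the tie-breaking rule (first read seen wins) are all assumptions made for illustration, and a real pipeline would likely group by cell barcode as well.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (read_id, umi, sequence)
Read = Tuple[str, str, str]


def dedup_keep_longest(reads: Iterable[Read]) -> List[Read]:
    """Return one read per UMI, keeping the longest sequence.

    Ties are resolved in favour of the read seen first (an assumption).
    No consensus calling is performed.
    """
    best: Dict[str, Read] = {}
    for read in reads:
        _, umi, seq = read
        current = best.get(umi)
        # Replace only when this read is strictly longer than the kept one.
        if current is None or len(seq) > len(current[2]):
            best[umi] = read
    return list(best.values())


if __name__ == "__main__":
    example = [
        ("r1", "AAAC", "ACGT"),
        ("r2", "AAAC", "ACGTACGT"),  # same UMI as r1, but longer
        ("r3", "GGTT", "AC"),
    ]
    for read_id, umi, seq in dedup_keep_longest(example):
        print(read_id, umi, len(seq))
```

Note that the shorter duplicates are simply discarded, so any sequencing errors in the retained read are not corrected.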
The deduped one is used when realigning to the transcriptome (but not for the initial align2genome.bam). You can double-check this with samtools view -H align2genome.bam; the last few lines should tell you which command was used to produce the BAM file.
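If you would rather check this programmatically, the sketch below pulls the @PG (program) records out of a SAM/BAM header and prints their CL (command line) fields. The helper names `read_header` and `parse_pg_lines` are made up for this example, it assumes samtools is on your PATH, and the header in the comment is invented sample data, not output from a real run.

```python
import subprocess
from typing import Dict, List


def read_header(bam_path: str) -> str:
    """Return the header text printed by `samtools view -H <bam_path>`."""
    result = subprocess.run(
        ["samtools", "view", "-H", bam_path],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def parse_pg_lines(header_text: str) -> List[Dict[str, str]]:
    """Parse @PG header lines into dicts keyed by tag (ID, PN, CL, ...)."""
    records = []
    for line in header_text.splitlines():
        if not line.startswith("@PG"):
            continue
        fields = {}
        for field in line.split("\t")[1:]:
            # Split on the first colon only; CL values may contain colons.
            tag, _, value = field.partition(":")
            fields[tag] = value
        records.append(fields)
    return records


if __name__ == "__main__":
    # Invented example header line (hypothetical aligner and arguments):
    # @PG  ID:aligner  PN:aligner  CL:aligner --input reads.fastq
    for pg in parse_pg_lines(read_header("align2genome.bam")):
        print(pg.get("ID"), "->", pg.get("CL"))
```

The input file named in the CL field should show whether the alignment was run on the original or the deduplicated FASTQ, provided the aligner records its command line in the header.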
Thanks, so if I understand correctly, the genome alignment still contains the duplicate reads but the transcriptome alignment doesn't?
Yes, that's correct
Hi there,
In my output I see matched_reads_dedup.fastq and align2genome.bam. My question is, does UMI-based deduplication occur at the fastq level, and is this dedup.fastq file then used for alignment? I just want to make sure that my BAM file doesn't have duplicate UMIs.
Thanks, Sowmya