question on UMI deduplication step

mritchielab / FLAMES

A framework for performing single-cell and bulk read full-length analysis of mutations and splicing.

https://mritchielab.github.io/FLAMES/

GNU General Public License v3.0


question on UMI deduplication step #46

Open sparthib opened 2 days ago

sparthib commented 2 days ago

Hi there,

In my output I see matched_reads_dedup.fastq and align2genome.bam. My question is: does UMI-based deduplication occur at the FASTQ level, and is matched_reads_dedup.fastq then used for alignment? I just want to make sure that my BAM file doesn't contain reads with duplicate UMIs.

Thanks, Sowmya

ChangqingW commented 1 day ago

Our current UMI dedup is rather simplistic: among reads with the same UMI it just keeps the longest one, rather than doing any consensus calling. The deduplicated FASTQ is used when realigning to the transcriptome, but not for the initial align2genome.bam. You can double-check this with samtools view -H align2genome.bam; the last few lines of the header should tell you which command was used to produce the BAM file.
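For readers who want to picture the "keep the longest read per UMI" rule, here is a minimal Python sketch. It is not the FLAMES implementation. The Read type, its field names, and grouping by (barcode, UMI) pair are assumptions made for illustration, and so is the choice to keep the first read seen when lengths tie.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Read(NamedTuple):
    """Hypothetical read record; field names are illustrative only."""
    name: str
    barcode: str  # assumed: cell barcode the read was assigned to
    umi: str
    sequence: str


def dedup_keep_longest(reads: Iterable[Read]) -> List[Read]:
    """Keep one read per (barcode, UMI) group: the longest one.

    No consensus calling is done; shorter reads in a group are dropped.
    Ties keep the read that was seen first (an assumption of this sketch).
    """
    best: Dict[Tuple[str, str], Read] = {}
    for read in reads:
        key = (read.barcode, read.umi)
        current = best.get(key)
        if current is None or len(read.sequence) > len(current.sequence):
            best[key] = read
    return list(best.values())


# Toy input: r1 and r2 share a barcode and UMI, so only the longer r2 survives.
reads = [
    Read("r1", "AAAC", "TTGA", "ACGT"),
    Read("r2", "AAAC", "TTGA", "ACGTACGT"),
    Read("r3", "AAAC", "GGCA", "ACG"),
    Read("r4", "CCCG", "TTGA", "AC"),
]
kept = dedup_keep_longest(reads)
print([r.name for r in kept])  # ['r2', 'r3', 'r4']
```

Because the shorter reads are simply discarded, any sequencing errors in the longest read are carried through unchanged, which is the trade-off compared with consensus calling.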

sparthib commented 1 day ago

Thanks. So if I understand correctly, the genome alignment still contains the duplicate reads, but the transcriptome alignment doesn't?

ChangqingW commented 23 hours ago

Yes, that's correct
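To check this on your own files, the header inspection suggested earlier in the thread can be scripted. The sketch below assumes the header was first saved as text, for example with samtools view -H align2genome.bam > header.txt, and pulls out the @PG (program) records, whose CL field holds the command line. The example header is made up for illustration and is not actual FLAMES output; the function name is hypothetical.

```python
from typing import Dict, List


def program_records(header_text: str) -> List[Dict[str, str]]:
    """Parse the @PG lines of a SAM/BAM text header into tag -> value dicts."""
    records = []
    for line in header_text.splitlines():
        if not line.startswith("@PG"):
            continue
        fields = {}
        # Fields are tab-separated TAG:VALUE pairs; split on the first colon
        # only, since command lines in CL may themselves contain colons.
        for field in line.split("\t")[1:]:
            tag, _, value = field.partition(":")
            fields[tag] = value
        records.append(fields)
    return records


# Made-up header for illustration only; real command lines will differ.
example_header = "\n".join([
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:1000",
    "@PG\tID:minimap2\tPN:minimap2\tCL:minimap2 -ax splice ref.fa reads.fastq",
    "@PG\tID:samtools\tPN:samtools\tPP:minimap2\tCL:samtools sort -o out.bam",
])

for rec in program_records(example_header):
    print(rec.get("ID"), "->", rec.get("CL"))
```

If the CL field of the aligner record names the original FASTQ rather than matched_reads_dedup.fastq, that BAM was built from reads that had not been deduplicated.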