sdparekh / zUMIs

zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
GNU General Public License v3.0

Does zUMIs remove PCR duplicates from the bam files? #342

Closed pschinke closed 1 year ago

pschinke commented 1 year ago

Hello,

I'm using zUMIs on my SMART-seq3 data to obtain bam files that don't contain PCR duplicates for further analysis steps. My question is whether the output demultiplexed bam files from the zUMIs pipeline have undergone duplicate removal. This was not clear to me from reading the paper and the documentation.

Thanks in advance!

cziegenhain commented 1 year ago

Hi,

No duplicates are being removed from the bam file in zUMIs.

Best, Christoph

pschinke commented 1 year ago

Hi Christoph,

Thanks for the quick response! Can you recommend a tool that works well with zUMIs and can be used for that purpose? Or is there an option in zUMIs to at least mark PCR duplicates, similar to what Cell Ranger does?

cziegenhain commented 1 year ago

For quantification of gene expression, we have previously argued that duplicate removal of reads based on mapping coordinates is not recommended: https://www.nature.com/articles/srep25533

Other than that, zUMIs-generated bam files should be compatible with any tool of your preference, so you could try, for example, duplicate marking in picard-tools.

Best, Christoph
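
To illustrate the distinction raised here, the following is a minimal toy sketch in Python, not zUMIs or picard code. It compares deduplication keyed on mapping coordinates alone with deduplication that also uses the UMI. The `Read` record and its field names are hypothetical and only stand in for information a bam record would carry.

```python
from collections import namedtuple

# Toy read record; the field names are illustrative, not a zUMIs or BAM API.
Read = namedtuple("Read", ["cell", "umi", "chrom", "pos", "strand"])


def dedup(reads, key):
    """Keep the first read seen for each distinct key."""
    seen = set()
    kept = []
    for read in reads:
        k = key(read)
        if k not in seen:
            seen.add(k)
            kept.append(read)
    return kept


def by_coordinates(read):
    # Coordinate-only key: reads at the same position collapse into one.
    return (read.cell, read.chrom, read.pos, read.strand)


def by_umi(read):
    # UMI-aware key: reads collapse only if the UMI matches as well.
    return (read.cell, read.umi, read.chrom, read.pos, read.strand)


reads = [
    Read("cellA", "AACGT", "chr1", 1000, "+"),
    Read("cellA", "AACGT", "chr1", 1000, "+"),  # same UMI, same position: PCR copy
    Read("cellA", "GGTCA", "chr1", 1000, "+"),  # different UMI, same position
    Read("cellA", "TTAGC", "chr1", 2500, "-"),
]

coordinate_kept = dedup(reads, by_coordinates)
umi_kept = dedup(reads, by_umi)
print(len(coordinate_kept), len(umi_kept))  # prints "2 3"
```

In this toy input, the coordinate-only key discards the read with UMI `GGTCA` even though it carries a different UMI, while the UMI-aware key keeps it.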

pschinke commented 1 year ago

Makes sense. Unfortunately, UMI-based (!) duplicate removal is a vital step for my analysis. Does zUMIs provide any information about the UMIs in the bam files, so I could plug them into a tool like UMI-tools? (Sorry for the late reply, I've been on vacation.)

cziegenhain commented 1 year ago

Oh absolutely, the UMI is present in the bam file for every read so you can use that!

Here is the legend of the utilized tags: https://github.com/sdparekh/zUMIs/wiki/Output#explanation-of-the-bam-tags-zumis-uses
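
As a rough sketch of how the tagged bam could be handed to UMI-tools, the Python snippet below only assembles a `umi_tools dedup` command line and prints it. The tag names `UB` (UMI) and `BC` (cell barcode) and both file names are assumptions made for illustration, so check them against the tag legend linked above before running anything.

```python
import shlex


def build_umi_tools_dedup_command(in_bam, out_bam, umi_tag="UB", cell_tag="BC"):
    """Assemble a umi_tools dedup call that reads the UMI from a bam tag.

    The default tag names are assumptions; confirm them in the zUMIs tag legend.
    """
    return [
        "umi_tools", "dedup",
        "--stdin", in_bam,
        "--stdout", out_bam,
        "--extract-umi-method", "tag",  # take the UMI from a tag, not the read name
        "--umi-tag", umi_tag,
        "--per-cell",                   # deduplicate within each cell barcode
        "--cell-tag", cell_tag,
    ]


# Hypothetical file names, used only for illustration.
cmd = build_umi_tools_dedup_command("zumis_output.bam", "zumis_output.dedup.bam")
print(shlex.join(cmd))
```

UMI-tools expects a coordinate-sorted, indexed bam as input, and paired-end data may additionally need its `--paired` option. The printed command can be pasted into a shell or passed to `subprocess.run(cmd, check=True)`.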

pschinke commented 1 year ago

Oh, I missed that somehow. Many thanks!

cziegenhain commented 1 year ago

No worries, let me know if anything else is unclear.
