sdparekh / zUMIs

zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
GNU General Public License v3.0

Parsing out 5' UMI and internal reads in SmartSeq3 data #295

Closed pm-Genome2021 closed 2 years ago

pm-Genome2021 commented 2 years ago

As I understand the setup in zUMIs, the 5' UMI reads are identified by matching the pattern specified in the 'find_pattern' field, and these reads are then processed in the downstream steps (filtering, aligning, etc.). Is there a way to isolate the internal reads and process them separately from the 5' UMI reads through the full zUMIs pipeline? Any help would be highly appreciated. Thanks!

cziegenhain commented 2 years ago

Hi,

I don't think I fully understand the question. The two read types in Smart-seq3 are both processed appropriately throughout the pipeline. The resulting count tables also give you (among other things) internal read counts and deduplicated 5' UMI counts. In what way do you need to process them separately? Other than the built-in correct processing, zUMIs does not offer any user-facing functions to separate the reads.

Best, Christoph

pm-Genome2021 commented 2 years ago

Hi @cziegenhain, thanks for getting back to me. I am interested in getting the distribution of reads across the genome (exonic, intronic, intergenic) separately for UMI reads and internal reads, and was wondering whether there was some functionality in the pipeline for this that I had missed. I am able to get this distribution for UMI reads and for all reads (UMI + internal) by including or excluding the 11bp tag match. Based on your answer, I assume I may need to separate the reads on my own outside zUMIs.
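
For reference, a minimal sketch of what separating the reads outside zUMIs could look like. It is not part of zUMIs. It assumes that the 11bp tag is ATTGCGCAATG (substitute whatever you give as 'find_pattern'), that the tag sits at the very start of read 1, and that read 1 and read 2 are in the same order in both FASTQ files. The function names (read_fastq, starts_with_tag, split_pairs) are made up for this example.

```python
from typing import Dict, Iterator, TextIO, Tuple

# Assumption: the 11 bp Smart-seq3 tag, i.e. whatever is passed as find_pattern.
TAG = "ATTGCGCAATG"

Record = Tuple[str, str, str, str]  # header, sequence, separator, qualities


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records from an open text handle."""
    while True:
        lines = [handle.readline().rstrip("\n") for _ in range(4)]
        if not lines[0]:  # an empty header line is treated as end of file
            return
        yield lines[0], lines[1], lines[2], lines[3]


def starts_with_tag(seq: str, tag: str = TAG, max_mismatches: int = 0) -> bool:
    """True if the read starts with the tag, allowing a few mismatches."""
    if len(seq) < len(tag):
        return False
    # zip stops at the end of the tag, so only the read prefix is compared.
    return sum(a != b for a, b in zip(seq, tag)) <= max_mismatches


def split_pairs(
    r1: TextIO,
    r2: TextIO,
    outputs: Dict[str, Tuple[TextIO, TextIO]],
    max_mismatches: int = 0,
) -> Dict[str, int]:
    """Write each read pair to the 'umi' or 'internal' outputs; return counts."""
    counts = {"umi": 0, "internal": 0}
    for rec1, rec2 in zip(read_fastq(r1), read_fastq(r2)):
        # Only read 1 is inspected; its mate follows it into the same bin.
        is_umi = starts_with_tag(rec1[1], max_mismatches=max_mismatches)
        kind = "umi" if is_umi else "internal"
        out1, out2 = outputs[kind]
        out1.write("\n".join(rec1) + "\n")
        out2.write("\n".join(rec2) + "\n")
        counts[kind] += 1
    return counts
```

For gzipped input, the handles can be opened with gzip.open(path, "rt") and the outputs with gzip.open(path, "wt"). The two resulting FASTQ pairs could then be aligned and annotated separately to get the exonic/intronic/intergenic split per read type.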

cziegenhain commented 2 years ago

Right, I understand! Unfortunately, I don't see any immediate way to do this in zUMIs. I can write it down as a feature to add in the future, but I can't promise that it will happen very soon.

Beware that whether or not you give the tag match in the yaml, you'll always get internal + UMI reads for Smart-seq3 data in zUMIs. Without the pattern, the reads will just not be used correctly.

Best, C.

pm-Genome2021 commented 2 years ago

Thanks @cziegenhain