sdparekh / zUMIs

zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
GNU General Public License v3.0

Question: Is the downsampling of UMI reads performed on all reads or just 5' reads? #255

Closed by vincenthahaut 3 years ago

vincenthahaut commented 3 years ago

Hi!

I ran the latest version of zUMIs on SS3 (Smart-seq3) data and generated a count matrix object (dgecounts.rds) containing several downsampling matrices.

sce <- read_rds("Smartseq3.dgecounts.rds")

I assume that the following downsampling matrix was generated using all reads (5' and internal):

sce$readcount$inex$downsampling$downsampled_100000

Could you confirm that the umicount and readcount_internal matrices were produced by downsampling 100K 5' reads and 100K internal reads, respectively?

sce$umicount$inex$downsampling$downsampled_100000
sce$readcount_internal$inex$downsampling$downsampled_100000

and not by first downsampling 100K reads from all reads and then extracting the UMI or internal reads from that subset?

Thank you in advance!

cziegenhain commented 3 years ago

Hi,

Yes, the downsampling is always done on all reads in that barcode (5' and internal together). So for Smartseq3 data, the number of UMIs within e.g. 100k reads also depends on the fraction of 5' vs. internal reads.
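To make the consequence concrete, here is a minimal toy simulation in Python. It is not zUMIs code: the function names (`make_reads`, `downsample_all_reads`, `count_umis`), the single-gene read model, and all parameter values are assumptions chosen for illustration. It only sketches the idea that when reads are drawn from the whole barcode, the number of UMIs recovered at a fixed depth follows the share of 5' reads.

```python
import random


def make_reads(n_reads, frac_5prime, n_molecules, rng):
    """Build a toy read list for one barcode (hypothetical model).

    Each read is ("umi", molecule_id) for a 5' UMI-containing read,
    or ("internal", None) for an internal read.
    """
    reads = []
    for _ in range(n_reads):
        if rng.random() < frac_5prime:
            reads.append(("umi", rng.randrange(n_molecules)))
        else:
            reads.append(("internal", None))
    return reads


def downsample_all_reads(reads, depth, rng):
    """Draw `depth` reads without replacement from ALL reads of the barcode,
    then split the subset into 5' UMI reads and internal reads."""
    subset = rng.sample(reads, depth)
    umi_reads = [r for r in subset if r[0] == "umi"]
    internal_reads = [r for r in subset if r[0] == "internal"]
    return umi_reads, internal_reads


def count_umis(umi_reads):
    """Count distinct molecule ids among the sampled 5' reads."""
    return len({molecule_id for _, molecule_id in umi_reads})


if __name__ == "__main__":
    rng = random.Random(1)
    for frac in (0.1, 0.5):
        reads = make_reads(20000, frac, 100000, rng)
        umi_reads, internal_reads = downsample_all_reads(reads, 5000, rng)
        print(frac, len(umi_reads), len(internal_reads), count_umis(umi_reads))
```

In this sketch the two barcodes are downsampled to the same total depth, yet the one with the larger 5' fraction yields more UMI reads and therefore more distinct UMIs, so UMI counts from such matrices are only comparable between cells with similar 5'/internal ratios.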

Best, Christoph

vincenthahaut commented 3 years ago

Thank you for the super fast clarification!

Have a nice day,

Vincent