sdparekh / zUMIs

zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
GNU General Public License v3.0

zUMIs "inex" all read counts and downsampled read counts the same #311

Closed by seifudd 2 years ago

seifudd commented 2 years ago

Hi, thanks for developing zUMIs.

I am looking at the read counts for my barcodes/samples (*.dgecounts.rds). I noticed that the total intron-exon ("inex") read counts for each barcode/sample are identical between the "all" read counts and the downsampled read counts.

Shouldn't the read counts be different?

The parameter for downsampling in the counting_opts: section of the yml file is set to downsampling: 0, which we understand is adaptive downsampling. Regardless, shouldn't all read counts be more than the downsampled read counts?

In addition, in the *.command_line_output_zummis.txt file we noticed that many reads are "assigned to barcodes that do not correspond to intact cells" (see the log excerpt below). Do these reads get filtered out completely? If yes, which parameter turns this off in zUMIs?

Filtering... Fri Mar 4 17:06:28 UTC 2022
[1] "83573273 reads were assigned to barcodes that do not correspond to intact cells."
[1] "Found 311 daughter barcodes that can be binned into 18 parent barcodes."
[1] "Binned barcodes correspond to 38478236 reads."

On another note, we have a total of 352800215 reads sequenced in our FASTQ file. However, if we sum the number of reads from our *.BCstats.txt, we only get 110422891 reads assigned to barcodes. This accounts for only 31.30% of the total reads. Are the rest of the reads being thrown away? Is there a parameter that can be adjusted to include the rest of the reads? It seems wasteful to throw away ~70% of the reads.
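For reference, the percentage quoted above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic using the two totals reported in this comment. The variable and function names are illustrative and are not part of zUMIs.

```python
# Totals as reported in this comment (not recomputed from the raw files).
TOTAL_FASTQ_READS = 352_800_215      # reads sequenced in the FASTQ file
BARCODE_ASSIGNED_READS = 110_422_891  # sum of reads in *.BCstats.txt


def percent_of_total(part: int, total: int) -> float:
    """Return `part` as a percentage of `total`."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


assigned_pct = percent_of_total(BARCODE_ASSIGNED_READS, TOTAL_FASTQ_READS)
unassigned_reads = TOTAL_FASTQ_READS - BARCODE_ASSIGNED_READS
unassigned_pct = percent_of_total(unassigned_reads, TOTAL_FASTQ_READS)

print(f"assigned to barcodes:     {assigned_pct:.2f}%")
print(f"not assigned to barcodes: {unassigned_pct:.2f}% ({unassigned_reads} reads)")
```

The same helper could be applied to the counts in the filtering log above to see what share of the sequenced reads each message refers to.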

Any help would be greatly appreciated.

Thanks, Fayaz

sdparekh commented 2 years ago

Please post the full verbose log, the yaml file and descriptive plots generated by zUMIs.

seifudd commented 2 years ago

SRG_RNA_Seq_Lite.zUMIs_config_formated.yaml.txt
SRG_RNA_Seq_Lite.filtered.tagged.Log.final.out.txt
SRG_RNA_Seq_Lite.command_line_output_zummis.txt
SRG_RNA_Seq_Lite.geneUMIcounts.pdf
SRG_RNA_Seq_Lite.readspercell.pdf
SRG_RNA_Seq_Lite.features.pdf
SRG_RNA_Seq_Lite.downsampling_thresholds.pdf

cziegenhain commented 2 years ago

Hi,

Best, Christoph

seifudd commented 2 years ago

Hi Christoph,

Thank you for your response. It was the last bullet point that clarified our concerns.

Thanks again, Fayaz