Low spliced+unspliced gene counts in velocyto loom compared to number of genes found in zUMIs

droplet-lab commented 4 years ago

Hi, thank you very much for developing zUMIs!

Describe the bug

I have ~4,000 genes (introns+exons) in my zUMIs single-cell count matrices. However, when running velocyto (velocyto: yes), the loom files contain very few spliced/unspliced counts (~500 total).

I was wondering whether you have observed this as well. Is there any way of getting around such low counts? (The velocyto team seems to obtain numbers similar to those from 10x CellRanger.)

Thank you very much for your help!

To Reproduce

    project: Proj
    sequence_files:
      file1:
        name: /path/proj_R1.fastq.gz
        base_definition: cDNA(1-61)
      file2:
        name: /path/proj_R2.fastq.gz
        base_definition: BC(1-8)
      file3:
        name: /path/proj_R3.fastq.gz
        base_definition: BC(1-8)
      file4:
        name: /path/proj_R4.fastq.gz
        base_definition:
          - BC(1-8)
          - UMI(9-14)
    reference:
      STAR_index: /path/STARfull
      GTF_file: /path/Mus_musculus.GRCm38.99.gtf
      additional_STAR_params: ''
      additional_files: ~
    out_dir: /path/zumis/Out/Proj
    num_threads: 8
    mem_limit: 0
    filter_cutoffs:
      BC_filter:
        num_bases: 1
        phred: 20
      UMI_filter:
        num_bases: 1
        phred: 20
    barcodes:
      barcode_num: ~
      barcode_file: ~
      automatic: yes
      BarcodeBinning: 0
      nReadsperCell: 100
    counting_opts:
      introns: yes
      downsampling: '0'
      strand: 0
      Ham_Dist: 0
      velocyto: yes
      primaryHit: yes
      twoPass: yes
    make_stats: yes
    which_Stage: Filtering
    Rscript_exec: Rscript
    STAR_exec: STAR
    pigz_exec: pigz
    samtools_exec: samtools
    zUMIs_directory: /path/zumis/zUMIs
    read_layout: SE

cziegenhain commented 4 years ago

Hi,

Thanks for using zUMIs! As you know, velocyto relies on counting reads that sit explicitly on splice junctions to determine whether a molecule is spliced or unspliced. In zUMIs, a read is assigned to a gene if it lies fully within an exon or intron, which is of course far more likely. Overall, I think it really depends on which scRNA-seq protocol you use and how deeply you sequence. What did you have in this example?
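To illustrate the distinction described above, here is a minimal toy sketch in Python. It is not the actual logic of zUMIs or velocyto; the gene model, the coordinates, and the `classify` helper are made up for illustration. It only shows why reads fully contained in an exon or intron are a different (and usually larger) category than reads that touch a junction or boundary.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates


def inside(read: Interval, feature: Interval) -> bool:
    """True if the read block lies entirely within the feature."""
    return feature[0] <= read[0] and read[1] <= feature[1]


def classify(blocks: List[Interval],
             exons: List[Interval],
             introns: List[Interval]) -> str:
    """Toy classification of one aligned read (hypothetical helper).

    `blocks` holds the aligned segments of the read; a read spliced
    across an exon-exon junction has more than one block.
    """
    if len(blocks) > 1:
        return "junction"   # gapped alignment spanning a splice junction
    block = blocks[0]
    if any(inside(block, e) for e in exons):
        return "exon"       # fully contained in an exon
    if any(inside(block, i) for i in introns):
        return "intron"     # fully contained in an intron
    return "boundary"       # straddles an exon-intron boundary or lies outside


# Made-up gene model: two exons separated by one intron.
exons = [(100, 200), (300, 400)]
introns = [(200, 300)]

reads = [
    [(110, 170)],              # inside exon 1
    [(210, 270)],              # inside the intron
    [(180, 200), (300, 340)],  # spliced across the exon-exon junction
    [(190, 250)],              # straddles the exon-intron boundary
]
labels = [classify(r, exons, introns) for r in reads]
print(labels)
```

With short cDNA reads, how many reads end up in each category depends on the protocol and sequencing depth, which is why those details matter here.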

Apart from this, I just double-checked: there is one setting that is not passed from zUMIs to velocyto (whether primary hits of multimapping reads are considered). I will make an update so that velocyto also considers these reads if users have set primaryHit: yes. You may be losing a few reads relative to the zUMIs quantification this way, too.
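As a rough illustration of what considering primary hits of multimappers means, the following Python sketch filters toy alignment records. It assumes each alignment carries the standard SAM FLAG field and an NH tag (number of reported alignments); the `keep_read` helper and the example records are hypothetical and do not reproduce the code of zUMIs or velocyto.

```python
SECONDARY = 0x100  # SAM flag bit marking a secondary (non-primary) alignment


def keep_read(flag: int, nh: int, use_primary_hit: bool) -> bool:
    """Decide whether a toy alignment record would be counted.

    flag: SAM FLAG value; nh: NH tag (number of reported alignments).
    """
    if flag & SECONDARY:
        return False            # secondary alignments are never counted
    if nh == 1:
        return True             # uniquely mapped read
    return use_primary_hit      # primary alignment of a multimapping read


# (flag, NH) pairs, made up for illustration.
alignments = [
    (0, 1),    # unique, forward strand
    (16, 1),   # unique, reverse strand
    (0, 3),    # primary hit of a multimapper
    (256, 3),  # secondary hit of the same multimapper
    (272, 3),  # secondary hit, reverse strand
]

unique_only = sum(keep_read(f, n, use_primary_hit=False) for f, n in alignments)
with_primary = sum(keep_read(f, n, use_primary_hit=True) for f, n in alignments)
print(unique_only, with_primary)
```

If one tool counts with the primary-hit rule and the other without it, the two quantifications differ by exactly the multimapping reads.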

Hope this helps.

Best, Christoph

cziegenhain commented 4 years ago

I pushed an update to use multimapped reads if requested by the user as described above.

droplet-lab commented 4 years ago

Hi Christoph, thanks a lot for the reply, that makes sense indeed. I am in the process of removing multimapped reads and comparing again; I will let you know. I agree that passing some of these settings on to velocyto would be nice (although I have only just started working on velocity and am not an expert).

All the best, Jo

cziegenhain commented 4 years ago

Hi Jo,

The multimapping settings are now passed correctly to velocyto. Beyond that, you can always take the generated velocyto BAM file and experiment with custom settings on it!
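For experimenting with custom settings, a minimal sketch like the one below assembles a `velocyto run` command line with and without the `-M` (multimap) option. All file paths except the GTF are hypothetical placeholders (the actual name of the BAM file written by zUMIs is not given here), and the sketch only builds and prints the command rather than running it.

```python
import shlex
from typing import List


def velocyto_run_cmd(bam: str, gtf: str, out_dir: str,
                     barcodes: str, multimap: bool) -> List[str]:
    """Build an argument list for `velocyto run` (hypothetical helper)."""
    cmd = ["velocyto", "run", "-b", barcodes, "-o", out_dir]
    if multimap:
        cmd.append("-M")  # also consider non-uniquely mapped reads
    cmd += [bam, gtf]     # positional arguments: BAM file, then GTF file
    return cmd


cmd = velocyto_run_cmd(
    bam="/path/to/velocyto_input.bam",   # placeholder path
    gtf="/path/Mus_musculus.GRCm38.99.gtf",
    out_dir="/path/to/velocyto_custom",  # placeholder path
    barcodes="/path/to/barcodes.tsv",    # placeholder path
    multimap=True,
)
print(shlex.join(cmd))
```

Running the command once with `multimap=True` and once with `multimap=False` and comparing the resulting loom files shows how much of the count difference comes from multimapping reads.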

droplet-lab commented 4 years ago

Awesome, thanks a lot! Will this be in the new release?

cziegenhain commented 4 years ago

If you run git pull in your zUMIs directory, it should fetch the newest version, v2.7.1c, from a couple of hours ago.

droplet-lab commented 4 years ago

great thank you so much!

sdparekh / zUMIs, issue #174