single-cell-genetics / vireo

Demultiplexing pooled scRNA-seq data with or without genotype reference
https://vireoSNP.readthedocs.io
Apache License 2.0

High unassigned rate in 10x single-nucleus RNA-seq #24

Open evolanna opened 3 years ago

evolanna commented 3 years ago

Hi Huang, thanks for reading my issue. I am trying to apply cellSNP and vireo to 10x single-nucleus RNA-seq data. However, the number of unassigned nuclei is quite high:

Assignment   Nuclei
donor0          417
donor1          463
donor2          168
donor3          605
donor4          434
donor5          721
donor6          549
doublet         444
unassigned     7058
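To put these counts in proportion, a quick tally (a purely illustrative Python sketch; the numbers are copied from the table above) shows that roughly 65% of the 10,859 barcodes end up unassigned, about 4% are called doublets, and only about 31% are assigned to a donor.

```python
# Assignment counts copied from the table above (label -> number of nuclei).
counts = {
    "donor0": 417, "donor1": 463, "donor2": 168, "donor3": 605,
    "donor4": 434, "donor5": 721, "donor6": 549,
    "doublet": 444, "unassigned": 7058,
}

def summarize(counts):
    """Return (total, {label: fraction of total}) for a label -> count mapping."""
    total = sum(counts.values())
    return total, {label: n / total for label, n in counts.items()}

total, frac = summarize(counts)
# Fraction assigned to any donor (everything except doublet/unassigned).
singlet_frac = sum(f for label, f in frac.items() if label.startswith("donor"))

print(f"total: {total}")
print(f"unassigned: {frac['unassigned']:.1%}")
print(f"doublet: {frac['doublet']:.1%}")
print(f"assigned to a donor: {singlet_frac:.1%}")
```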

The cellSNP run: cellSNP -s indexpossorted_genome_bam.bam -b barcodes.txt -O snpOUT_DIR -R genome1K.phase3.SNP_AF5e2.chr1toX.hg19.vcf.gz -p 20 --minMAF 0.1 --minCOUNT 20

The vireo run: vireo -c snpOUT_DIR -N 7 -o virosnp

Would you mind having a look and giving me some suggestions on how to improve the performance of cellSNP?

I hope I have provided enough information for you to help me with this issue. Thank you very much again.

Best regards,

Tongtong

huangyh09 commented 3 years ago

Hi, thanks for sharing this issue.

What is the median or mean number of reads per cell in this experiment? You could also check the donor_ids.tsv file to see whether the unassigned cells are caused by too few SNPs being detected in them or by a relatively low assignment probability (prob_max). If it is the latter, you could check the cumulative distribution function (CDF) of prob_max and set a more lenient cutoff for assignable cells. By default, a cell requires prob_max > 0.9 to be assigned.
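The two checks suggested above can be sketched in Python as follows. This is a minimal sketch, not part of vireo: it assumes donor_ids.tsv is tab-separated with a header row containing the columns `donor_id`, `prob_max`, `n_vars` and `best_singlet` (verify against your own file). `n_vars_median_by_label` tests the "too few SNPs" explanation, `prob_max_cdf` gives points on the empirical CDF, and `reassign` is a hypothetical, simplified re-labelling with a custom cutoff that leaves vireo's doublet calls untouched.

```python
import csv
from statistics import median

def read_donor_ids(path):
    """Read vireo's donor_ids.tsv into a list of dicts, one per cell barcode."""
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh, delimiter="\t"))

def n_vars_median_by_label(rows):
    """Median number of covered SNPs (n_vars) per assignment label.

    A much lower median for 'unassigned' than for the donors points to
    too few informative SNPs rather than an overly strict cutoff.
    """
    by_label = {}
    for r in rows:
        by_label.setdefault(r["donor_id"], []).append(int(float(r["n_vars"])))
    return {label: median(vals) for label, vals in by_label.items()}

def prob_max_cdf(rows, grid=(0.5, 0.6, 0.7, 0.8, 0.9)):
    """Empirical CDF of prob_max: fraction of cells with prob_max <= x, for each x in grid."""
    probs = [float(r["prob_max"]) for r in rows]
    return {x: sum(p <= x for p in probs) / len(probs) for x in grid}

def reassign(rows, cutoff):
    """Hypothetical re-labelling with a custom prob_max cutoff (assumed columns, see above).

    Cells already called 'doublet' stay doublets; any other cell gets its
    best_singlet donor if prob_max > cutoff, otherwise 'unassigned'.
    """
    labels = []
    for r in rows:
        if r["donor_id"] == "doublet":
            labels.append("doublet")
        elif float(r["prob_max"]) > cutoff:
            labels.append(r["best_singlet"])
        else:
            labels.append("unassigned")
    return labels
```

For example, `rows = read_donor_ids("donor_ids.tsv")` followed by `prob_max_cdf(rows)` and `collections.Counter(reassign(rows, 0.7))` shows how many cells a cutoff of 0.7 would recover. A lower cutoff trades assignment rate for a higher risk of mis-assigned cells, so it is worth inspecting the CDF before picking a value.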

Alternatively, you could try the de novo mode, calling SNPs directly from the scRNA-seq data with cellsnp-lite (mode 2b, then mode 1a): https://cellsnp-lite.readthedocs.io/en/latest/manual.html
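The two-step de novo workflow can be sketched as a small Python driver that only builds (and optionally runs) the commands. Several details are assumptions to check against the cellsnp-lite manual linked above: that omitting both `-b` and `-R` selects mode 2b, that `--cellTAG None` makes the pooled BAM be piled up as a single bulk sample, that `--gzip` exists, and that step 1 writes its candidate SNPs to `cellSNP.base.vcf.gz`. The thresholds (`--minCOUNT 100` for step 1) are illustrative, not recommendations. The final vireo call reuses only the `-c`, `-N` and `-o` options from the original post.

```python
import os
import shlex
import subprocess

def mode2b_cmd(bam, out_dir, threads=20, min_maf=0.1, min_count=100):
    """Step 1 (assumed mode 2b): pile up the pooled BAM as one bulk sample to call SNPs de novo."""
    return ["cellsnp-lite", "-s", bam, "-O", out_dir, "-p", str(threads),
            "--minMAF", str(min_maf), "--minCOUNT", str(min_count),
            "--cellTAG", "None", "--gzip"]

def region_vcf_from(step1_dir):
    """Assumed location of the SNP list produced by step 1."""
    return os.path.join(step1_dir, "cellSNP.base.vcf.gz")

def mode1a_cmd(bam, barcodes, region_vcf, out_dir, threads=20, min_maf=0.1, min_count=20):
    """Step 2 (mode 1a): genotype every barcode at the SNPs found in step 1."""
    return ["cellsnp-lite", "-s", bam, "-b", barcodes, "-O", out_dir,
            "-R", region_vcf, "-p", str(threads),
            "--minMAF", str(min_maf), "--minCOUNT", str(min_count), "--gzip"]

def vireo_cmd(cell_data_dir, n_donor, out_dir):
    """Step 3: demultiplex with vireo, using the options from the original post."""
    return ["vireo", "-c", cell_data_dir, "-N", str(n_donor), "-o", out_dir]

def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)
```

A typical chain would be `run(mode2b_cmd(bam, "step1"))`, then `run(mode1a_cmd(bam, "barcodes.txt", region_vcf_from("step1"), "step2"))`, then `run(vireo_cmd("step2", 7, "vireo_out"))`. Keeping `dry_run=True` by default lets you review the commands before launching a long pileup.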

We are about to test these two strategies on low-coverage experiments to see whether they indeed help, but you are welcome to try them on your data too.

Yuanhua