statgen / popscle

A suite of population-scale analysis tools for single-cell genomics data, including implementations of the Demuxlet / Freemuxlet methods and auxiliary tools
https://github.com/statgen/popscle/wiki
Apache License 2.0

Too many doublets but not many AMB calls when running demuxlet #69

Open RoseYuan opened 11 months ago

RoseYuan commented 11 months ago

I'm trying to run demuxlet with 8 individuals, and I always get a lot of doublet calls, but strangely not many AMB calls. I saw the discussions in previous issues here and here, and I tried the following parameter settings, but couldn't solve my problem.

I was getting 470/4927/4603 AMB/DBL/SNG calls in the first setting, and 435/6004/3561 AMB/DBL/SNG in the second setting.
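To put those counts in proportion, here is a minimal sketch that converts the AMB/DBL/SNG tallies quoted above into fractions of all droplets. The helper name `call_fractions` is made up for illustration and is not part of popscle.

```python
def call_fractions(amb, dbl, sng):
    """Return the share of droplets in each demuxlet call category."""
    total = amb + dbl + sng
    if total == 0:
        raise ValueError("no droplets to summarize")
    return {"AMB": amb / total, "DBL": dbl / total, "SNG": sng / total}


# Counts copied from the two runs reported above.
runs = {
    "first setting": (470, 4927, 4603),
    "second setting": (435, 6004, 3561),
}

for label, counts in runs.items():
    fractions = call_fractions(*counts)
    summary = ", ".join(f"{k} {v:.1%}" for k, v in fractions.items())
    print(f"{label}: {summary}")
```

Both runs total 10,000 droplets, so doublets make up roughly 49% and 60% of the calls, while AMB stays under 5%.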

I see that in my .best file, N.SNP and RD.UNIQ are smaller than in the example file you gave here. I'm not sure whether this could cause a problem. See the histograms below:

[Images: histograms of N.SNP and RD.UNIQ]

If the small N.SNP and RD.UNIQ values are the problem, how can I fix it? The VCF file I'm using comes from SNP array data, and RD.TOTL seems fine (see below), so the SNP set itself doesn't seem to be the problem. Could it be caused by one of the read filtering steps? Do you have any suggestions?

[Image: histogram of RD.TOTL]

hyunminkang commented 10 months ago

It looks like the number of informative SNPs is probably too low for most droplets.

RoseYuan commented 10 months ago

Do you think it could be due to the read filtering steps? I ask because RD.TOTL looks comparable. Is there a way to tell whether the SNPs give enough power, and how good the estimate is?

hyunminkang commented 10 months ago

Usually 20-30 SNPs per cell may be good enough, but >100 is probably ideal. Check whether you have a sufficient number of exonic SNPs in your VCF.
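One rough way to check this against a .best file is to count how many droplets fall below those SNP thresholds. The sketch below is only illustrative: it assumes the file is tab-delimited with a header row containing the `N.SNP` column mentioned earlier in this thread, the function name `snp_coverage_summary` is hypothetical, and the toy rows are made-up values rather than real output.

```python
import csv
import io
import statistics


def snp_coverage_summary(best_text, column="N.SNP", low=20, ideal=100):
    """Summarize per-droplet SNP counts from tab-delimited .best content.

    `low` and `ideal` default to the rough thresholds suggested above
    (20-30 SNPs may be enough, more than 100 is probably ideal).
    """
    reader = csv.DictReader(io.StringIO(best_text), delimiter="\t")
    counts = [int(row[column]) for row in reader]
    if not counts:
        raise ValueError("no droplets found")
    n = len(counts)
    return {
        "droplets": n,
        "median": statistics.median(counts),
        "frac_below_low": sum(c < low for c in counts) / n,
        "frac_above_ideal": sum(c > ideal for c in counts) / n,
    }


# Toy input with made-up values, only to show the expected layout.
toy_best = "BARCODE\tN.SNP\tBEST\n" \
           "AAAC-1\t5\tDBL\n" \
           "AAAG-1\t12\tDBL\n" \
           "AAAT-1\t25\tSNG\n" \
           "AACA-1\t150\tSNG\n"

print(snp_coverage_summary(toy_best))
```

For a real run, the file contents could be read with `open(path).read()` and passed in the same way. A large `frac_below_low` would be consistent with too few informative SNPs per droplet.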