single-cell-genetics / vireo

Demultiplexing pooled scRNA-seq data with or without genotype reference
https://vireoSNP.readthedocs.io
Apache License 2.0

Cells unassigned when using SNP array genotypes #66

Open yilevine opened 2 years ago

yilevine commented 2 years ago

Hi,

I was trying to demultiplex 20k cells into 4 donors, but only a few cells were assigned to each donor. Each donor was genotyped with the Infinium Omni2.5Exome-8 v1.5 array.

The VCF file looks like this: [screenshot: vcf]

I used cellsnp-lite (v1.2.2) and vireoSNP (v0.5.7) to get the results:

cellsnp-lite code

cellsnp-lite -s $BAM -b $BARCODE -O $OUT_DIR -R $DONOR_VCF --minMAF 0.02 --minCOUNT 20 --gzip

vireo code

vireo -c $CELL_DATA -d $DONOR_VCF -o $OUT_DIR -t GT -N $n_donor -M 200 --forceLearnGT

donor_ids output:

[screenshot: vireo-2]

I noticed that only 2462 SNPs were used to demultiplex these cells. Is that enough?
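
One way to check how many array sites survive the pileup step is to count the records in the donor VCF and in cellsnp-lite's output VCF. This is a minimal POSIX-shell sketch: the function names count_vcf_sites and report_snp_usage are hypothetical (not part of cellsnp-lite or vireo), and it assumes that cellsnp-lite, run with --gzip, wrote its sites to cellSNP.base.vcf.gz in $OUT_DIR, which is worth confirming in your own output directory.

```sh
# Hypothetical helpers, not part of cellsnp-lite or vireo.
# count_vcf_sites FILE: number of non-header records in a plain or gzipped VCF.
count_vcf_sites() {
  case "$1" in
    *.gz) gzip -dc "$1" ;;   # bgzip output is gzip-compatible
    *)    cat "$1" ;;
  esac | grep -vc '^#' || true   # grep -c exits 1 on a zero count; keep the printed 0
}

# report_snp_usage REF_VCF KEPT_VCF: how many reference sites survived pileup.
report_snp_usage() {
  ref_n=$(count_vcf_sites "$1")
  kept_n=$(count_vcf_sites "$2")
  awk -v r="$ref_n" -v k="$kept_n" 'BEGIN {
    pct = (r > 0) ? 100 * k / r : 0
    printf "reference sites: %d\nsites kept: %d (%.2f%%)\n", r, k, pct
  }'
}

# Example (output file name assumed from cellsnp-lite with --gzip):
#   report_snp_usage "$DONOR_VCF" "$OUT_DIR/cellSNP.base.vcf.gz"
```

If the array VCF really holds about 2.5 million sites, 2462 kept sites would be roughly 0.1% of them; the --minCOUNT 20 and --minMAF 0.02 filters in the cellsnp-lite command above discard sites without enough informative reads.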

Also, in donor_ids most of the prob_max values were quite low. I would like to change the cutoff applied to this value. Could you explain how to set it?
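
prob_max is a column of vireo's donor_ids.tsv output rather than an input parameter, so one option is to re-label cells from that table with your own cutoff. This is a hypothetical post-processing sketch (relabel_donor_ids is not a vireo command). It assumes the header contains donor_id, prob_max, prob_doublet and best_singlet; the 0.9 doublet default is a placeholder chosen for this sketch, not a documented vireo value, and whether vireo exposes such a cutoff as a command-line option is not asserted here.

```sh
# Hypothetical post-processing helper, not a vireo command.
# relabel_donor_ids DONOR_IDS_TSV PROB_CUTOFF [DOUBLET_CUTOFF]
# Rewrites the donor_id column: doublet if prob_doublet >= DOUBLET_CUTOFF,
# else best_singlet if prob_max >= PROB_CUTOFF, else unassigned.
relabel_donor_ids() {
  awk -F'\t' -v OFS='\t' -v pc="$2" -v dc="${3:-0.9}" '
    NR == 1 {
      # Find columns by header name rather than by position.
      for (i = 1; i <= NF; i++) col[$i] = i
      n = split("donor_id prob_max prob_doublet best_singlet", need, " ")
      for (k = 1; k <= n; k++)
        if (!(need[k] in col)) { print "missing column: " need[k] > "/dev/stderr"; exit 1 }
      print
      next
    }
    {
      if ($(col["prob_doublet"]) + 0 >= dc + 0)   $(col["donor_id"]) = "doublet"
      else if ($(col["prob_max"]) + 0 >= pc + 0)  $(col["donor_id"]) = $(col["best_singlet"])
      else                                        $(col["donor_id"]) = "unassigned"
      print
    }' "$1"
}

# Example: relabel_donor_ids "$OUT_DIR/donor_ids.tsv" 0.7 > donor_ids.relabelled.tsv
```

Lowering the cutoff only converts unassigned cells into less certain assignments; it adds no genotype information.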

Moreover, I tried using the common variants in genome1K.phase3.SNP_AF5e2.chr1toX.hg38.vcf as DONOR_VCF in cellsnp-lite, and finally got a 90% assignment rate for each donor. So I was wondering whether the SNP array (2.5M SNPs) I used is enough to demultiplex these cells.

Or do you have other suggestions on troubleshooting this issue? Thanks very much.

Yile

huangyh09 commented 2 years ago

Hi Yile,

Thanks for sharing your experience. I agree with your diagnosis that the main reason is the insufficient number of SNPs probed by the array. For array- or WES-based genotypes, I usually perform genotype imputation first, e.g., with the Sanger Imputation Server, which may give around 5 to 10 times more SNPs.

Alternatively, you can keep your run with common variants for reference-free deconvolution, and then use the non-imputed genotypes to match the demultiplexed donors, e.g., with this tutorial.
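
The donor-matching step can be sketched as a genotype discordance table: for each vireo donor and each array sample, average the absolute difference in alternative-allele dosage (0, 1 or 2) over sites called in both, then pair each donor with the sample of lowest discordance. This is a simple swapped-in measure, not the method used in the linked tutorial. genotype_discordance is a hypothetical helper; it assumes both inputs are tab-separated tables with one header line and columns CHROM, POS, REF, ALT followed by one GT column per sample, and that vireo wrote its learned donor genotypes to GT_donors.vireo.vcf.gz (check your output directory).

```sh
# Hypothetical helper: mean absolute dosage difference between every vireo
# donor (file 1) and every array sample (file 2), over shared called sites.
genotype_discordance() {
  awk -F'\t' '
    # Non-reference allele count for a GT such as 0/1 or 1|1; -1 if missing.
    function dosage(gt,    a, n, i, d) {
      n = split(gt, a, "[/|]")
      d = 0
      for (i = 1; i <= n; i++) {
        if (a[i] == "." || a[i] == "") return -1
        if (a[i] != "0") d++
      }
      return d
    }
    FNR == 1 {
      # Header lines: remember sample names (columns 5..NF).
      if (NR == 1) { nv = NF - 4; for (i = 5; i <= NF; i++) vname[i - 4] = $i }
      else         { na = NF - 4; for (i = 5; i <= NF; i++) aname[i - 4] = $i }
      next
    }
    NR == FNR {
      # Vireo donors: store dosages keyed by CHROM:POS:REF:ALT.
      key = $1 ":" $2 ":" $3 ":" $4
      seen[key] = 1
      for (i = 5; i <= NF; i++) vgt[key, i - 4] = dosage($i)
      next
    }
    {
      # Array samples: compare only at sites also present in the vireo table.
      key = $1 ":" $2 ":" $3 ":" $4
      if (!(key in seen)) next
      for (j = 5; j <= NF; j++) {
        ad = dosage($j)
        if (ad < 0) continue
        for (i = 1; i <= nv; i++) {
          vd = vgt[key, i]
          if (vd < 0) continue
          diff = (vd > ad) ? vd - ad : ad - vd
          sum[i, j - 4] += diff
          cnt[i, j - 4]++
        }
      }
    }
    END {
      printf "vireo_donor"
      for (j = 1; j <= na; j++) printf "\t%s", aname[j]
      printf "\n"
      for (i = 1; i <= nv; i++) {
        printf "%s", vname[i]
        for (j = 1; j <= na; j++) {
          if (cnt[i, j] > 0) { printf "\t%.3f", sum[i, j] / cnt[i, j] }
          else { printf "\tNA" }
        }
        printf "\n"
      }
    }' "$1" "$2"
}

# Example (assumed file names):
#   bcftools query -H -f '%CHROM\t%POS\t%REF\t%ALT[\t%GT]\n' GT_donors.vireo.vcf.gz > vireo.tsv
#   bcftools query -H -f '%CHROM\t%POS\t%REF\t%ALT[\t%GT]\n' array.vcf.gz > array.tsv
#   genotype_discordance vireo.tsv array.tsv
```

Sites only line up when chromosome names (chr1 vs 1), genome build and REF/ALT orientation agree between the two VCFs, so raw array exports may need harmonizing first.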

Yuanhua

yilevine commented 2 years ago

Hi Yuanhua,

Thanks very much. I will try the solutions you provided.

I was also wondering if you could share a protocol or pipeline for processing array data. I am very new to microarray data, so I am not sure whether the pipeline I am using is correct.

Thanks.

Yile

connersk commented 1 year ago

Thanks for developing this helpful tool! I had a very similar issue with donor genotypes from a SNP array, but found that the workaround using reference-free deconvolution and your donor-matching notebook worked! Before looking at the issues on GitHub, I didn't see a link to this notebook anywhere in your documentation, nor any documentation suggesting that SNP array data wouldn't work for donor genotyping. It might help future users quite a bit if you added some detail on best practices for using SNP array genotypes at https://vireosnp.readthedocs.io/en/latest/

yuantiaotiao commented 3 months ago

Hello, I also encountered this problem. How did you solve it in the end? Thank you. My VCF file comes from an ASA SNP array, and only a small number of cells are assigned to each sample as well.