Open yilevine opened 2 years ago
Hi Yile,
Thanks for sharing your experience. I agree with your assessment that the main cause is the limited number of SNPs probed by arrays. For array- or WES-based genotypes, I usually run genotype imputation first, e.g., with the Sanger Imputation Server, which may give around 5 to 10 times more SNPs.
Alternatively, you can keep your approach of using common variants for reference-free deconvolution, and then use the non-imputed genotypes to match the demultiplexed donors, e.g., with this tutorial.
Yuanhua
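The donor-matching step suggested above can be sketched roughly as follows. This is only a minimal illustration, not the method used in the linked tutorial: the function names (mismatch_rate, match_donors), the donor and sample names, and the 0/1/2 genotype encoding with None for missing calls are all assumptions made for the example. It compares the genotypes inferred for each demultiplexed donor with the array genotypes at shared SNPs and picks the one-to-one pairing with the fewest mismatches.

```python
from itertools import permutations


def mismatch_rate(gt_a, gt_b):
    """Fraction of SNPs whose genotypes differ, skipping missing (None) calls."""
    pairs = [(a, b) for a, b in zip(gt_a, gt_b) if a is not None and b is not None]
    if not pairs:
        return 1.0  # no comparable SNPs: treat as the worst case
    return sum(a != b for a, b in pairs) / len(pairs)


def match_donors(inferred, reference):
    """Find the one-to-one pairing of inferred donors to reference samples
    that minimises the summed mismatch rate (brute force, fine for a few donors).

    Both arguments map a name to a list of genotypes (0, 1, 2 or None),
    ordered over the same SNPs. Returns (mapping, mean mismatch rate).
    """
    inf_names = list(inferred)
    ref_names = list(reference)
    if not inf_names or len(ref_names) < len(inf_names):
        raise ValueError("need at least one inferred donor and enough reference samples")

    best_map, best_cost = None, float("inf")
    for perm in permutations(ref_names, len(inf_names)):
        cost = sum(mismatch_rate(inferred[i], reference[r])
                   for i, r in zip(inf_names, perm))
        if cost < best_cost:
            best_map, best_cost = dict(zip(inf_names, perm)), cost
    return best_map, best_cost / len(inf_names)


# Hypothetical toy data: 2 demultiplexed donors, 2 array-genotyped samples, 6 SNPs.
inferred = {"donor0": [0, 1, 2, 1, 0, 2], "donor1": [2, 2, 0, 0, 1, 1]}
reference = {"S_A": [2, 2, 0, None, 1, 1], "S_B": [0, 1, 2, 1, 0, 1]}

mapping, mean_mismatch = match_donors(inferred, reference)
print(mapping, round(mean_mismatch, 3))
```

With only a few thousand overlapping SNPs, a clear gap between the best and second-best pairing is probably more informative than the absolute mismatch rate.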
Hi Yuanhua,
Thanks very much. I will try the solutions you provided.
I was also wondering if you could share a protocol or pipeline for processing array data. I am very new to microarray data, so I am not sure whether the pipeline I am using is correct.
Thanks.
Yile
Thanks for developing this helpful tool! I had a very similar issue with donor genotypes from a SNP array, but found that the workaround using reference-free deconvolution and your donor matching notebook worked! Before looking at the issues on GitHub, I didn't see a link to this notebook anywhere in your documentation, nor did I see any documentation suggesting that SNP array data wouldn't work for donor genotyping. It might help future users quite a bit if you added some detail on best practices for using SNP array genotypes at https://vireosnp.readthedocs.io/en/latest/
Hello, I also encountered this problem. How did you solve it in the end? Thank you. My VCF file comes from an ASA SNP array, and likewise only a small number of cells are assigned to each sample.
Hi,
I was trying to demultiplex 20k cells into 4 donors, but only a few cells were assigned to each donor. Each donor was genotyped using Infinium Omni2.5Exome-8 v1.5.
The VCF file looks like this:
I used cellsnp-lite (v1.2.2) and vireoSNP (v0.5.7) to get the results:
cellsnp-lite code
cellsnp-lite -s $BAM -b $BARCODE -O $OUT_DIR -R $DONOR_VCF --minMAF 0.02 --minCOUNT 20 --gzip
vireo code
vireo -c $CELL_DATA -d $DONOR_VCF -o $OUT_DIR -t GT -N $n_donor -M 200 --forcelearnGT
donor_ids
I noticed that only 2462 SNPs were used to demultiplex these cells. Is that enough?
In donor_ids, most of the prob_max values were pretty low. I wanted to change this parameter. Could you explain how to set it?
Moreover, I tried using the common variants file genome1K.phase3.SNP_AF5e2.chr1toX.hg38.vcf as DONOR_VCF in cellsnp-lite, and finally got a 90% assignment rate for each donor. So I was wondering whether the SNP array (2.5M) I used is sufficient to demultiplex these cells. Do you have other suggestions for troubleshooting this issue? Thanks very much.
Yile
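On the prob_max question above, one possible way to explore a different cutoff is to re-label cells afterwards from the donor_ids table rather than changing anything inside vireo. The sketch below is an assumption-laden illustration: it assumes a tab-separated table with columns named cell, prob_max, prob_doublet and best_singlet, and the function name relabel and the threshold values are hypothetical choices, not vireo options.

```python
import csv
import io


def relabel(tsv_text, min_prob_max=0.9, max_prob_doublet=0.9):
    """Re-assign cells from a donor_ids-style table using custom thresholds.

    Assumed columns (tab-separated): cell, prob_max, prob_doublet, best_singlet.
    Returns a dict mapping each cell barcode to a donor name, 'doublet',
    or 'unassigned'.
    """
    labels = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        if float(row["prob_doublet"]) >= max_prob_doublet:
            labels[row["cell"]] = "doublet"
        elif float(row["prob_max"]) >= min_prob_max:
            labels[row["cell"]] = row["best_singlet"]
        else:
            labels[row["cell"]] = "unassigned"
    return labels


# Hypothetical rows, built with explicit tabs so the example is self-contained.
rows = [
    ["cell", "donor_id", "prob_max", "prob_doublet", "best_singlet"],
    ["AAAC-1", "unassigned", "0.72", "0.05", "donor0"],
    ["AAAG-1", "donor1", "0.99", "0.00", "donor1"],
    ["AATC-1", "unassigned", "0.55", "0.30", "donor2"],
]
table = "\n".join("\t".join(r) for r in rows)

strict = relabel(table)                      # cutoff 0.9
relaxed = relabel(table, min_prob_max=0.7)   # looser cutoff
print(strict)
print(relaxed)
```

Lowering the cutoff assigns more cells but also raises the risk of wrong assignments, so with few informative SNPs it is probably safer to increase the number of SNPs than to relax the threshold.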