jon-xu / scSplit

Genotype-free demultiplexing of pooled single-cell RNA-Seq, using a hidden state model for identifying genetically distinct samples within a mixed population.
MIT License

Sample genotypes #4

Closed jon-xu closed 5 years ago

jon-xu commented 5 years ago

Thanks Jon, the new script for main.py that you updated works well.

In addition, I would like to match the samples across the batches using SNP genotype data. Would there be a way for me to have the code report the SNP genotypes for each cluster across more than just the distinguishing variants? In other words, could I have the script output a matrix of all the SNPs with the cluster SNP genotypes?

Originally posted by @drneavin in https://github.com/jon-xu/scSplit/issues/3#issuecomment-507103556

jon-xu commented 5 years ago

We have added a function to the main script in the newest release that generates the full presence/absence (P/A) matrix for all SNVs. To compare the generated clusters with the original samples, a similar matrix needs to be generated from the sample genotypes. To do that, set the alternative-allele presence flag when the genotype probability (GP) is greater than 0.9 for RA or AA, and set the absence flag when the GP is greater than 0.9 for RR.
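The thresholding rule described above can be sketched roughly as follows. This is only an illustration, not code from scSplit: the function names `gp_to_flag` and `concordance` are hypothetical, the GP values are assumed to be ordered (RR, RA, AA) as in a biallelic VCF record, "larger than 0.9 for RA or AA" is read as either probability individually exceeding the threshold, and sites that pass neither test are treated as undetermined.

```python
from typing import List, Optional, Sequence

GP_THRESHOLD = 0.9  # threshold quoted in the comment; adjust if needed


def gp_to_flag(gp: Sequence[float],
               threshold: float = GP_THRESHOLD) -> Optional[int]:
    """Map one (RR, RA, AA) genotype-probability triple to a P/A flag.

    Returns 1 (alternative allele present), 0 (absent), or None when
    no genotype is confident enough. Assumes a biallelic site.
    """
    p_rr, p_ra, p_aa = gp
    if p_ra > threshold or p_aa > threshold:
        return 1
    if p_rr > threshold:
        return 0
    return None


def concordance(cluster_flags: List[Optional[int]],
                sample_flags: List[Optional[int]]) -> Optional[float]:
    """Fraction of SNVs with matching flags, skipping undetermined ones.

    Hypothetical helper for matching a cluster to a sample; returns
    None if no SNV is determined in both inputs.
    """
    pairs = [(c, s) for c, s in zip(cluster_flags, sample_flags)
             if c is not None and s is not None]
    if not pairs:
        return None
    return sum(c == s for c, s in pairs) / len(pairs)


# Made-up GP triples for one sample at four SNVs.
sample_gp = [(0.98, 0.02, 0.00),
             (0.01, 0.95, 0.04),
             (0.50, 0.40, 0.10),
             (0.00, 0.03, 0.97)]
sample_flags = [gp_to_flag(gp) for gp in sample_gp]

# Made-up P/A column for one cluster at the same four SNVs.
cluster_flags = [0, 1, 1, 0]
score = concordance(cluster_flags, sample_flags)
```

Computing such a score for every cluster/sample pair and taking the best match per cluster would be one simple way to assign clusters to samples, assuming both matrices are indexed by the same SNVs in the same order.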

Also, the scSplit package includes another script, "genotype.py", which can be used to generate a genotype VCF for the samples based on the results of the main script.