jon-xu / scSplit

Genotype-free demultiplexing of pooled single-cell RNA-Seq, using a hidden state model for identifying genetically distinct samples within a mixed population.
MIT License

Reducing doublets using scSplit #25

Closed: inbarsh2 closed this issue 9 months ago

inbarsh2 commented 1 year ago

Hi,

We are using your scSplit algorithm to demultiplex our single-nuclei RNA-seq samples, and we would appreciate your help improving our results. Using the algorithm with common SNVs, we get about 25% doublets (so far we have tried it only on one set of 2 multiplexed samples). We noticed that the algorithm uses only 2 SNPs to discriminate between the samples (as listed in the scSplit_dist_variants file). Is there a way to improve the identification of the nuclei, either by expanding the number of variant SNPs or by some other means?

Thanks a lot for your help!

jon-xu commented 1 year ago

Hi inbarsh2,

Thanks for your interest in our tool! We haven't tested it on single-nuclei samples, but it should still work.

The SNPs in scSplit_dist_variants.txt are not the ones we use to demultiplex samples; they are the ones you can use to link the demultiplexed clusters back to your original samples.

The SNPs we used to demultiplex samples are those in the allele count matrices ("ref_filtered.csv" and "alt_filtered.csv") instead.
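To see how many SNVs actually feed the demultiplexing (as opposed to the handful in scSplit_dist_variants.txt), you can count the rows of the two allele count matrices. The sketch below assumes a layout where each CSV has a header row of cell barcodes, the first column holds an SNV identifier, and the remaining cells hold read counts; check your own files before relying on it. The function name `summarize_allele_matrices` and the "informative" criterion (at least one ref read and at least one alt read across all cells) are hypothetical illustrations, not part of scSplit.

```python
import csv


def summarize_allele_matrices(ref_path, alt_path):
    """Count SNVs and barcodes in a pair of allele count matrices.

    Assumed layout (hypothetical, verify against your files): header row of
    barcodes with an empty or index-name first cell, then one row per SNV
    whose first field is the SNV ID and whose other fields are read counts.
    """

    def read(path):
        with open(path, newline="") as fh:
            rows = list(csv.reader(fh))
        header, body = rows[0], rows[1:]
        # Map SNV ID -> per-cell counts; float() tolerates values like "3.0".
        counts = {r[0]: [int(float(x)) for x in r[1:]] for r in body}
        return header[1:], counts

    ref_barcodes, ref = read(ref_path)
    alt_barcodes, alt = read(alt_path)
    if ref_barcodes != alt_barcodes or ref.keys() != alt.keys():
        raise ValueError("ref and alt matrices must share SNVs and barcodes")

    # Hypothetical criterion: an SNV is "informative" if both alleles are
    # observed somewhere in the pool.
    informative = [s for s in ref if sum(ref[s]) > 0 and sum(alt[s]) > 0]
    return {
        "snvs": len(ref),
        "barcodes": len(ref_barcodes),
        "informative": len(informative),
    }
```

For example, `summarize_allele_matrices("ref_filtered.csv", "alt_filtered.csv")` returns a small dict of counts. A low `"informative"` number relative to `"snvs"` would suggest that coverage at the candidate SNVs is sparse, which is common for single-nuclei data.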

You can use the -d option in scSplit run to control the proportion of doublets; e.g. -d 0.1 means you expect 10% doublets. For more precise results, you can instead remove doublets with a dedicated doublet detection tool such as DoubletFinder, and then run scSplit with -d 0 to demultiplex only the singlets.
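One way to feed only singlets into scSplit is to drop the doublet barcodes from both allele count matrices before running with -d 0. The sketch below assumes the matrices have a header row of barcodes and SNV IDs in the first column, and that you have already exported the doublet barcodes from your detection tool into a Python set; `drop_barcodes` and the output file names are hypothetical, and the export step from DoubletFinder is not shown.

```python
import csv


def drop_barcodes(in_path, out_path, doublets):
    """Copy an allele count matrix, omitting columns for flagged barcodes.

    Assumed layout (hypothetical, verify against your files): header row of
    barcodes, first column = SNV ID, remaining columns = per-cell counts.
    `doublets` is a set of barcode strings flagged by an external tool.
    Returns the number of barcodes kept.
    """
    with open(in_path, newline="") as fh:
        rows = list(csv.reader(fh))
    header = rows[0]
    # Always keep column 0 (SNV IDs); keep a barcode column only if it is
    # not in the doublet set.
    keep = [0] + [i for i, bc in enumerate(header) if i > 0 and bc not in doublets]
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        for row in rows:
            writer.writerow([row[i] for i in keep])
    return len(keep) - 1
```

Apply it with the same doublet set to both ref_filtered.csv and alt_filtered.csv so that their columns stay aligned, then point scSplit run at the two filtered files with -d 0.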

Hope that helps! Jon