single-cell-genetics / vireo

Demultiplexing pooled scRNA-seq data with or without genotype reference
https://vireoSNP.readthedocs.io
Apache License 2.0

Synthetic BAMs -- High Discordance Rate #109

Open: marcus-tutert opened this issue 1 month ago

marcus-tutert commented 1 month ago

Hi, thank you for such a great tool!

I have a question about some strange results I get when estimating the concordance between the inferred and the true donor of each cell barcode using the synthetic BAM simulation script. I want to see how the concordance changes as we change the number of samples pooled together. I ran cellSNP with a list of candidate SNPs and then ran vireo with the donor VCFs, so that it runs in genotype-aware mode.
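For reference, the per-cell concordance described above can be computed along these lines. This is a minimal sketch, not vireo's own evaluation code. It assumes vireo's per-cell output (`donor_ids.tsv`, which as far as I know includes `cell`, `donor_id` and `n_vars` columns) and a hypothetical truth table from the simulation mapping each barcode to its true donor, with both using the same donor names. The barcodes, labels and counts below are made up.

```python
import csv
import io

# Hypothetical excerpt of vireo's donor_ids.tsv (tab-separated); only the
# columns used here are included, the real file has more.
VIREO_TSV = """cell\tdonor_id\tn_vars
AAAC-1\tdonor0\t25
AAAG-1\tdonor1\t4
AACT-1\tdonor1\t31
AAGT-1\tunassigned\t2
ACGT-1\tdoublet\t40
"""

# Hypothetical ground truth from the synthetic pool: barcode -> true donor.
TRUTH = {
    "AAAC-1": "donor0",
    "AAAG-1": "donor0",
    "AACT-1": "donor1",
    "AAGT-1": "donor1",
    "ACGT-1": "donor0",
}


def load_assignments(text):
    """Parse a donor_ids.tsv-style table into {cell: (donor_id, n_vars)}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["cell"]: (row["donor_id"], int(row["n_vars"])) for row in reader}


def concordance(assignments, truth, exclude=("unassigned", "doublet")):
    """Fraction of cells whose inferred donor matches the true donor.

    Cells whose label is in `exclude` are dropped from the denominator;
    pass exclude=() to count them as discordant instead.
    """
    scored = [
        (cell, donor)
        for cell, (donor, _n_vars) in assignments.items()
        if cell in truth and donor not in exclude
    ]
    if not scored:
        return float("nan")
    hits = sum(truth[cell] == donor for cell, donor in scored)
    return hits / len(scored)


assignments = load_assignments(VIREO_TSV)
print(concordance(assignments, TRUTH))               # 2 of 3 assigned cells match
print(concordance(assignments, TRUTH, exclude=()))   # 2 of all 5 cells match
```

Whether unassigned and doublet calls are kept in the denominator matters here: if low-coverage cells tend to end up unassigned, counting them as discordant lowers the reported concordance even when every confidently assigned cell is correct.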

When I do this, I notice that my concordance rate already drops from 90% to 70% when going from a 2-sample pool to a 3-sample pool. I have started to investigate this and noticed that a large part of the drop seems to come from a sharp increase in the proportion of cells with low n_vars (<10).
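One way to check how much of the drop comes from low-n_vars cells is to stratify the concordance at the n_vars < 10 cutoff mentioned above. The overall rate is then a weighted average, overall = p_low * c_low + (1 - p_low) * c_high, so a growing fraction p_low of low-coverage cells lowers the overall concordance even if per-bin accuracy stays the same. This is a minimal sketch; the per-cell records below are hypothetical and the helper name is illustrative only.

```python
# Hypothetical per-cell records: (inferred donor, true donor, n_vars).
CELLS = [
    ("donor0", "donor0", 25),
    ("donor1", "donor0", 4),
    ("donor1", "donor1", 31),
    ("donor2", "donor2", 12),
    ("donor0", "donor2", 7),
    ("donor2", "donor1", 3),
]


def stratified_concordance(cells, threshold=10):
    """Split cells at n_vars < threshold and return, per bin,
    (fraction of all cells in the bin, concordance within the bin)."""
    bins = {"low": [], "high": []}
    for inferred, true, n_vars in cells:
        key = "low" if n_vars < threshold else "high"
        bins[key].append(inferred == true)
    total = len(cells)
    result = {}
    for key, hits in bins.items():
        frac = len(hits) / total if total else 0.0
        conc = sum(hits) / len(hits) if hits else float("nan")
        result[key] = (frac, conc)
    return result


strata = stratified_concordance(CELLS)
print(strata)  # {'low': (0.5, 0.0), 'high': (0.5, 1.0)}

# The weighted sum of the bins recovers the overall concordance.
overall = sum(frac * conc for frac, conc in strata.values())
print(overall)  # 0.5
```

Comparing the "high" bin between the 2-sample and 3-sample pools would show whether accuracy on well-covered cells also degrades, or whether the drop is mostly explained by the larger share of low-n_vars cells.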

Is there a known reason why this might happen? Do you have any suggestions for improving the results?

(Two screenshots attached, taken 2024-10-10.)