Synthetic BAMs -- High Discordance Rate

Hi, thank you for such a great tool!

I have a question about some strange results I am getting when comparing the inferred and true donor for each cell barcode while using the synthetic BAMs simulation script. I am trying to see how the concordance estimates change as we increase the number of samples pooled together. I ran cellSNP with a list of candidate SNPs and then ran vireo, providing it the VCFs so that it runs in genotype-aware mode.

When I do this, my concordance rate drops from 90% to 70% just going from a 2-sample pool to a 3-sample pool. I have started to investigate, and a large part of the reason seems to be that the proportion of cells with low n_vars (<10) increases drastically.
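
For reference, here is a minimal sketch of the kind of check described above: it splits cells at an n_vars threshold and reports concordance for each group separately. It assumes the vireo output table (`donor_ids.tsv`) is tab-separated with `cell`, `donor_id`, and `n_vars` columns, and that the true donor of each barcode is available as a plain dictionary. The function name, the `true_donor` mapping, and the toy rows are hypothetical and only there for illustration.

```python
import csv
import io


def concordance_by_n_vars(donor_ids_tsv, true_donor, min_vars=10):
    """Concordance of inferred vs. true donor, split at an n_vars threshold.

    donor_ids_tsv: text of a tab-separated table with (assumed) columns
                   cell, donor_id, n_vars.
    true_donor:    hypothetical dict mapping cell barcode -> true donor name.
    """
    counts = {"low": [0, 0], "high": [0, 0]}  # [n_cells, n_correct]
    reader = csv.DictReader(io.StringIO(donor_ids_tsv), delimiter="\t")
    for row in reader:
        cell = row["cell"]
        if cell not in true_donor:
            continue  # skip barcodes with no known truth
        group = "low" if int(float(row["n_vars"])) < min_vars else "high"
        counts[group][0] += 1
        # "unassigned" or "doublet" calls never match a donor name,
        # so they count as discordant here.
        counts[group][1] += int(row["donor_id"] == true_donor[cell])
    return {
        group: {"n_cells": n, "concordance": (correct / n if n else None)}
        for group, (n, correct) in counts.items()
    }


# Toy data for illustration only (not real results).
toy_tsv = (
    "cell\tdonor_id\tprob_max\tprob_doublet\tn_vars\n"
    "c1\tdonorA\t0.99\t0.01\t25\n"
    "c2\tdonorB\t0.98\t0.01\t40\n"
    "c3\tunassigned\t0.55\t0.02\t4\n"
    "c4\tdonorB\t0.91\t0.03\t7\n"
    "c5\tdonorA\t0.95\t0.01\t30\n"
)
toy_truth = {"c1": "donorA", "c2": "donorB", "c3": "donorA",
             "c4": "donorB", "c5": "donorB"}

result = concordance_by_n_vars(toy_tsv, toy_truth)
print(result["low"])
print(result["high"])
```

Comparing the `low` and `high` groups across pool sizes should show whether the overall drop comes mostly from the growing share of low-n_vars cells or whether cells with plenty of variants are also being misassigned.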

Is there a reason this might be happening? Do you have any suggestions on what else could be done to improve results?

single-cell-genetics / vireo, issue #109