Closed by drneavin 4 years ago
How much higher? If it is extremely high, there may be a problem with the clustering. If it is only moderately high (2-4x expected), then all I can suggest is using common variants, or known variants if you have them. I have had a few users run souporcell successfully on cancer samples, but I don't think they did anything else. If the amount of data is high (UMIs per cell), then increasing --min_alt and --min_ref to around 20 might get rid of some false positives and somatic variants. It may require some in-depth data analysis to figure out.
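A minimal sketch of what that suggestion could look like when driving souporcell from Python. It only assembles the command line, it does not run anything. The file names, the cluster count, and the helper `build_souporcell_command` are hypothetical placeholders, and the value 20 is just the rough figure mentioned above, not a tuned recommendation.

```python
import shlex


def build_souporcell_command(bam, barcodes, fasta, out_dir, n_clusters,
                             common_variants=None, min_alt=20, min_ref=20,
                             threads=8):
    """Assemble a souporcell_pipeline.py call with raised --min_alt/--min_ref.

    Hypothetical helper for illustration: all paths are placeholders and
    the thresholds default to the ~20 suggested in the thread.
    """
    cmd = [
        "souporcell_pipeline.py",
        "-i", bam,              # aligned reads (BAM)
        "-b", barcodes,         # cell barcodes file
        "-f", fasta,            # reference genome FASTA
        "-t", str(threads),     # number of threads
        "-o", out_dir,          # output directory
        "-k", str(n_clusters),  # number of pooled individuals
        "--min_alt", str(min_alt),
        "--min_ref", str(min_ref),
    ]
    if common_variants:
        # Restrict variant calling to a VCF of common variants.
        cmd += ["--common_variants", common_variants]
    return cmd


# Placeholder inputs; substitute real paths before running anything.
cmd = build_souporcell_command(
    bam="possorted_genome_bam.bam",
    barcodes="barcodes.tsv",
    fasta="genome.fa",
    out_dir="souporcell_min20",
    n_clusters=2,
    common_variants="common_variants.vcf",
)
print(shlex.join(cmd))
```

The printed string can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)` once the paths are real.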
Hi @wheaton5,
Yes, it is ~2x higher than expected for most of the samples, but we are already using common variants. There is one sample where about 2/3 of the cells are being called doublets. This is an interesting pool because we anticipate a 17:3 ratio of cells between the two pooled samples, so I would guess that this skewed pooling might be affecting the clustering?
In terms of data quality, most of the samples are on the lower end of reads and genes per cell (~25,000 reads and < 1,000 genes) but have OK UMIs (~2,500 per cell), so I don't think that increasing --min_alt and --min_ref would help. Let me know if you disagree with that assessment.
Thanks, Drew
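For context on the 17:3 pool, here is a rough sketch of how much of the true doublet population a genotype-based method could distinguish at all. It assumes doublets form by random pairing of cells, which is a simplification and not something stated in the thread; the function name is hypothetical.

```python
def cross_sample_doublet_fraction(cells_per_sample):
    """Fraction of doublets that contain cells from two different samples.

    Assumes random pairing: a doublet is same-sample with probability
    sum(p_i ** 2), so the cross-sample fraction is 1 - sum(p_i ** 2).
    """
    total = sum(cells_per_sample)
    proportions = [n / total for n in cells_per_sample]
    return 1.0 - sum(p * p for p in proportions)


skewed = cross_sample_doublet_fraction([17, 3])     # the 17:3 pool
balanced = cross_sample_doublet_fraction([10, 10])  # an even two-sample pool

print(f"17:3 pool:  {skewed:.3f}")
print(f"10:10 pool: {balanced:.3f}")
```

Under that assumption, roughly a quarter of true doublets in a 17:3 pool mix the two samples, versus half in an even pool, so a skewed pool would be expected to show fewer genotype-detectable doublets, not more.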
Well, I would say that 2,500 UMI/cell is pretty low. With 10x Genomics v3 chemistry, you should generally be getting 4k+. So I am guessing data quality / data amount is the main issue.
Souporcell is surprisingly robust to skewed cluster ratios. I've run 200:1 synthetic mixtures with 4000 cells in 4 clusters and 20 cells in the other cluster, and it worked very well, with all 20 minority cells assigned to a single cluster, though a few doublets were also assigned to that cluster. But combining a low amount of data with skewed cluster ratios may well be a problem. From those results, I would not trust the clustering for that sample. For the others, I think you are fine dropping the doublets and analyzing the remaining cells. I don't really have good suggestions on how to recover those false doublets.
Sorry.
This is really helpful @wheaton5. I was thinking along the same lines but wanted to see if I had missed anything or if you had any experience with cancer samples in particular. Cheers for the recommendations and the follow-up!
Hi @wheaton5,
I am working with some multiplexed cancer samples and have noticed a higher number of doublets than anticipated when using souporcell to demultiplex with the default options. Which settings would you recommend changing when demultiplexing cancer samples with souporcell?
Thanks, Drew
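One way to put a number on "higher than anticipated" is to tally the call statuses in souporcell's output table. The sketch below assumes a tab-separated `clusters.tsv` with a `status` column holding values such as `singlet` and `doublet`; that layout is an assumption here, and the rows are made-up illustration data, not results from this thread.

```python
import csv
import io
from collections import Counter


def status_fractions(handle):
    """Return the fraction of barcodes per value of the 'status' column."""
    reader = csv.DictReader(handle, delimiter="\t")
    counts = Counter(row["status"] for row in reader)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {status: n / total for status, n in counts.items()}


# Made-up rows in the assumed layout, for illustration only.
example = (
    "barcode\tstatus\tassignment\n"
    "AAAC-1\tsinglet\t0\n"
    "AAAG-1\tsinglet\t1\n"
    "AACC-1\tdoublet\t0/1\n"
    "AACG-1\tsinglet\t0\n"
)
fractions = status_fractions(io.StringIO(example))
print(fractions)
```

For a real run, `open("clusters.tsv", newline="")` would replace the `io.StringIO` handle, and the doublet fraction can then be compared with the rate expected for the number of cells loaded.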