wheaton5 / souporcell

Clustering scRNAseq by genotypes
MIT License

Need help separating fetal from maternal cells in placenta #240

Open bkheira opened 1 month ago

bkheira commented 1 month ago

Hello,

I have a single-cell dataset of immune cells sorted from placentas. I recently learned that these immune cells can be either fetal or maternal, even early in gestation. Since then, I have been trying to use souporcell to separate the two origins.

I have been running the standard pipeline with the BAM file, the reference genome, and the expected number of clusters (2). It always gives me roughly half fetal and half maternal cells, which is biologically surprising. I also ran souporcell on another placenta dataset in which I know which cells are maternal and which are fetal (the minority origin makes up about 10%), and it still returns 2 clusters of roughly 50% each.
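For reference, a basic run like the one described might be assembled as below. This is a minimal sketch, not taken from the thread: the flag names follow the souporcell README's basic `souporcell_pipeline.py` invocation, which, in addition to the BAM (`-i`), reference FASTA (`-f`), and cluster count (`-k`), also takes a barcode list (`-b`), a thread count (`-t`), and an output directory (`-o`). The helper `souporcell_command` and all file paths are hypothetical placeholders.

```python
import shlex


def souporcell_command(bam, barcodes, fasta, out_dir, clusters=2, threads=8):
    """Build the argv for a basic souporcell_pipeline.py run.

    Hypothetical helper: flag names follow the souporcell README,
    and the paths passed in below are placeholders, not real files.
    """
    return [
        "souporcell_pipeline.py",
        "-i", bam,             # aligned, position-sorted BAM with cell barcodes
        "-b", barcodes,        # barcodes.tsv listing the cells to cluster
        "-f", fasta,           # reference genome FASTA used for alignment
        "-t", str(threads),    # number of threads
        "-o", out_dir,         # output directory
        "-k", str(clusters),   # expected genotypes: fetal + maternal = 2
    ]


argv = souporcell_command(
    "possorted_genome_bam.bam", "barcodes.tsv", "genome.fa", "soup_out"
)
print(shlex.join(argv))
```

To actually launch the run (for example inside the souporcell Singularity container), the list can be passed to `subprocess.run(argv, check=True)`.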

I think I have been doing something wrong, and I would like to know how you used souporcell in your publication to separate the two cell populations. Could you please share your computational approach?

Best, Kheira

wheaton5 commented 1 month ago

I don't think you are doing anything wrong, but there are some things that could help. Clearly souporcell is failing to identify the maternal cluster in these samples. For my paper I did exactly what you are doing, but the data were very high quality, mostly in that they were sequenced deeply, at something like 25,000 UMIs per cell on average. This is a signal-to-noise issue: more data means more variants sampled per cell, which increases the signal. So one question is how many UMIs per cell you have in your data. For the sample with 10% chimerism, how do you know that percentage? I'm not doubting that souporcell is failing to find 2 distinct clusters, but maybe no chimerism exists, or it is present at a very small percentage.
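To answer the UMIs-per-cell question, one option, assuming a Cell Ranger-style `matrix.mtx.gz` (features as rows, barcodes as columns, integer counts in Matrix Market coordinate format), is to sum each barcode's column. The sketch below uses only the Python standard library; `umis_per_cell` and `read_mtx` are hypothetical helper names, and Cell Ranger's own summary metrics also report a median UMI count per cell.

```python
import gzip
import statistics
from collections import defaultdict


def umis_per_cell(mtx_lines):
    """Sum UMI counts per barcode (column) of a 10x-style Matrix Market file.

    Assumes features are rows and barcodes are columns, so each barcode's
    total UMI count is its column sum. Values are assumed to be integers.
    """
    totals = defaultdict(int)
    header_seen = False
    n_cols = 0
    for line in mtx_lines:
        if line.startswith("%"):
            continue  # banner and comment lines
        fields = line.split()
        if not fields:
            continue
        if not header_seen:
            _rows, n_cols, _nnz = map(int, fields)  # dimensions line
            header_seen = True
            continue
        _row, col, value = fields
        totals[int(col)] += int(value)
    # Barcodes with no nonzero entries never appear, so report them as 0.
    return [totals.get(c, 0) for c in range(1, n_cols + 1)]


def read_mtx(path):
    """Read a plain or gzipped .mtx file and return per-barcode UMI totals."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return umis_per_cell(handle)


# Tiny made-up matrix: 3 features x 2 barcodes.
example = """%%MatrixMarket matrix coordinate integer general
3 2 3
1 1 4
3 1 2
2 2 7
"""
counts = umis_per_cell(example.splitlines())
print(counts, statistics.median(counts))  # prints "[6, 7] 6.5"
```

On real data, `statistics.median(read_mtx("filtered_feature_bc_matrix/matrix.mtx.gz"))` gives a number that can be compared with the roughly 25,000 UMIs per cell quoted in the reply.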

What you can try:

  1. Use a common variants file. Unfortunately, the links to these files on souporcell's GitHub page are broken. If you email me directly, we can find a way to transfer them (email from a Gmail address and I can share them from my Google Drive). When using --common_variants, you can skip the remap stage with --skip_remap TRUE. This just makes things faster; remapping isn't necessary when only looking at sites that are already known to be common variants.
  2. I have a new tool, cellector, which is designed for finding microchimerism (small numbers of cells with a foreign genotype). We are in the process of patenting it, so it is not currently public, but maybe I can add you to the list of collaborators. It assumes that there are either 1 or 2 genotypes and that one of them is in the majority. This allows me to use anomaly detection, which for this problem is much more sensitive.
  3. You could experiment with setting --min_alt and --min_ref higher, which gives a smaller but more reliable set of variants to cluster on.
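Suggestion 3 can be illustrated with a toy filter. This is a minimal sketch that assumes, as the option descriptions suggest, that `--min_alt` and `--min_ref` are the minimum numbers of alt- and ref-supporting counts a locus needs before it is used for clustering; souporcell's exact counting (reads, UMIs, or cells) may differ. The function name `usable_loci` and the locus counts are made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical per-locus totals: locus id -> (ref count, alt count).
# Assumption: thresholds apply to totals summed over all cells. In a real
# run these would be derived from souporcell's ref/alt count matrices.
LocusCounts = Dict[str, Tuple[int, int]]


def usable_loci(counts: LocusCounts, min_alt: int, min_ref: int) -> List[str]:
    """Keep loci with enough support for both alleles.

    A locus observed with only one allele cannot separate two genotypes,
    and alleles seen only a few times are more likely to be sequencing or
    alignment errors, so raising the thresholds trades loci for reliability.
    """
    return sorted(
        locus
        for locus, (ref, alt) in counts.items()
        if alt >= min_alt and ref >= min_ref
    )


counts = {
    "chr1:1000": (120, 95),  # well supported on both alleles
    "chr1:2000": (300, 12),
    "chr2:500": (40, 3),     # alt seen only 3 times: possibly an error
    "chr3:750": (8, 60),
}
for threshold in (5, 10, 20):
    kept = usable_loci(counts, min_alt=threshold, min_ref=threshold)
    print(threshold, kept)
```

Each increase in the threshold drops loci where one allele is barely observed, which is the "smaller but more reliable" trade-off described in the list.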

Let me know how it goes or if I can do anything to help.

Best, Haynes