kstreet13 / slingshot

Functions for identifying and characterizing continuous developmental trajectories in single-cell data.

Lineages are partially overlapping #176

Closed: cfriedrich01 closed this issue 2 years ago

cfriedrich01 commented 2 years ago

Hi Kelly,

As you can see in the attached picture, I got 9 lineages in my Slingshot results. I'm a little puzzled by these results because lineages 1, 2 and 3 in the picture partially overlap. I don't understand why lineages 1 and 2 are not included in lineage 3. Is there a way to "merge" overlapping lineages?

[Attached image: Slingshot results showing 9 lineages]

Thank you for your help!

kstreet13 commented 2 years ago

Hi @cfriedrich01 ,

That's a good question, and this is definitely a very complicated structure. I think the issue here is that there are some clusters that are truly discrete (i.e., they should not be connected via a lineage to the other clusters), but Slingshot isn't picking that up. The yellow cluster at the end of Lineage 3 is a prime example.

You could manually subset the data to omit those clusters, but my recommendation would be to use dist.method = 'mnn' and omega = TRUE when you run slingshot (or getLineages). This will change the distance metric Slingshot uses to calculate the cluster-based MST and allow for the possibility of some clusters being disconnected. This will hopefully simplify the resulting trajectory, making for fewer overlapping lineages.
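A minimal sketch of that change, assuming the slingshot 2.x interface. To keep it self-contained it uses the toy `slingshotExample` data bundled with slingshot rather than the data in this thread; with that data you would pass `SCE`, the cluster labels and `reducedDim = "UMAP"` as in the later `getLineages()` call. Which clusters end up disconnected depends on the data, so no output is shown.

```r
library(slingshot)

# Toy data bundled with slingshot: 2-D coordinates (rd) and cluster labels (cl)
data("slingshotExample")
rd <- slingshotExample$rd
cl <- slingshotExample$cl

# Default settings: all clusters are joined into one connected MST
lin_default <- getLineages(rd, cl, start.clus = "1")

# Suggested settings: MNN-based distances between clusters, and omega = TRUE,
# which caps how far apart two connected clusters may be, so that clusters
# beyond that distance can stay disconnected instead of extending a lineage
lin_mnn <- getLineages(rd, cl, start.clus = "1",
                       dist.method = "mnn", omega = TRUE)

# Compare the cluster sequences of the two fits
slingLineages(lin_default)
slingLineages(lin_mnn)
```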

And it's a little hard to tell from this image, but issues like this can also stem from a noisy clustering. If one cluster exists in more than one location on the UMAP plot, it can lead to some weird lineages. I don't think that's the issue here, but it's another thing to look out for!
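One rough way to check for a cluster that sits in more than one place, without relying only on the plot, is to count how many separated groups each cluster's cells form in the embedding. The helper `count_cluster_pieces` and the gap threshold `h` are hypothetical and not part of slingshot; this sketch uses base R single-linkage clustering on a toy embedding. With real data you would pass the UMAP coordinates (e.g. `reducedDim(SCE, "UMAP")`) and the cluster labels, and tune `h` by eye.

```r
# Hypothetical helper: for each cluster, count the spatially separate pieces
# it forms in a 2-D embedding. Single linkage merges points closer than 'h',
# so a cluster split by a gap larger than 'h' yields more than one piece.
count_cluster_pieces <- function(emb, labels, h) {
  labels <- as.character(labels)
  sapply(split(seq_len(nrow(emb)), labels), function(idx) {
    if (length(idx) < 2) return(1L)
    tree <- hclust(dist(emb[idx, , drop = FALSE]), method = "single")
    length(unique(cutree(tree, h = h)))
  })
}

# Toy embedding: cluster "A" is one blob, cluster "B" sits in two places
set.seed(1)
emb <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
             matrix(rnorm(20, mean = 5, sd = 0.3), ncol = 2),
             matrix(rnorm(20, mean = -5, sd = 0.3), ncol = 2))
labels <- c(rep("A", 20), rep("B", 20))

count_cluster_pieces(emb, labels, h = 2)
```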

Best, Kelly

cfriedrich01 commented 2 years ago

Hello Kelly,

Thank you very much for your quick response. To be more precise, the cells in this dataset are hematopoietic stem cells. With this cell type, I would normally expect differentiation from the most immature cells to the differentiated cells, and therefore several distinct differentiation trajectories (or lineages).

Following your answer, I first checked whether some clusters were noisy. To do this, I applied new, stricter filters to my data and obtained a new UMAP. This is much better than my previous results: most of the clusters are now well delimited and much easier to identify. Only cluster 6 is present in two places (circled in red).

Here is my new representation, in which CD34+ HSC/LMPP/Multilin is the most immature cell type and will be the starting cluster for the trajectory. Biologically, I expect to obtain 6 lineages:

- CD34+ HSC/LMPP → CD34+ MDP (endpoint)
- CD34+ HSC/LMPP → CD34+ Gran → Immature Neutrophils 1 and 2 (endpoint)
- CD34+ HSC/LMPP → CD34+ CLP → CD34+ Pre-PC (endpoint)
- CD34+ HSC/LMPP → CD34+ CLP → CD34+ pre-B cycling 1 and 2 (endpoint)
- CD34+ HSC/LMPP → CD34+ Eo/B/Mast (endpoint)
- CD34+ HSC/LMPP → CD34+ Early Erythroblast → Erythroblast (endpoint)

[Figure 1: new UMAP after stricter filtering; cluster 6 appears in two places, circled in red]

I ran Slingshot on the new data with the following code:

```r
SCE_LINEAGE <- getLineages(data = SCE,
                           clusterLabels = colData(SCE)[, "activ.ident.cluster"],
                           reducedDim = "UMAP",
                           start.clus = "CD34+ HSC/LMPP/Multilin",
                           dist.method = "slingshot",
                           use.median = FALSE,
                           omega = FALSE)

SCE_LINEAGE$slingshot_aftercurve <- getCurves(data = SCE_LINEAGE$slingshot,
                                              shrink = TRUE,
                                              extend = "n",
                                              reweight = TRUE,
                                              reassign = TRUE,
                                              thresh = 0.001,
                                              maxit = 10,
                                              stretch = 0.01,
                                              approx_points = 50,
                                              smoother = "smooth.spline",
                                              shrink.method = "cosine",
                                              allow.breaks = TRUE)
```

I got the following curves:

[Figure 2: Slingshot curves obtained with the code above]

This is much better than the previous results in terms of duplicated trajectories. The only remaining problem is that lineages 6 and 7 are very similar. How can I remove lineage 7? Moreover, the trajectories do not exactly reflect the differentiation process I expected. How should I deal with this? Would you recommend “forcing” the different endpoints with the end.clus parameter in getLineages?
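To see exactly where the inferred lineages depart from the expected biology, one option is to write the expected paths down and test each one against the lineages Slingshot returns (e.g. from `slingLineages(SCE_LINEAGE)`). The helpers `is_ordered_subpath` and `check_expected` are hypothetical base-R code, and the `inferred` list below is made up for illustration; only the cluster names come from this thread.

```r
# Hypothetical helper: TRUE if the clusters in 'path' appear in 'lineage'
# in the same order (other clusters may sit in between)
is_ordered_subpath <- function(path, lineage) {
  pos <- match(path, lineage)
  !anyNA(pos) && !is.unsorted(pos, strictly = TRUE)
}

# For each expected path: is it reproduced by at least one inferred lineage?
check_expected <- function(expected, inferred) {
  vapply(expected, function(p)
    any(vapply(inferred, function(l) is_ordered_subpath(p, l), logical(1))),
    logical(1))
}

# A few of the expected paths, using cluster names from this thread
expected <- list(
  MDP    = c("CD34+ HSC/LMPP/Multilin", "CD34+ MDP"),
  Pre_PC = c("CD34+ HSC/LMPP/Multilin", "CD34+ CLP", "CD34+ Pre-PC"),
  Ery    = c("CD34+ HSC/LMPP/Multilin", "CD34+ Early Erythroblast", "Erythroblast")
)

# Made-up lineages standing in for slingLineages() output
inferred <- list(
  Lineage1 = c("CD34+ HSC/LMPP/Multilin", "CD34+ CLP", "CD34+ Pre-PC"),
  Lineage2 = c("CD34+ HSC/LMPP/Multilin", "CD34+ MDP", "CD34+ Early Erythroblast")
)

check_expected(expected, inferred)
# MDP and Pre_PC are TRUE; Ery is FALSE because no lineage reaches "Erythroblast"
```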

Thank you for your tips!

kstreet13 commented 2 years ago

Hi @cfriedrich01,

Yeah, forcing the endpoints with end.clus might be your best option here, since it is such a complicated structure and you already seem to know where the endpoints should be.
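A minimal sketch of forcing endpoints with `end.clus`, assuming the slingshot 2.x interface. It runs on the toy `slingshotExample` data bundled with slingshot so it is self-contained; the commented call shows how the same argument might look with this thread's objects, using endpoint names taken from the expected lineages above.

```r
library(slingshot)

data("slingshotExample")
rd <- slingshotExample$rd
cl <- slingshotExample$cl

# Force clusters "4" and "5" to be endpoints (leaves of the cluster MST).
# With the thread's data this might look like:
#   getLineages(SCE,
#               clusterLabels = colData(SCE)[, "activ.ident.cluster"],
#               reducedDim = "UMAP",
#               start.clus = "CD34+ HSC/LMPP/Multilin",
#               end.clus = c("CD34+ MDP", "CD34+ Pre-PC",
#                            "CD34+ Eo/B/Mast", "Erythroblast"))
lin_forced <- getLineages(rd, cl, start.clus = "1", end.clus = c("4", "5"))

# Last cluster of each lineage: the forced endpoints should appear here
ends <- vapply(slingLineages(lin_forced), function(l) l[length(l)], character(1))
ends
```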

As for removing Lineage 7, I think there are a couple of options, neither of which is perfect. Option 1 would be to combine the "CD34+ Early Erythroblast" and "Erythroblast" clusters, which would almost certainly cause more issues because it would be such a noisy, spread-out cluster (again, dist.method = "mnn" might be a way around that, but in this case it still might not be enough).
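Option 1 only requires relabeling before rerunning `getLineages()`. A minimal base-R sketch: `merge_clusters` and the merged label "Erythroid (merged)" are hypothetical, and the short `cl_orig` vector stands in for `colData(SCE)[, "activ.ident.cluster"]`.

```r
# Hypothetical helper: give several clusters one shared label
merge_clusters <- function(labels, from, into) {
  labels <- as.character(labels)
  labels[labels %in% from] <- into
  labels
}

# Stand-in for the real cluster labels
cl_orig <- c("CD34+ Early Erythroblast", "Erythroblast", "CD34+ MDP", "Erythroblast")

cl_merged <- merge_clusters(cl_orig,
                            from = c("CD34+ Early Erythroblast", "Erythroblast"),
                            into = "Erythroid (merged)")
table(cl_merged)

# The merged labels would then be passed as clusterLabels to getLineages(),
# possibly together with dist.method = "mnn" as suggested above
```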

Option 2 is basically to ignore it. Given that Lineage 7 is short and pretty much always has other lineages around it, it is probably only being assigned a small number of cells. I have seen papers (and I'm guilty of it myself) where people basically ignore certain clusters which they know to be spurious (i.e., doublets, uninteresting cell types, etc.), so I don't think it's unreasonable to treat lineages the same way.
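Before ignoring Lineage 7, it can help to check how many cells it actually receives. `slingPseudotime()` returns `NA` (by default) for cells not assigned to a lineage, so counting non-`NA` values per column gives cells per lineage. A sketch on the toy `slingshotExample` data (an assumption, so it runs standalone); with the thread's objects you would call `slingPseudotime()` on the fitted curves and compare the columns for Lineages 6 and 7.

```r
library(slingshot)

data("slingshotExample")
rd <- slingshotExample$rd
cl <- slingshotExample$cl

pto <- slingshot(rd, cl, start.clus = "1")

# One column per lineage; NA marks cells not assigned to that lineage
pt <- slingPseudotime(pto)
cells_per_lineage <- colSums(!is.na(pt))
cells_per_lineage

# Cells shared by the first two lineages (for the thread: Lineages 6 and 7)
if (ncol(pt) >= 2) {
  shared <- sum(!is.na(pt[, 1]) & !is.na(pt[, 2]))
  shared
}
```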

Hope this helps! Kelly

cfriedrich01 commented 2 years ago

Hi @kstreet13, thanks for your help and your fast answer. In the end, I think I'll focus my analysis on a single trajectory, which will be easier to interpret. Best, Chloé