Closed: cfriedrich01 closed this issue 2 years ago
Hi @cfriedrich01 ,
That's a good question and this is definitely a very complicated structure. I think the issue here is that there are some clusters that are truly discrete (i.e., they should not be connected via a lineage to the other clusters), but Slingshot isn't picking that up. The yellow cluster at the end of Lineage 3 is a prime example.
You could manually subset the data to omit those clusters, but my recommendation would be to use `dist.method = 'mnn'` and `omega = TRUE` when you run `slingshot` (or `getLineages`). This will change the distance metric Slingshot uses to calculate the cluster-based MST and allow for the possibility of some clusters being disconnected. This will hopefully simplify the resulting trajectory, making for fewer overlapping lineages.
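As a rough illustration of that suggestion, here is a minimal sketch using the small simulated dataset that ships with the package rather than the data from this thread. It assumes slingshot 2.0 or later (where `dist.method` is available); the start cluster `"1"` belongs to the example data and is not a recommendation for any real dataset.

```r
library(slingshot)

# Simulated example data bundled with slingshot (not the data in this thread)
data("slingshotExample")
rd <- slingshotExample$rd   # cells x 2 matrix of reduced dimensions
cl <- slingshotExample$cl   # one cluster label per cell

# MST built on MNN-based distances between clusters, with omega = TRUE so
# that clusters far from everything else may be left disconnected
pto <- getLineages(rd, cl,
                   start.clus = "1",
                   dist.method = "mnn",
                   omega = TRUE)

# Inspect which clusters ended up in which lineage
slingLineages(pto)
```

On real data, comparing `slingLineages()` before and after the change is a quick way to see whether a suspect cluster was detached.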
And it's a little hard to tell from this image, but issues like this can also stem from a noisy clustering. If one cluster exists in more than one location on the UMAP plot, it can lead to some weird lineages. I don't think that's the issue here, but it's another thing to look out for!
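One way to screen for clusters that sit in more than one place is sketched below. This is a heuristic of my own for illustration, not a Slingshot function: `find_split_clusters` and its `gap_ratio` threshold are hypothetical names and values. It uses the fact that single-linkage merge heights equal the edge lengths of a cluster's minimum spanning tree, so one edge far longer than the typical edge suggests two separated islands.

```r
# Heuristic sketch (not part of Slingshot): flag clusters whose cells form
# well-separated groups in a 2D embedding.
# embedding: numeric matrix, one row per cell
# labels:    cluster label per cell
# gap_ratio: arbitrary threshold, chosen here only for illustration
find_split_clusters <- function(embedding, labels, gap_ratio = 10) {
  labels <- as.character(labels)
  flagged <- vapply(unique(labels), function(cl) {
    pts <- embedding[labels == cl, , drop = FALSE]
    if (nrow(pts) < 3) return(FALSE)
    # Single-linkage merge heights = edge lengths of the cluster's MST
    heights <- hclust(dist(pts), method = "single")$height
    max(heights) > gap_ratio * median(heights)
  }, logical(1))
  names(flagged)[flagged]
}

# Toy data: cluster "A" is one compact blob, cluster "B" has two islands
set.seed(1)
a <- cbind(runif(100), runif(100))
b <- rbind(cbind(runif(50), runif(50)),
           cbind(runif(50, min = 5, max = 6), runif(50)))
emb <- rbind(a, b)
lab <- rep(c("A", "B"), each = 100)

split_clusters <- find_split_clusters(emb, lab)
split_clusters
```

Note that `dist()` is quadratic in the number of cells per cluster, so very large clusters may need to be subsampled first.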
Best, Kelly
Hello Kelly,

Thank you very much for your quick response. To be more precise, the cells in this dataset are hematopoietic stem cells. With this cell type, I would normally expect to observe differentiation from the most immature cells to the differentiated cells, and therefore several distinct differentiation trajectories (or lineages).

Following your answer, I first checked whether some clusters might be "noisy clusters". To do this, I applied new, stricter filters to my data and obtained a new UMAP. This is much better than my previous results, because most of the clusters are now well delimited and much easier to identify. Only cluster 6 is present in two places (circled in red). Here is my new representation, where CD34+ HSC/LMPP/Multilin is the most immature cell type and will be the starting cluster for the trajectory.

Biologically speaking, I expect to obtain 6 lineages:

- CD34+ HSC/LMPP → CD34+ MDP (endpoint)
- CD34+ HSC/LMPP → CD34+ Gran → Immature Neutrophils 1 and 2 (endpoint)
- CD34+ HSC/LMPP → CD34+ CLP → CD34+ Pre-PC (endpoint)
- CD34+ HSC/LMPP → CD34+ CLP → CD34+ pre-B cycling 1 and 2 (endpoint)
- CD34+ HSC/LMPP → CD34+ Eo/B/Mast (endpoint)
- CD34+ HSC/LMPP → CD34+ Early Erythroblast → Erythroblast (endpoint)
I ran Slingshot on these new data with the following code:

    SCE_LINEAGE <- getLineages(data = SCE,
                               clusterLabels = colData(SCE)[, "activ.ident.cluster"],
                               reducedDim = "UMAP",
                               start.clus = "CD34+ HSC/LMPP/Multilin",
                               dist.method = "slingshot",
                               use.median = FALSE,
                               omega = FALSE)
    SCE_LINEAGE$slingshot_aftercurve <- getCurves(data = SCE_LINEAGE$slingshot,
                                                  shrink = TRUE,
                                                  extend = "n",
                                                  reweight = TRUE,
                                                  reassign = TRUE,
                                                  thresh = 0.001,
                                                  maxit = 10,
                                                  stretch = 0.01,
                                                  approx_points = 50,
                                                  smoother = "smooth.spline",
                                                  shrink.method = "cosine",
                                                  allow.breaks = TRUE)
I got the following curves:
This is much better than the previous results in terms of duplicated trajectories. Only lineages 6 and 7 remain very similar. How can I remove lineage 7?
Moreover, the trajectories do not exactly reflect the biological differentiation process I expected. How should I deal with this? Would you recommend "forcing" the different endpoints with the `end.clus` parameter in `getLineages`?
Thank you for your tips!
Hi @cfriedrich01,
Yeah, that might be your best option here, since it is such a complicated structure and you do seem to know already where the endpoints should be.
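For reference, a minimal sketch of constraining an endpoint with `end.clus`, again on the simulated data bundled with slingshot rather than the data in this thread. The choice of cluster `"4"` as a terminal state is purely illustrative; on real data, `end.clus` would take the labels of the clusters known to be endpoints, spelled exactly as they appear in the cluster labels.

```r
library(slingshot)

# Simulated example data bundled with slingshot (not the data in this thread)
data("slingshotExample")
rd <- slingshotExample$rd
cl <- slingshotExample$cl

# Force cluster "4" to be a leaf of the cluster-level MST, so that it can
# only appear as the last cluster of a lineage
pto_end <- getLineages(rd, cl,
                       start.clus = "1",
                       end.clus = "4")

lineages <- slingLineages(pto_end)
lineages
```

Several endpoints can be supplied at once as a character vector, e.g. `end.clus = c("4", "5")`.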
As for removing Lineage 7, I think there are a couple options, neither of which is perfect. Option 1 would be to combine the "CD34+ Early Erythroblast" and "Erythroblast" clusters, which would almost certainly cause more issues because it would be such a noisy, spread out cluster (again, `dist.method = "mnn"` might be a way around that, but it still might not be enough, in this case).
Option 2 is basically to ignore it. Given that Lineage 7 is short and pretty much always has other lineages around it, it is probably only being assigned a small number of cells. I have seen papers (and I'm guilty of it myself) where people basically ignore certain clusters which they know to be spurious (i.e., doublets, uninteresting cell types, etc.), so I don't think it's unreasonable to treat lineages the same way.
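To check how few cells a lineage actually receives before deciding to ignore it, something like the following sketch could be used. It runs on the simulated data bundled with slingshot, and dropping the lineage with the fewest cells is only an illustration of the bookkeeping, not a rule for which lineage to discard.

```r
library(slingshot)

# Simulated example data bundled with slingshot (not the data in this thread)
data("slingshotExample")
rd <- slingshotExample$rd
cl <- slingshotExample$cl

sds <- slingshot(rd, cl, start.clus = "1")

# Cells x lineages matrix; NA means the cell is not assigned to that lineage
pt <- slingPseudotime(sds)

# Number of cells with a pseudotime value on each lineage
n_cells <- colSums(!is.na(pt))
n_cells

# Keep every lineage except the one with the fewest cells for downstream work
weakest <- which.min(n_cells)
pt_kept <- pt[, -weakest, drop = FALSE]
```

The fitted object itself is left untouched; only the pseudotime columns carried into downstream analysis change.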
Hope this helps! Kelly
Hi @kstreet13, Thanks for your help and your fast answer. In the end, I think I'll focus my analysis on a single trajectory, so it will be easier to interpret. Best, Chloé
Hi Kelly, As you can see in the attached picture, I got 9 lineages in my Slingshot results. I'm a little puzzled by these results because the lineages numbered 1, 2 and 3 in the picture partially overlap. I don't understand why lineages 1 and 2 are not included in lineage 3. Is there a way to "merge" overlapping lineages?
Thank you for your help!