kstreet13 / slingshot

Functions for identifying and characterizing continuous developmental trajectories in single-cell data.

end.clus function not working #177

Closed SyrupBadger closed 2 years ago

SyrupBadger commented 2 years ago

Hi, I've just started using the package and it is mostly working great. I am having an issue with the end.clus argument, however. When I run the slingshot function on my dataset, it places the start cluster as one that I know is terminally differentiated. To get around this I have tried using the end.clus argument, stating that this specific cluster is an endpoint. However, this has no effect on the analysis and it is still being designated as a starting cluster. My data is originally from a Seurat object, which I have converted to a SingleCellExperiment. When running slingshot I then tell it what to use as the clusterLabels. Am I missing something here?

# change to a single cell dataset and generate slingshot dataset
Tom.integrated.slingshot <- as.SingleCellExperiment(Tom.integrated)

# run slingshot on data
Tom.integrated.slingshot <- slingshot(Tom.integrated.slingshot, clusterLabels = Tom.integrated.slingshot$tree.ident, end.clus = c("1", "2"))

Thanks

kstreet13 commented 2 years ago

Hi @SyrupBadger ,

That's a great question! I hadn't actually thought about this before, but what you're describing is definitely possible when end.clus is specified, but start.clus isn't (I generally advise that start.clus is more important, so I haven't seen too many cases like this).

So, under the hood, end.clus actually just forces a cluster to be a leaf node on the minimum spanning tree that connects the clusters. This usually means it will be the endpoint of a lineage. However, when there is no start.clus provided, Slingshot picks the starting cluster from the set of leaf nodes, so a cluster that was forced to be a leaf can still be chosen as the start, which is how you end up with this scenario.

That said, Slingshot's method for selecting a starting point is generally not good. It is based on a notion of parsimony, attempting to maximize the amount of "shared" trajectory before different lineages split off. This is not particularly reflective of the biology in many settings and often leads to ties, which are (or at least used to be) resolved alphabetically by cluster name.

Anyway, to resolve this, I would strongly advise adding a start.clus argument. This can be based on prior biological knowledge (which you clearly have, since you know which clusters represent endpoints) or on some additional analysis (e.g. RNA velocity or a cell-level stemness score).
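As a rough sketch of the difference, here is a toy example. The simulated embedding and the cluster names "A" to "D" are made up purely for illustration and have nothing to do with your dataset; the only assumption about the package is that slingshot() is called on a matrix with the clusterLabels, start.clus and end.clus arguments discussed above.

```r
library(slingshot)

set.seed(1)

# Hypothetical toy embedding: four clusters of 50 cells arranged in a Y shape,
# with "A" as the intended root, "B" as the branch point and "C"/"D" as tips.
centers <- rbind(A = c(0, 0), B = c(3, 0), C = c(6, 2), D = c(6, -2))
cl <- rep(rownames(centers), each = 50)
rd <- centers[cl, ] + matrix(rnorm(2 * length(cl), sd = 0.5), ncol = 2)
dimnames(rd) <- list(paste0("cell", seq_along(cl)), c("dim1", "dim2"))

# end.clus alone only forces "C" and "D" to be leaves of the cluster tree;
# the starting cluster is still chosen automatically from the leaf nodes.
sds_end_only <- slingshot(rd, clusterLabels = cl, end.clus = c("C", "D"))
slingLineages(sds_end_only)

# Adding start.clus fixes the root explicitly, so every lineage begins at "A".
sds <- slingshot(rd, clusterLabels = cl,
                 start.clus = "A", end.clus = c("C", "D"))
slingLineages(sds)
```

Comparing the two slingLineages() outputs shows which cluster each run treated as the start. In your call, this would amount to keeping end.clus = c("1", "2") and adding start.clus with whichever cluster label your biology or follow-up analysis supports.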

Hope this helps! Kelly

SyrupBadger commented 2 years ago

Hi Kelly,

Thanks for the swift reply. While we know the end points of our sample the start point is a bit more unclear. However, I will look into the options you have suggested and play around with what we think is biologically correct.

Thanks again.

Fraser