If start.clus is not defined

jinyoung5484 commented 3 years ago

Just wondering while I was reading the paper. The slingshot paper says that slingshot requires the user to identify the initial cluster or root node. Then, how does slingshot identify the root node if start.clus is not defined by the user? If I missed information about this in the paper, sorry.

Thank you!

kstreet13 commented 3 years ago

Hi @jinyoung5484

Good question! Slingshot does have an internal rule-of-thumb for picking a starting cluster, but it is largely untested and it has very little basis in biology, which is why we strongly encourage the user to specify their own starting cluster.

The internal rule attempts to maximize the number of shared clusters (i.e., minimize the number of clusters that are specific to a particular lineage). It will check all leaf nodes and sum up the number of lineages passing through each cluster, taking the node that maximizes this value. Just as a concrete example, in the Slingshot example data it will pick cluster 1 (the far left of the plot) as the root, because this leads to 3 clusters being shared between the lineages, whereas the other leaf nodes would only have 2.
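To make the rule above concrete, here is a rough Python sketch of it. This is not Slingshot's actual code: the function names are hypothetical, the 5-cluster tree is a toy topology chosen to be consistent with the counts quoted above, and breaking ties by smallest label is an assumption.

```python
from collections import Counter, defaultdict


def lineages(edges, start):
    """Return every path from `start` to another leaf of the tree."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    paths = []

    def walk(node, path):
        unvisited = [n for n in sorted(adj[node]) if n not in path]
        if not unvisited and len(path) > 1:
            paths.append(path)  # reached a leaf other than the start
        for n in unvisited:
            walk(n, path + [n])

    walk(start, [start])
    return paths


def shared_clusters(paths):
    """Count clusters that belong to more than one lineage."""
    counts = Counter(c for p in paths for c in p)
    return sum(1 for v in counts.values() if v > 1)


def pick_root(edges):
    """Score each leaf by summing, over clusters, the number of lineages
    passing through them, and return the best leaf plus all scores.
    Ties go to the smallest label (an assumption of this sketch)."""
    degree = Counter(n for e in edges for n in e)
    leaves = sorted(n for n, d in degree.items() if d == 1)
    scores = {leaf: sum(len(p) for p in lineages(edges, leaf)) for leaf in leaves}
    return max(leaves, key=lambda leaf: scores[leaf]), scores


# Toy tree (assumed): 1 - 2 - 3, with 3 branching to 4 and 5.
toy_edges = [(1, 2), (2, 3), (3, 4), (3, 5)]
root, scores = pick_root(toy_edges)
print(root, scores)
for leaf in sorted(scores):
    print(leaf, shared_clusters(lineages(toy_edges, leaf)))
```

On this toy tree, starting from cluster 1 gives 3 shared clusters, while starting from cluster 4 or 5 gives 2, so cluster 1 is picked.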

Like I said though, this is just a very basic rule of thumb (and one that would be wrong in many real-world datasets I've seen), so we strongly encourage users to select their own starting cluster.

Best, Kelly

jinyoung5484 commented 3 years ago

Hello @kstreet13, thank you for your clear answer above. I reopened this issue for a follow-up question. Whether or not the user specifies a starting cluster, which biological data does slingshot use to order the clusters when creating lineages? I am assuming that slingshot uses gene expression from the given matrix. If so, how does slingshot use the gene expression data from the provided matrix to order the clusters in a lineage?

kstreet13 commented 3 years ago

Hi @jinyoung5484,

The ordering is entirely determined by the minimum spanning tree, which is based on the dimensionality reduction and cluster labels. Once the MST is constructed and the starting cluster is chosen (either by the user or by the internal rule), then the number of lineages is determined by the number of unique paths that can be drawn from the starting cluster to a leaf node cluster. In the example from my previous post, when we select a leaf node as the starting point, there are two other leaf nodes and hence 2 lineages. If the user were to specify one of the interior clusters as the starting point, then there would be 3 possible paths to a leaf node and, therefore, 3 lineages.
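A small Python sketch of that pipeline may help: compute cluster centers from the reduced-dimension coordinates and cluster labels, build an MST over the centers, then enumerate the paths from the starting cluster to each leaf. This is only an illustration, not Slingshot's code: the function names and toy coordinates are made up, and plain Euclidean distance between cluster means is a simplifying assumption (the package's own distance measure may differ).

```python
import math
from collections import defaultdict


def cluster_centers(coords, labels):
    """Mean position of each cluster in the reduced-dimension space."""
    grouped = defaultdict(list)
    for point, label in zip(coords, labels):
        grouped[label].append(point)
    return {
        label: [sum(axis) / len(points) for axis in zip(*points)]
        for label, points in grouped.items()
    }


def minimum_spanning_tree(centers):
    """Prim's algorithm over cluster centers (Euclidean distance assumed)."""
    nodes = sorted(centers)
    in_tree = {nodes[0]}
    edges = []
    while len(in_tree) < len(nodes):
        _, a, b = min(
            (math.dist(centers[a], centers[b]), a, b)
            for a in in_tree
            for b in nodes
            if b not in in_tree
        )
        edges.append((a, b))
        in_tree.add(b)
    return edges


def lineages(edges, start):
    """Every path from the starting cluster to a leaf cluster."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    paths = []

    def walk(node, path):
        unvisited = [n for n in sorted(adj[node]) if n not in path]
        if not unvisited and len(path) > 1:
            paths.append(path)
        for n in unvisited:
            walk(n, path + [n])

    walk(start, [start])
    return paths


# Toy 2-D embedding (made up): two cells per cluster.
coords = [(0.0, 0.5), (0.0, -0.5), (1.0, 0.5), (1.0, -0.5), (2.0, 0.5),
          (2.0, -0.5), (3.0, 1.5), (3.0, 0.5), (3.0, -0.5), (3.0, -1.5)]
labels = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]

mst = minimum_spanning_tree(cluster_centers(coords, labels))
print(mst)
print(lineages(mst, 1))  # leaf start: 2 lineages
print(lineages(mst, 3))  # interior start: 3 lineages
```

Note that nothing beyond the coordinates and labels enters the calculation: gene expression influences the result only through the dimensionality reduction and clustering that produced them.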

Hope this helps! Kelly

jinyoung5484 commented 3 years ago

Thank you for the detailed answer. I was a little confused on some parts, but that explains it all.

kstreet13 / slingshot

If start.clus is not defined #130