kstreet13 / slingshot

Functions for identifying and characterizing continuous developmental trajectories in single-cell data.

unexpected behaviour when specifying the start cluster #227

Closed aaronkwong closed 11 months ago

aaronkwong commented 12 months ago

Hello,

I am experiencing some unexpected behavior when drawing the curves from slingshot. It seems that specifying the label of a particular cluster as the "start.clus" results in a different cluster being used as the start when the curves are drawn.

Given a test dataset of embeddings (emb) and cluster labels (clusters):

head(emb)
                x          y
cell_1 -3.7150283  0.3160177
cell_2  4.7741468  1.1219190
cell_3 -0.7214714 -0.9004010
cell_4  6.8103928  2.4685768
cell_5  5.9526681  0.4710327
cell_6 -4.4462109  1.1229070

head(clusters)
[1] "1" "0" "0" "2" "0" "1"

ggplot(data=emb)+geom_point(aes(x=x,y=y,col=clusters))

image

When running slingshot with start.clus = "4", the curves appear to start from cluster "0":

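library(slingshot)
library(dplyr)  # for %>% and arrange() in the plotting code

# build the MST-based lineages, requesting cluster "4" as the start
lin1 <- getLineages(emb, clusterLabels = clusters, start.clus = "4", dist.method = "simple")
mst <- slingMST(lin1, as.df = TRUE)
# fit the principal curves and rename the coordinate columns to match emb
crv1 <- getCurves(lin1)
curves <- slingCurves(crv1, as.df = TRUE)
colnames(curves)[1:2] <- c("x", "y")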

#plot mst
p_mst <- ggplot(emb, aes(x = x, y = y)) +
    geom_point(aes(fill = clusters), shape = 21) + 
    theme_classic()
p_mst<-p_mst + geom_point(data = mst, size = 3) +
    geom_path(data = mst %>% arrange(Order), aes(group = Lineage), size = 1)+ggtitle("MST")
print(p_mst)

#plot the curves
#the curve does not appear to start from cluster "4"; it appears to start from cluster "0" instead
p <- ggplot(emb, aes(x = x, y =y)) +
    geom_point(aes(fill = clusters), col = "grey70", shape = 21) + 
    theme_classic()
p<-p + geom_path(data = curves %>% arrange(Order),aes(group = Lineage, col = as.character(Lineage)), size = 1.5) + ggtitle("Curves")
print(p)

image

image

Interestingly, if start.clus is set to "0" instead, the resulting curves appear to start from cluster "4":

ggplot(data=emb)+geom_point(aes(x=x,y=y,col=clusters))
#set start clus to "0" this time
lin1 <- getLineages(emb, clusterLabels=clusters,start.clus="0",dist.method="simple")
crv1 <- getCurves(lin1)
curves <- slingCurves(crv1, as.df = TRUE)
colnames(curves)[1]<-"x"
colnames(curves)[2]<-"y"
p <- ggplot(emb, aes(x = x, y =y)) +
    geom_point(aes(fill = clusters), col = "grey70", shape = 21) + 
    theme_classic()
#setting the start cluster to "0" makes the curve appear to start from cluster "4"
p<-p + geom_path(data = curves %>% arrange(Order),aes(group = Lineage, col = as.character(Lineage)), size = 1.5)
print(p)

image

I have attached the script and toy dataset to reproduce these results. Could this be an issue with the labels getting mixed up in the function that draws the curves? Or perhaps I am specifying start.clus incorrectly?

Thanks, Aaron

start_clus_issue.zip

kstreet13 commented 12 months ago

Thank you for the detailed report! I was able to reproduce your results and I don't think you're doing anything incorrectly. Rather, this is most likely caused by the fact that smoothing splines (and loess) tend to get a bit unstable at the extreme ends of their range. Since the curves are based on smoothing splines, this can sometimes cause issues like the one you're seeing.

I think the best thing to do in these cases is to try adjusting the extend parameter in getCurves. I've found that setting extend = 'n' or extend = 'pc1' can sometimes mitigate these stability issues. (Briefly, this argument controls how Slingshot constructs an initial guess of the pseudotime values, based on the MST, for cells that lie beyond the center of terminal clusters or before the center of the initial cluster.)

Fortunately, both seem to produce reasonable results on your toy dataset (while setting cluster 4 as the starting cluster), though 'pc1' might be slightly better.

extend = 'n':

image

extend = 'pc1':

image
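
For anyone who wants to compare the extend options numerically rather than by eye, here is a rough sketch. It does not use the attached dataset: `emb` and `clusters` below are simulated stand-ins (four Gaussian clusters laid out along a line), and the per-cluster mean pseudotime is just one possible sanity check, not something Slingshot reports on its own. If the curve is oriented as requested, the start cluster would be expected to have the smallest mean pseudotime.

```r
library(slingshot)

# Hypothetical stand-in data (NOT the attached toy dataset):
# four Gaussian clusters along the x axis, 50 cells each.
set.seed(42)
centers  <- c("4" = -6, "1" = -2, "0" = 2, "2" = 6)
clusters <- rep(names(centers), each = 50)
emb <- cbind(
  x = rnorm(200, mean = unname(centers[clusters]), sd = 0.7),
  y = rnorm(200, sd = 0.7)
)
rownames(emb) <- paste0("cell_", seq_len(nrow(emb)))

# Same lineage construction as in the report, starting from cluster "4"
lin <- getLineages(emb, clusterLabels = clusters,
                   start.clus = "4", dist.method = "simple")

# Fit curves once per extend option: 'y' is the default, 'n' and 'pc1'
# are the alternatives suggested above
fits <- lapply(c(y = "y", n = "n", pc1 = "pc1"),
               function(ext) getCurves(lin, extend = ext))

# Mean pseudotime per cluster along the first lineage, for each fit
for (ext in names(fits)) {
  pt <- slingPseudotime(fits[[ext]])[, 1]
  cat("extend =", ext, "\n")
  print(round(tapply(pt, clusters, mean, na.rm = TRUE), 2))
}
```

If one of the options puts the largest mean pseudotime on the requested start cluster, that fit is probably showing the flipped orientation described in the report.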

aaronkwong commented 11 months ago

Thank you so much for the detailed and quick response, very well explained.

Aaron