kstreet13 / slingshot

Functions for identifying and characterizing continuous developmental trajectories in single-cell data.

How does Slingshot work on 90,000 nuclei? #228

Closed pariaaliour closed 6 months ago

pariaaliour commented 11 months ago

Dear Slingshot developer, Thanks for this useful package! Would you recommend applying Slingshot to 90k nuclei? I understand it is not recommended for a large number of cells, but I am not sure whether there is a threshold above which you consider a dataset large. Can I trust the results at this scale? Many thanks, Paria

kstreet13 commented 11 months ago

Hi @pariaaliour,

I think this should be fine, yes. It might take a few minutes to run, but it shouldn't be an unreasonable amount of time. I haven't heard this recommendation before, so I'm a little curious about that. But in general, thanks to the approx_points argument introduced by princurve and added to slingshot around version 1.5.0, both packages can handle very large numbers of cells. The only potential concern would be the number of lineages, since each cell needs to be projected onto each lineage. But if this is less than 10 (which is already a pretty large number of lineages), I don't think it should be too inconvenient.

Best, Kelly

pariaaliour commented 11 months ago

Thanks for your quick response @kstreet13! I used the approx_points argument this time. However, when I run Slingshot as below, my UMAP plot looks cyclic and the orientation of my cell types is not as expected. How should I interpret this, and if I did something wrong, how can I improve it? Also, in this publication (https://www.biorxiv.org/content/10.1101/2021.12.22.473434v1.full#sec-19) I read that Slingshot cannot accurately model clusters that form a cyclic shape. Could the wrong orientation be caused by this issue?

# subset the OPC and OL clusters
OL_clusters <- c("0", "1", "4", "6", "7", "12", "15", "18", "11", "14", "25", "29")
OL_data <- subset(seurat_integrated, seurat_clusters %in% OL_clusters)
OL_data <- RenameIdents(OL_data, "0" = "Olig6", "1" = "Olig3", "4" = "Olig5", "6" = "Olig4", "7" = "Olig2", "11" = "OPC1", "12" = "Olig1", "14" = "OPC2", "15" = "Olig7", "18" = "Olig8", "25" = "OPC3", "29" = "COPC")

# re-run UMAP
OL_data <- RunUMAP(OL_data, reduction = "pca", dims = 1:20, reduction.name = "OL_UMAP", min.dist = 0.1, spread = 5, n.neighbors = 100)

# convert the Seurat object to a SingleCellExperiment object
sce_OL <- as.SingleCellExperiment(OL_data)

# perform slingshot trajectory analysis
sce_OL <- slingshot(sce_OL, reducedDim = "PCA", extend = "n",
                    clusterLabels = colData(sce_OL)$cluster_id,
                    start.clus = "OPC1",
                    approx_points = 150)

Here is my UMAP plot after running Slingshot (I expected the red cluster, OPC, on the left-hand side, not the right-hand side):

Screenshot 2023-07-22 at 8 34 38 PM

Many thanks, Paria

kstreet13 commented 11 months ago

Hi @pariaaliour,

I have a few thoughts as to what might be going on here. For starters, you are correct that Slingshot cannot handle cyclic trajectories (things like cell cycle stage), so if you believe that is the correct model for your data, you might need to use a different tool. Based on the UMAP plot alone, I wouldn't say this looks cyclical, but UMAP plots can be deceiving.

Anyway, the fact that there are four lineages here suggests that there is probably an issue with the underlying MST, which is based on the clustering. Weird, short, and/or overlapping lineages like this are often a result of over-clustering leading to spurious branching events. I would suggest revisiting the clusters that you use as input for Slingshot and perhaps merging some of them or adjusting some parameters.
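As a rough illustration of the merging step, over-split clusters can be collapsed with a lookup vector before the labels are passed to Slingshot. This is a minimal base-R sketch: the label names echo the ones used in this thread, but the toy labels and the choice of which clusters to merge (`merge_map`) are purely hypothetical and would need a biological rationale such as shared marker genes.

```r
# Toy per-cell labels standing in for the real cluster column (made-up data).
cluster_id <- c("OPC1", "OPC2", "OPC3", "COPC", "Olig1", "Olig2", "Olig3", "Olig1")

# Hypothetical merge decisions: old label -> merged label.
# Labels that are not listed here are kept unchanged.
merge_map <- c(OPC1 = "OPC", OPC2 = "OPC", OPC3 = "OPC",
               Olig2 = "Olig_early", Olig3 = "Olig_early")

# Replace every label found in `map` with its merged name.
merge_clusters <- function(labels, map) {
  labels <- as.character(labels)
  hit <- labels %in% names(map)
  labels[hit] <- unname(map[labels[hit]])
  factor(labels)
}

merged <- merge_clusters(cluster_id, merge_map)
table(merged)
```

The merged factor would then be supplied as `clusterLabels`, and `start.clus` would have to use the merged name (here `"OPC"` instead of `"OPC1"`).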

Best, Kelly

pariaaliour commented 10 months ago

Thanks @kstreet13 for your explanation. I think you are right. I re-clustered my oligodendrocyte dataset and re-ran Slingshot, and it makes sense now:

Screenshot 2023-08-29 at 6 52 51 PM

I have some questions regarding this plot. First, the plot shows a single lineage, so how come I have slingPseudotime_1, slingPseudotime_2, and slingPseudotime_3? There might be something I am missing. (I should mention that the imbalance score plot for this dataset shows more imbalanced regions than balanced ones. Could that be because there is no global lineage shared by both conditions? If so, it still does not show up in this plot.) Second, running Slingshot does not output cellWeights, and I need these values to select the number of knots. Third, could you share the code to visualize density estimates for the two groups across pseudotime? Many thanks, Paria

kstreet13 commented 10 months ago

Hi @pariaaliour,

Glad you got it working! Regarding your first question, I'm guessing this is an artifact of re-running slingshot on the same SCE object. If you ran it once and got three lineages, it will produce 3 variables, up to slingPseudotime_3, in the colData. When you run it again, it will overwrite slingPseudotime_1, but it won't delete the other two. These are created for convenience, but the most stable way to access the pseudotime values is with the slingPseudotime() function.
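One way to avoid the leftover columns is to drop every `slingPseudotime_*` column before re-running. The sketch below only shows the column filtering, in base R, with a plain `data.frame` standing in for `colData(sce)`; the helper name `stale_sling_cols` and all values are made up for illustration.

```r
# Return the names that look like slingshot's convenience pseudotime
# columns (slingPseudotime_1, slingPseudotime_2, ...).
stale_sling_cols <- function(col_names) {
  grep("^slingPseudotime_[0-9]+$", col_names, value = TRUE)
}

# A plain data.frame stands in for colData(sce) here (made-up values).
cd <- data.frame(
  cluster_id        = c("OPC", "OPC", "Olig"),
  slingPseudotime_1 = c(0.0, 1.2, 3.4),
  slingPseudotime_2 = c(0.0, NA, 2.9),  # left over from an earlier run
  slingPseudotime_3 = c(0.0, 1.1, NA)   # left over from an earlier run
)

# Drop every slingPseudotime_* column before re-running slingshot.
cd_clean <- cd[, !(names(cd) %in% stale_sling_cols(names(cd))), drop = FALSE]
names(cd_clean)
```

On a real object the same filter would presumably be applied to `colData(sce)` before calling `slingshot()` again (an assumption, not checked against a specific slingshot version); starting from a freshly converted object avoids the problem as well.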

Similarly, for question 2, you can get the cell weights with slingCurveWeights(), though this should just be a vector of 1s, given that you now only have a single lineage.

And there's some code for the density estimate plots in the condiments vignette as well as our BioC 2022 workshop.
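For orientation only, a density comparison along pseudotime can be sketched in base R. This is not the code from the condiments vignette or the workshop; the pseudotime values and the two condition labels below are simulated, and on real data they would come from `slingPseudotime()` and the sample metadata.

```r
set.seed(1)

# Simulated stand-ins (assumptions): pseudotime along one lineage and a
# two-level condition label for each cell.
pseudotime <- c(rbeta(300, 2, 5), rbeta(300, 5, 2)) * 10
condition  <- rep(c("control", "treated"), each = 300)

# Kernel density estimate of pseudotime within each condition,
# evaluated over a shared range so the curves are comparable.
rng  <- range(pseudotime)
dens <- lapply(split(pseudotime, condition), density,
               from = rng[1], to = rng[2])

# Overlay the two curves on a common y-axis.
ymax <- max(vapply(dens, function(d) max(d$y), numeric(1)))
plot(dens[[1]], ylim = c(0, ymax), col = "steelblue", lwd = 2,
     main = "Pseudotime density by condition", xlab = "Pseudotime")
lines(dens[[2]], col = "firebrick", lwd = 2)
legend("topright", legend = names(dens),
       col = c("steelblue", "firebrick"), lwd = 2)
```

With more than one lineage, the densities would need to be computed per lineage, using only the cells assigned to it.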

Hope this helps! Kelly

pariaaliour commented 10 months ago

Thank you so much, Kelly, for the clarification. You were right: the extra slingPseudotime_1:3 columns appeared because I applied Slingshot more than once to the same sce object. Paria