kstreet13 / slingshot

Functions for identifying and characterizing continuous developmental trajectories in single-cell data.
259 stars 42 forks source link

Trajectories behaving strangely using two different approaches #220

Closed Tommy0398 closed 1 year ago

Tommy0398 commented 1 year ago

Hi Slingshot-team,

I was hoping you could help me understand a problem I'm having with creating trajectories using a singlecellexperiment object. I was adapting the method used in https://hectorrdb.github.io/condimentsPaper//articles/TGFB.html as its a bit easier to use than the official method (https://bioconductor.org/packages/devel/bioc/vignettes/slingshot/inst/doc/vignette.html#using-slingshot)

When I convert the trajectories from Seurat objects to SingleCellExpreiment (sce) objects and run the pipeline, the trajectories double back onto a single point, whereas if I just run slingshot using the dimensional-reduction and clustering information they don't do this. I'd prefer to use the sce object as alot of the tutorials use this and it'd make it alot easier to do plots but the trajectories don't look right in the SCE object, and I'm not sure if I'm doing this correctly.

Method 1 - Just using UMAP and clustering info

#1) Load in Integrated Seurat object (SCT V2 method)
Seu_combined_sct <- readRDS( file = paste0("output/", "/Seurat_objects/Seurat_object_integrated_no_cc.RDS"))
#2) Split by condition 
    #Only doing this because it resulted in trajectories we wanted to investigate 
Optimal <- subset(x = Seu_combined_sct, subset = condition == "Optimal")
seu_trajectory <- Optimal
#3)Extract Dimentional reduction/clustering information from the object 
dimred <- seu_trajectory@reductions$umap@cell.embeddings
clustering <- seu_trajectory@meta.data$seurat_clusters
#4) Run Linages and curves functions 
#Lineages
lineages <- getLineages(dimred, clustering)
plot.new()
plot(dimred, col = pals::cols25(16)[clustering], asp = 1, pch = 16,cex = 0.5)
lines(SlingshotDataSet(lineages), lwd = 3, col = 'black')
#Curves
curves <- getCurves(lineages)
plot.new()
plot(dimred, col = pals::cols25(16)[clustering], asp = 1, pch = 16,cex = 0.5)
lines(SlingshotDataSet(curves), lwd = 3, col = 'black')
#5)Select variable features from the Seurat object 
temp <- SCTransform(Seu_combined_sct,do.scale=T)
temp <- FindVariableFeatures(temp,nfeatures=10000)
gene_list <- VariableFeatures(temp)
#6) Set up paralalisation 
BPPARAM <- BiocParallel::bpparam()
BPPARAM 
BPPARAM$workers <- 8 # use 2 cores
#7) Run fitGAM 
counts <- seu_trajectory@assays$RNA@counts
sce <- fitGAM(
  counts = as.matrix(counts),
  sds = curves,
  nknots = 5,
  genes = gene_list,
  BPPARAM = BPPARAM, parallel = TRUE,
)

OR Method 2 - Converting the Seurat object to single cell object and then using that

#1) Load in Integrated Seurat object (SCT V2 method)
Seu_combined_sct <- readRDS( file = paste0("output/", "/Seurat_objects/Seurat_object_integrated_no_cc.RDS"))
#2) Split by condition 
    #Only doing this because it resulted in trajectories we wanted to investigate 
Optimal <- subset(x = Seu_combined_sct, subset = condition == "Optimal")
Seu_combined_sct <- Optimal
#3) Convert to singlecellobject and add UMAP info + dataframe for plots
sce <- as.SingleCellExperiment(Seu_combined_sct, assay = "RNA")
reducedDim(sce,"UMAP") <- Seu_combined_sct@reductions$umap@cell.embeddings
to_select <- colnames(a)[!colnames(a) %in% "slingshot"]
df <- as.data.frame(colData(sce)[,to_select])
df$cells <- row.names(df)
Umaps <- Seu_combined_sct@reductions$umap@cell.embeddings
Umaps <- as.data.frame(Umaps)
Umaps$cells <- row.names(Umaps)
df <- full_join(df,Umaps,by = "cells")
#4) Slingshot function 
sce <- slingshot(sce, reducedDim = 'UMAP', clusterLabels = sce$seurat_clusters)
mst <- slingMST(sce, as.df = TRUE)
ggplot(df, aes(x = UMAP_1, y = UMAP_2, col = seurat_clusters)) +
  geom_point(size = .7, alpha = .3) +
  labs(col = "Cluster") +
  geom_path(data = mst, col = "black", size = 1.5) +
  geom_point(data = mst, aes(col = Cluster), size = 5)
#5) Look at curves 
pseudotime <- slingPseudotime(sce)
pseudotime <- as.data.frame(pseudotime)
pseudotime$cells <- row.names(pseudotime) 
df <- full_join(df,pseudotime)
# Plot the MST
# df
# as.data.frame(df)
curves <- slingCurves(sce, as.df = TRUE)
 ggplot(df, aes(x = UMAP_1, y = UMAP_2, col = seurat_clusters)) +
    geom_point(size = .7) +
    # scale_color_viridis_c() +
    labs(col = "Pseudotime") +
    geom_path(data = curves, col = "black", size = 1.5)

They all pull into the bottom left point and I'm not sure why

So, can I avoid this happening in the sce object? And is subsetting the seurat object even a valid thing to do. I wanted to explore the results and try other methods but the time it takes to fitGAM is quite restrictive. I do also still observe the points pulling back on themselves in the non-subsetted object, although it is a bit less extreme.

kstreet13 commented 1 year ago

Hi @Tommy0398,

Between the code and the plots there's kind of a lot going on here, so I might not be able to clear everything up. But it sounds like what you want to do should be entirely possible.

First, there's a lot of code here that doesn't seem necessary. For instance, in both cases you create a subsetted Seurat object called Optimal, but then never use it. I don't think you need this, since it is quite easy to subset SingleCellExperiment objects. Similarly, you don't need to manually add the UMAP coordinates, as these are carried over by the as.SingleCellExperiment conversion function (you can check this with reducedDimNames(sce)). In the first case, you also use an object called seu_trajectory that is not referenced anywhere else.

Here's the basic pipeline I think you should be using:

I should also mention that your plots show some of the classic pathologies of Louvain clustering that can cause problems for trajectory inference. Namely, there are some clusters that are clearly represented in different places around the plot, making for a cluster center that doesn't make sense (the light blue cluster in the bottom middle is an example of this).

I would also recommend waiting to run fitGAM until you have a trajectory you are happy with and ready to move on to temporal expression analysis. Because you're right, it does take a long to run and unfortunately, I don't have any helpful tricks to make it faster.

I hope this helps! Kelly

Tommy0398 commented 1 year ago

Hi Kelly,

Thank you for the fast reply, and sorry for the confusion, I missed out a line in both the code blocks where I renamed the Optimal object back to a standard name for simplicity to the code I already had. Seu_combined_sct <- Optimal and seu_trajectory <- Optimal. Sorry for asking a question with incomplete code, I've edited the original post to correct that, but the overall method is the same.

Thank you for the information on the singlecellexperiment object, I'm not overly familiar with it yet so its very helpful.

  1. Regarding the clustering issues, I had observed some of the clusters being in multiple places across the trajectory, which I assume is what may cause the trajectory lines to move in unexpected directions. Would you suggest that I use an alternative clustering method for this data? I've tried all the ones supported natively be Seurat, do you have an alternative method you prefer?

  2. I'm still not sure why using a singlecellexperiment object gives different results to just using the dimentional reduction and clustering information, I would have thought the trajectories in the plots would be identical. If I follow the pipeline after converting to a singlecellobject the trajectories are confusing (the second set of plots).

Thanks, Tommy

kstreet13 commented 1 year ago

Hi @Tommy0398,

  1. Unfortunately, I don't have a go-to clustering method that works "best" with Slingshot. In your case, it may be difficult to find one that performs well, as there is not much clear structure in the data, judging by the UMAP plot.
  2. You're right that these two methods should return identical results. I would suggest simplifying your code as much as possible to try to identify the problem.

Best, Kelly