kstreet13 / slingshot

Functions for identifying and characterizing continuous developmental trajectories in single-cell data.

Overlapping curve issue - change to code needed? #244

Open crist156 opened 7 months ago

crist156 commented 7 months ago

Hi there,

Thank you so much for developing such a wonderful tool for all of us! I was wondering if I might get your opinion on my curve/lineage data to see if I need to tweak some of the initial parameters when running slingshot. I'm trying to run DE analysis on this and have come up against some issues, maybe due to the curves, since a couple of lineages seem to be so similar/overlapping. This isn't necessarily unexpected given how my cell types differentiate, but I also wouldn't be surprised if it's messing up some of the computational side of things.

My input is the integrated data from a SingleCellExperiment (converted from Seurat), where I have 3,000 genes x ~42,000 cells. (I've also tried this using RNA as the starting assay, just to see, but the curves look the same even though the full gene count here is closer to 24k.) The data are on the larger side, so I wasn't sure if maybe I should have used approx_points as suggested in the vignette. I wasn't sure what I should set it to in this case, so I left it as the default. I also fit slingshot using both UMAP and PCA, just to see how they differed, and while UMAP looks better, they're still both a bit messy.

I'd love to hear your thoughts on whether there is anything I should include in my code and/or whether you think these curves are problematic. The lineages for the UMAP itself look OK (distinct and linear), but I'm thinking that just because it looks nice doesn't mean there isn't a larger problem. Any input would be greatly appreciated!

Thank you! Sarah

[Image: slingshot_curve_comparison]

crist156 commented 7 months ago

Hi again,

Sorry to add more to this, but I shrunk the size of the nodes down and realized that there's something weird going on in the center, where there's actually a fully distinct trajectory from the light purple to the dark purple on the right. I originally thought it branched off the main one, but this shows that it doesn't. Weirdly, though, when I looked at how the lineages go through the clusters, it does have this one (lineage 3) starting with all the rest. Why would the lineage be plotted as separate when it's really just a bifurcation?

Smaller points show two distinct paths -

[Image: Screenshot 2024-02-14 at 9 27 00 AM]

Lineage path with Seurat cluster labels -

[Image: Screenshot 2024-02-14 at 10 00 45 AM]

Based on the slingshot output, it would seem that there should be a clear bifurcation of lineages 1, 2 and 3 at Cluster 3. Not sure what to make of how the plot isn't exactly matching this.

Would love to hear your thoughts!

kstreet13 commented 7 months ago

Hi @crist156,

I think I can clear a few things up for you, but this doesn't seem to be a problem with the slingshot package itself.

First, I think you should spend some more time with the upstream analysis steps, particularly normalization and dimensionality reduction. It's rare to see a UMAP plot with so little structure and I can see two possible explanations for that: (1) aggressive normalization removed biological signal from the data, or (2) these are actually highly homogeneous cells. Either way, there is no clear trajectory structure in the data as it stands, so it's not too surprising that the slingshot results end up looking a bit nonsensical.

You also mentioned using the full matrix of gene counts as opposed to the reduced matrix of 3,000 genes. This doesn't make a difference because slingshot only uses the dimensionality reduction (PCA or UMAP), which I'm guessing was done in Seurat on the 3,000 most variable genes.

The default value of approx_points is 150 and I think it's fine to leave it at that. Ideally, changing this value won't make a huge impact on the resulting lineages; it will just change the amount of computing time required.
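To illustrate why the number of approximation points mostly affects computing time, here is a minimal Python sketch. It is not slingshot's code: the toy curve (y = sin x), the simulated cells, and the helper names `make_curve` and `pseudotime` are all assumptions made for illustration. It assigns each cell the arc length of its nearest curve point, once with the curve sampled at 150 points and once at 2,000.

```python
import math
import random


def make_curve(n_points):
    """Sample the toy curve y = sin(x), x in [0, pi], at n_points.

    Returns the sampled points and the cumulative arc length at each point.
    """
    xs = [math.pi * i / (n_points - 1) for i in range(n_points)]
    pts = [(x, math.sin(x)) for x in xs]
    arc = [0.0]
    for a, b in zip(pts, pts[1:]):
        arc.append(arc[-1] + math.dist(a, b))
    return pts, arc


def pseudotime(cell, pts, arc):
    """Arc length at the sampled curve point nearest to the cell."""
    nearest = min(range(len(pts)), key=lambda i: math.dist(cell, pts[i]))
    return arc[nearest]


# Hypothetical cells scattered around the toy curve.
random.seed(1)
cells = []
for _ in range(100):
    x = random.uniform(0, math.pi)
    cells.append((x + random.gauss(0, 0.1), math.sin(x) + random.gauss(0, 0.1)))

coarse = make_curve(150)   # same count as the default mentioned above
fine = make_curve(2000)    # much denser sampling, more distance computations

# Largest disagreement in pseudotime between the two samplings.
max_diff = max(
    abs(pseudotime(c, *coarse) - pseudotime(c, *fine)) for c in cells
)
print(f"max pseudotime difference: {max_diff:.4f}")
```

In this toy setting the two samplings disagree by at most about the spacing between neighbouring points on the coarse curve, which is small relative to the total curve length of roughly 3.8, while the denser curve costs more than ten times as many distance computations per cell.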

And as you noted, there is something weird going on, but it's not that clusters 13 and 8 are disconnected. There is still a black line connecting cluster 13 to cluster 3, so that makes sense. However, the actual center of cluster 3 is not where you might expect based on the concentration of orange-colored cells. It looks like there are a lot of cells from cluster 3 hidden behind some of the other clusters, as evidenced by its center being down by cluster 13. Cluster 12 has a similar problem, where its center is not where you would expect, but actually much closer to cluster 10. These sorts of weird, split clusters are often a result of Seurat's graph-based clustering methods, such as Louvain clustering.
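The effect of a split cluster on its center can be sketched in a few lines of Python. This is an illustration on made-up 2D coordinates, not output from this dataset, and the names `blob` and `centroid` are hypothetical helpers. Slingshot draws its lineage tree between cluster centers, so a cluster whose cells sit in two separate places gets a center in between, away from where most of its visible cells appear.

```python
import math
import random
from statistics import mean

random.seed(0)


def blob(cx, cy, n, sd=0.3):
    """Simulate n cells scattered around (cx, cy) in a 2D embedding."""
    return [(random.gauss(cx, sd), random.gauss(cy, sd)) for _ in range(n)]


def centroid(points):
    """Mean position of a set of 2D points."""
    return (mean(p[0] for p in points), mean(p[1] for p in points))


# One hypothetical cluster whose cells are split across two regions:
# a piece that is easy to see, and a piece plotted underneath other clusters.
visible = blob(0.0, 0.0, 300)
hidden = blob(6.0, -4.0, 300)
cluster = visible + hidden

center = centroid(cluster)
visible_center = centroid(visible)

# Distance between the true cluster center and where the eye would place it.
shift = math.dist(center, visible_center)
print(f"cluster center: ({center[0]:.2f}, {center[1]:.2f})")
print(f"shift away from the visible piece: {shift:.2f}")
```

A quick check on real data along the same lines is to plot each cluster's cells on their own, or to overlay the cluster means on the embedding, and see whether any center falls outside the region where that cluster appears to be.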

Hope this helps! Kelly

crist156 commented 7 months ago

Hi @kstreet13,

Thanks so much for thinking about my problem! I appreciate the clarification you've given and agree with your suggestion to take a closer look upstream. I inherited the data from someone else and took the integration and clustering steps for granted. I'll revisit this and keep my fingers crossed!

Thanks for your help! Sarah