kstreet13 / slingshot

Functions for identifying and characterizing continuous developmental trajectories in single-cell data.

What does a curve represent biologically? #243

Closed jjia1 closed 1 month ago

jjia1 commented 4 months ago

Hi Kelly,

Thanks so much for developing this tool. I wanted to ask how I should approach my analysis. I have a dataset containing 2 different time points and 2 different treatment conditions. My goal for the project is to determine whether the treatment has any effect on a specific cell type, and how it affects the development of these cells across the two time points. I am considering a few different approaches and wanted to get your input on them:

1) If I use slingshot on the integrated object and do any analysis downstream, e.g. tradeSeq with conditions: [image: AllPrincipalCurves]

vs. 2) if I separate my object by condition and timepoint, then apply slingshot and any downstream analysis to each individual object: [image: AllPrincipalCurves]

I feel like the two approaches answer different questions, but is there a "proper" or "correct" protocol to follow? On that note, I also wanted to ask if it would make more sense to group my data by Condition or Treatment and draw curves on that kind of figure instead. Is there a difference in the conclusions I can draw from those kinds of curves?

I also wanted to ask an additional question: is there a way to remove lineages from a slingshot object? This may be a tradeSeq issue, but setting some lineages to NULL in my slingshot object caused errors when running the tradeSeq analysis.

sce <- readRDS("mysinglecellexperimentobject.rds")
sds <- SlingshotDataSet(sce)
sds@lineages$Lineage5 <- NULL
sds@curves$Lineage5 <- NULL
test <- fitGAM(counts = counts, sds = sds, knots = 6, genes = df, conditions = conds)

returns

Fitting lineages with multiple conditions. This method has been tested on a couple of datasets, but is still in an experimental phase.
Error in .assignCells(cellWeights) : Some cells have no positive cell weights.

Sorry for the bombardment of questions! I really appreciate your time and effort in developing the tool and answering my questions.

kstreet13 commented 4 months ago

Hi @jjia1,

Thanks for the questions. For a more in-depth discussion of our thoughts on trajectory analysis with multiple conditions, check out the condiments paper.

In general, I would recommend fitting a single trajectory to all the data, then using that shared structure as a way to examine differences between the conditions. Most of our condiments workflow operates under this assumption.

That said, your first plot definitely seems to have too many lineages, which is why it ends up looking so weird. I think you would have better results with fewer clusters. If you used the colors from that plot as the input clusters for slingshot (possibly splitting "EN" and "EN2") rather than the fine-grain cluster labels, it looks like you shouldn't get more than 4 lineages.
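As a rough illustration of collapsing fine-grain clusters into broader labels before running slingshot, here is a minimal base R sketch. The cluster IDs and the labels "RG" and "IPC" are hypothetical placeholders; only "EN" and "EN2" come from the plot discussed above.

```r
# Hypothetical fine-grain cluster IDs for six cells (not from the real data).
fine <- c("0", "3", "7", "1", "5", "3")

# Hypothetical lookup from fine cluster ID to broad cell-type label.
# "EN" and "EN2" are the only labels taken from the discussion above.
fine_to_coarse <- c("0" = "RG", "1" = "IPC", "3" = "EN", "5" = "EN2", "7" = "EN")

# Index the lookup by cluster ID to get one coarse label per cell.
coarse <- unname(fine_to_coarse[fine])
print(table(coarse))

# With a real object, the coarse labels would then be the cluster input, e.g.
# sce$coarse <- coarse
# sce <- slingshot(sce, clusterLabels = "coarse", reducedDim = "UMAP")
```

Fewer input clusters give slingshot fewer terminal nodes to connect, which is why the number of lineages should drop.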

This should also solve your downstream issues with having too many lineages in tradeSeq, but in general, you should always give the full output of slingshot to tradeSeq. If you're only interested in certain lineages, you can simply choose to focus on those results afterward.
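One plausible reading of the "Some cells have no positive cell weights" error, offered as an assumption rather than a confirmed diagnosis: once a lineage is removed by hand, any cell that belonged only to that lineage is left with all-zero weights. The toy matrix below is invented to show the effect in base R.

```r
# Toy curve-weight matrix: rows are cells, columns are lineages (values invented).
w <- matrix(c(1, 1, 0,
              1, 0, 0,
              0, 0, 1,
              0, 1, 1),
            ncol = 3, byrow = TRUE,
            dimnames = list(paste0("cell", 1:4), paste0("Lineage", 1:3)))

# Drop the last lineage, analogous to setting Lineage5 to NULL by hand.
w_dropped <- w[, colnames(w) != "Lineage3", drop = FALSE]

# Cells whose remaining weights are all zero no longer sit on any lineage.
orphaned <- rownames(w_dropped)[rowSums(w_dropped) == 0]
print(orphaned)
```

Passing the full slingshot output and subsetting the results afterward avoids creating such orphaned cells.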

Hope this helps! Kelly

jjia1 commented 4 months ago

Thanks for the quick response Kelly!

You're right, there are too many clusters. I did try to split EN because they looked like distinct groups of EN clusters rather than one unified cell type. I do think, however, I probably split the other clusters too much.

I had another question about the vignette from your Bioc2020 workshop where you generated these figures. [two images]

You mentioned that the imbalance score you calculated will determine whether the treatment affects the trajectory or not. Although your results indicate the imbalance does not affect the global trajectory, what would the heatmap look like if it did? And if the treatment does impact my data (I assume this is applicable for timepoint as well), would you recommend using subsets of the data where conditions are separated?

I was wondering if I can use slingshot on a subset of my data with only a single cell type, possibly drawing a lineage between different conditions and timepoints for that one cell type. [image]

kstreet13 commented 4 months ago

> Although your results indicate the imbalance does not affect the global trajectory, what would the heatmap look like if it did?

Honestly, about the same, because the colors scale to the minimum and maximum values.
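To see why min-max colour scaling hides the overall magnitude, here is a small base R sketch. It does not reproduce the workshop's plotting code; the function name `to_colour_index` and both score vectors are hypothetical.

```r
# Rescale scores to [0, 1] by their own minimum and maximum,
# then bin them into n colour indices.
to_colour_index <- function(x, n = 5) {
  scaled <- (x - min(x)) / (max(x) - min(x))
  pmin(floor(scaled * n) + 1, n)
}

weak   <- c(0.10, 0.11, 0.12, 0.13, 0.14)  # hypothetical small imbalance scores
strong <- c(1, 2, 3, 4, 5)                 # hypothetical large imbalance scores

print(to_colour_index(weak))
print(to_colour_index(strong))
```

Both vectors map to the same colour indices, so the two plots would look alike even though the scores differ by more than an order of magnitude.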

> And if the treatment does impact my data (I assume this is applicable for timepoint as well), would you recommend using subsets of the data where conditions are separated?

No, because there's no good way to compare trajectories that were constructed on disjoint subsets of the data. Every downstream test assumes that you have all cells from all conditions together on a single unified trajectory, so I personally don't see a lot of utility in the imbalance scores/test.

> I was wondering if I can use slingshot on a subset of my data with only a single cell type, possibly drawing a lineage between different conditions and timepoints for that one cell type.

I guess it depends on the question you're trying to answer. This would certainly be a use case we hadn't considered, so I can't offer much advice on how to approach it. In general, I would start with the simpler analysis (on all the data) first and proceed from there.

jjia1 commented 4 months ago

Thanks so much Kelly!

jjia1 commented 1 month ago

Hi Kelly,

Sorry to bother again. I wanted to ask about my curve weights matrix.

print(head(slingCurveWeights(sce$slingshot)))
                              Lineage1  Lineage2  Lineage3 Lineage4  Lineage5
NA18-Con-D1_AAACCCACACACTGGC 1.0000000 1.0000000 0.0000000        0 0.0000000
NA18-Con-D1_AAACCCACACTGGAAG 0.9549824 0.9549663 0.9547237        1 0.9552245
NA18-Con-D1_AAACCCAGTGGCTCTG 1.0000000 0.0000000 0.0000000        0 0.0000000
NA18-Con-D1_AAACCCAGTGGTTTGT 1.0000000 1.0000000 1.0000000        0 1.0000000
NA18-Con-D1_AAACCCAGTTCTCGTC 1.0000000 0.0000000 0.0000000        0 0.0000000
NA18-Con-D1_AAACCCATCTAGGCCG 0.0000000 0.0000000 0.0000000        1 0.0000000

Is it possible for a single cell to belong to multiple lineages? I don't see any problems with it, but I just wanted to double check whether this would be problematic (and if it's technical in nature). I'm working with the same developing brain data, but I've adjusted some of the PCA/UMAP and slingshot parameters to get better looking curves.

kstreet13 commented 1 month ago

Hi @jjia1,

Yes, that's entirely possible. A cell will have non-zero weights along multiple lineages if it falls at a point in the trajectory where those lineages have not diverged yet. For example, it looks like your first cell is somewhere along the trajectory where Lineages 1+2 have split from Lineages 3+4+5, but 1 and 2 haven't split yet.
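A quick way to list which lineages each cell sits on is to look for non-zero weights per row. The sketch below uses base R and copies the first three rows of the matrix printed above; with a real object, `w` would come from `slingCurveWeights(sce$slingshot)` instead.

```r
# First three rows of the curve-weight matrix shown above.
w <- matrix(c(1.0000000, 1.0000000, 0.0000000, 0, 0.0000000,
              0.9549824, 0.9549663, 0.9547237, 1, 0.9552245,
              1.0000000, 0.0000000, 0.0000000, 0, 0.0000000),
            ncol = 5, byrow = TRUE,
            dimnames = list(c("NA18-Con-D1_AAACCCACACACTGGC",
                              "NA18-Con-D1_AAACCCACACTGGAAG",
                              "NA18-Con-D1_AAACCCAGTGGCTCTG"),
                            paste0("Lineage", 1:5)))

# Number of lineages each cell has a non-zero weight on.
n_lineages <- rowSums(w > 0)

# Names of those lineages, one character vector per cell.
lineages_per_cell <- lapply(seq_len(nrow(w)), function(i) colnames(w)[w[i, ] > 0])
names(lineages_per_cell) <- rownames(w)

print(n_lineages)
print(lineages_per_cell[[1]])
```

The first cell is shared by Lineage1 and Lineage2 only, the second by all five lineages (so it lies before any split), and the third belongs to Lineage1 alone.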

Best, Kelly

jjia1 commented 1 month ago

Awesome. Thanks so much Kelly!