kstreet13 / slingshot

Functions for identifying and characterizing continuous developmental trajectories in single-cell data.

Predicting cells belonging to two lineages constructed using two subsetted objects #223

Closed Tommy0398 closed 7 months ago

Tommy0398 commented 1 year ago

Hi Kelly,

Is there any way to "project" two independently created trajectories onto one UMAP to see which cells would be assigned to each lineage? The idea is that we have two conditions in our data and believe that both conditions use certain trajectories, but to different degrees.

To get the two independent trajectories, I subsetted an integrated object by condition and ran slingshot on each subset using the (subsetted) integrated UMAP and clustering information. I've tried mapping the same trajectories onto the object using the condiments workflow, which works for differential expression with one of the predicted trajectories (condition A) but not the other. I can't get condition A to naturally give me the trajectory of condition B without using the predict function, because the underlying structure is different, but that wouldn't answer the question we're investigating anyway. It does make me concerned that this approach may not be a good one, though.

So, I essentially have the lineage curves for both conditions and want to predict which cells would most likely belong to which lineage in the subsetted objects. Is that possible?

Thanks, Tom

kstreet13 commented 1 year ago

Hi @Tommy0398,

I'm not sure I fully understand, but this sounds like an interesting question!

Since you asked about projecting cells onto an existing trajectory, you're correct that the way to do that is with the predict method provided by slingshot (it is unfortunately hard to find the documentation for this method, but you can get there with ?slingshot::`predict,PseudotimeOrdering-method`). Based on how you initially phrased the question, I would have said to use this twice: project the cells from subset A onto the trajectory from subset B, and vice versa. Why do you say that this wouldn't answer the question you're investigating?
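As a rough illustration of what "projecting cells onto an existing trajectory" means geometrically, here is a toy sketch in Python. It is not slingshot's implementation, and it does not replace calling the predict method in R. It assumes each trajectory can be approximated by a 2D polyline, and the function name `project_onto_polyline`, the curves, and the cell coordinates are all hypothetical. Each cell is assigned the arc length of its closest point on the curve as a stand-in for pseudotime, and the projection is done in both directions (A onto B, and B onto A).

```python
import math

def project_onto_polyline(point, curve):
    """Return (pseudotime, distance) for one cell.

    pseudotime: arc length along `curve` to the closest point on it.
    distance:   Euclidean distance from the cell to that closest point.
    """
    best_time, best_dist = 0.0, math.inf
    travelled = 0.0  # arc length covered by the segments visited so far
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        dx, dy = x1 - x0, y1 - y0
        seg_len = math.hypot(dx, dy)
        if seg_len == 0.0:
            continue  # skip degenerate segments
        # Position of the orthogonal projection along the segment, clamped to [0, 1].
        t = ((point[0] - x0) * dx + (point[1] - y0) * dy) / seg_len ** 2
        t = max(0.0, min(1.0, t))
        px, py = x0 + t * dx, y0 + t * dy
        dist = math.hypot(point[0] - px, point[1] - py)
        if dist < best_dist:
            best_time, best_dist = travelled + t * seg_len, dist
        travelled += seg_len
    return best_time, best_dist

# Hypothetical curves fit on each subset, and hypothetical cell embeddings.
curve_A = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)]
curve_B = [(0.0, 0.0), (1.0, 1.0), (2.0, 3.0)]
cells_A = [(0.5, 0.1), (1.5, 0.4)]
cells_B = [(0.5, 0.6), (1.5, 2.1)]

# Project each subset onto the trajectory learned from the other subset.
A_on_B = [project_onto_polyline(cell, curve_B) for cell in cells_A]
B_on_A = [project_onto_polyline(cell, curve_A) for cell in cells_B]
```

The distances returned here hint at why the comparison is delicate: cells from one subset will typically sit farther from a curve that was fit on the other subset.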

In general, I think it's very hard to compare separate trajectories, so when questions like this come up for me, I always try to fit a single trajectory, even if it's more complicated. A lot of the condiments workflow only works if you have a single, common trajectory and I don't think there's a good way to get there from separate initial trajectories (especially if they have different topologies). Even if certain parts are entirely specific to one condition, it's easier to fit a single trajectory and then test for differences in characteristics such as the distribution of cells along each lineage.

Best, Kelly

Tommy0398 commented 1 year ago

Hi Kelly,

Thank you for the quick response.

I didn't think this would answer the question I'm investigating because, unless I'm misunderstanding the capabilities of the predict function, I don't believe it can map both trajectories onto the same subset at the same time. Assuming the subset has both trajectories within it, we would want to know which cells are preferentially predicted to be using each of those trajectories. If we're only looking at one set of trajectories at a time, then it's difficult to say whether there is a preference for a trajectory contained within the other set.

Is this possible with the predict function?

For the condiments workflow, I managed to get a common trajectory that represented one subset and got through the whole workflow, but I could not do the same for the other subset. I may be able to achieve that with the other subset's trajectories by experimenting with the topology so that they fit the curves for the other subset, but that's another question for now.

Thanks, Tom

kstreet13 commented 1 year ago

Ok, I'm still not fully understanding what the question is, so I'll try to clarify a few things. For one, trajectories are learned from data (they're a mathematical construct, like a model) and can then be used to generate predictions for independent data. "Mapping a trajectory onto a subset" is not something that you can do, but it sounds a lot like using the trajectory to make predictions for the subset. Similarly, it does not make sense to say that any dataset has trajectories "within it," because those are models that are constructed from the data. You can use data to fit a model, but you wouldn't say that a dataset "has a logistic regression model within it."

Tommy0398 commented 1 year ago

Sorry for the confusion, I'm not using the jargon very well. Using a trajectory to make predictions for the subset is what I want to do and have done; however, I want to have the trajectories from both subsets predicted onto one subset at the same time. It's a bit of a crude idea: predicting whether the cells "prefer" particular lineages when all of the paths are available as options.

I've attempted to draw a diagram to make what I mean clearer: [image: Mapping]

kstreet13 commented 1 year ago

Ok, I think I understand what you're trying to ask, but there's not really any way to answer that question. As with any model, trajectories are learned from data, so if trajectory A is fit on subset A, it will pretty much always be a better fit to subset A than trajectory B (you would basically be comparing within-sample performance to out-of-sample performance).
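The within-sample versus out-of-sample point can be seen with a much simpler model than a trajectory. The toy sketch below, in Python, swaps in an ordinary least-squares line for the trajectory and uses made-up data for two subsets; the names `fit_line`, `mse`, `subset_A`, and `subset_B` are all hypothetical. Because a least-squares line minimizes squared error on the data it was fit to, the model fit on A cannot do worse on A than the model fit on B.

```python
def fit_line(points):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mse(model, points):
    """Mean squared error of a fitted line on a set of points."""
    slope, intercept = model
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points) / len(points)

# Made-up data standing in for the two condition-specific subsets.
subset_A = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.2), (3.0, 2.8)]
subset_B = [(0.0, 0.0), (1.0, 1.6), (2.0, 2.9), (3.0, 4.7)]

model_A = fit_line(subset_A)
model_B = fit_line(subset_B)

in_sample = mse(model_A, subset_A)      # model A evaluated on its own training data
out_of_sample = mse(model_B, subset_A)  # model B evaluated on data it never saw

print(f"model A on subset A: {in_sample:.3f}")
print(f"model B on subset A: {out_of_sample:.3f}")
```

A comparison like this says more about which data each model was trained on than about which model the cells "prefer", which is why fitting a single common trajectory is the easier route.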